  Topic: Optimal decorrelated score subsampling for generalized linear models with massive data

  Time: May 11, 2023, 10:00-11:30

  Venue: Tencent Meeting, ID 748-682-688

  Host: Professor Jiang Rong


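  Wang Lei is an associate research fellow and doctoral supervisor at the School of Statistics and Data Science, Nankai University. His research focuses on complex data analysis and statistical learning. He has published more than 50 papers in statistics journals including Biometrika, SCIENCE CHINA Mathematics, Bernoulli, and Statistica Sinica, and has led three National Natural Science Foundation of China projects and one Tianjin Natural Science Foundation project.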


  In this paper, we consider unified optimal subsampling estimation and inference for a low-dimensional parameter of main interest, in the presence of nuisance parameters, for low- and high-dimensional generalized linear models (GLMs) with massive data. We first present a general subsampling decorrelated score function to reduce the influence of the less accurate nuisance parameter estimator, which has a slow convergence rate. The consistency and asymptotic normality of the resulting subsample estimator from a general decorrelated score subsampling algorithm are established, and two optimal subsampling probabilities are derived under the A- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposed optimal subsampling probabilities provably improve the asymptotic efficiency over the subsampling schemes for low-dimensional GLMs and perform better than the uniform subsampling scheme in high-dimensional GLMs. A two-step algorithm is further proposed to implement the method, and the asymptotic properties of the corresponding estimators are also given. Simulations show satisfactory performance of the proposed estimators, and two applications, to the census income and Fashion-MNIST datasets, demonstrate their practical applicability.
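
  The two-step idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the decorrelated score method of the talk: it assumes a plain low-dimensional logistic regression with no nuisance parameter, and uses a commonly cited L-optimality-type rule in which sampling probabilities are proportional to |y_i - p_i| * ||x_i||. The function names `weighted_logistic_fit` and `two_step_subsample`, and the sizes `r0` and `r`, are hypothetical and chosen for illustration only.

```python
import numpy as np


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-8):
    """Newton-Raphson for logistic regression with observation weights w."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_subsample(X, y, r0, r, rng):
    """Illustrative two-step subsampling (assumed simplification).

    Step 1: uniform pilot subsample of size r0 gives a rough estimate.
    Step 2: draw r points with probabilities proportional to
            |y_i - p_i| * ||x_i|| and fit with inverse-probability weights.
    """
    n = X.shape[0]

    # Step 1: pilot estimate from a uniform subsample.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_fit(X[idx0], y[idx0], np.ones(r0))

    # Approximate L-optimality-type probabilities from the pilot fit.
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    scores = np.abs(y - p) * np.linalg.norm(X, axis=1)
    probs = scores / scores.sum()

    # Step 2: non-uniform subsample, weighted by inverse probabilities.
    idx1 = rng.choice(n, size=r, replace=True, p=probs)
    w = 1.0 / probs[idx1]
    w = w / w.mean()  # rescaling does not change the weighted maximizer
    beta_hat = weighted_logistic_fit(X[idx1], y[idx1], w)
    return beta_hat, probs


# Small synthetic demonstration (simulated data, not from the talk).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20000, 3))
beta_demo = np.array([0.5, -0.5, 1.0])
y_demo = (rng.random(20000) < 1.0 / (1.0 + np.exp(-X_demo @ beta_demo))).astype(float)
est, _ = two_step_subsample(X_demo, y_demo, r0=500, r=2000, rng=rng)
print("subsample estimate:", est)
```

  In this sketch the full data are touched only once to compute the sampling probabilities, and the model is fitted on `r0` and then `r` points rather than on all observations. The pilot sample is discarded in the second step here for simplicity; combining both samples is a possible variation.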



