• ### De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices(1801.10567)

Jan. 31, 2018 math.ST, stat.TH, stat.ME
Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix, provided that only a certain number of loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given an independent sample $X^i \in\mathbb R^p, i = 1,...,n,$ generated from an unknown distribution with an unknown covariance matrix $\Sigma_0$, our aim is to estimate the first vector of loadings and the largest eigenvalue of $\Sigma_0$ in a setting where $p\gg n$. Next to the high-dimensionality, another challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that our estimator achieves the minimax optimal rates in $\ell_1$ and $\ell_2$-norm. We identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of the covariance matrix $\Sigma_0$. Our main results provide theoretical guarantees for asymptotic normality of the de-biased estimator. The major conditions we impose are sparsity in the first eigenvector of small order $\sqrt{n}/\log p$ and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.
• ### Inference in high-dimensional graphical models(1801.08512)

Jan. 25, 2018 math.ST, stat.TH, stat.ME
We provide a selected overview of methodology and theory for estimation and inference on the edge weights in high-dimensional directed and undirected Gaussian graphical models. For undirected graphical models, two main explicit constructions are provided: one based on a global method that maximizes the joint likelihood (the graphical Lasso) and one based on a local (nodewise) method that sequentially applies the Lasso to estimate the neighbourhood of each node. The proposed estimators lead to confidence intervals for edge weights and recovery of the edge structure. We evaluate their empirical performance in an extensive simulation study. The theoretical guarantees for the methods are achieved under a sparsity condition relative to the sample size and regularity conditions. For directed acyclic graphs, we apply similar ideas to construct confidence intervals for edge weights, when the directed acyclic graph is identifiable.
• ### Semi-parametric efficiency bounds for high-dimensional models(1601.00815)

Oct. 12, 2017 math.ST, stat.TH, stat.ME
Asymptotic lower bounds for estimation play a fundamental role in assessing the quality of statistical procedures. In this paper we propose a framework for obtaining semi-parametric efficiency bounds for sparse high-dimensional models, where the dimension of the parameter is larger than the sample size. We adopt a semi-parametric point of view: we concentrate on one dimensional functions of a high-dimensional parameter. We follow two different approaches to reach the lower bounds: asymptotic Cram\'er-Rao bounds and Le Cam's type of analysis. Both these approaches allow us to define a class of asymptotically unbiased or "regular" estimators for which a lower bound is derived. Consequently, we show that certain estimators obtained by de-sparsifying (or de-biasing) an $\ell_1$-penalized M-estimator are asymptotically unbiased and achieve the lower bound on the variance: thus in this sense they are asymptotically efficient. The paper discusses in detail the linear regression model and the Gaussian graphical model.
• ### Confidence regions for high-dimensional generalized linear models under sparsity(1610.01353)

Oct. 5, 2016 math.ST, stat.TH, stat.ME
We study asymptotically normal estimation and confidence regions for low-dimensional parameters in high-dimensional sparse models. Our approach is based on the $\ell_1$-penalized M-estimator which is used for construction of a bias corrected estimator. We show that the proposed estimator is asymptotically normal, under a sparsity assumption on the high-dimensional parameter, smoothness conditions on the expected loss and an entropy condition. This leads to uniformly valid confidence regions and hypothesis testing for low-dimensional parameters. The present approach is different in that it allows for treatment of loss functions that we not sufficiently differentiable, such as quantile loss, Huber loss or hinge loss functions. We also provide new results for estimation of the inverse Fisher information matrix, which is necessary for the construction of the proposed estimator. We formulate our results for general models under high-level conditions, but investigate these conditions in detail for generalized linear models and provide mild sufficient conditions. As particular examples, we investigate the case of quantile loss and Huber loss in linear regression and demonstrate the performance of the estimators in a simulation study and on real datasets from genome-wide association studies. We further investigate the case of logistic regression and illustrate the performance of the estimator on simulated and real data.
• ### Honest confidence regions and optimality in high-dimensional precision matrix estimation(1507.02061)

July 20, 2016 math.ST, stat.TH
We propose methodology for estimation of sparse precision matrices and statistical inference for their low-dimensional parameters in a high-dimensional setting where the number of parameters $p$ can be much larger than the sample size. We show that the novel estimator achieves minimax rates in supremum norm and the low-dimensional components of the estimator have a Gaussian limiting distribution. These results hold uniformly over the class of precision matrices with row sparsity of small order $\sqrt{n}/\log p$ and spectrum uniformly bounded, under a sub-Gaussian tail assumption on the margins of the true underlying distribution. Consequently, our results lead to uniformly valid confidence regions for low-dimensional parameters of the precision matrix. Thresholding the estimator leads to variable selection without imposing irrepresentability conditions. The performance of the method is demonstrated in a simulation study and on real data.
• ### Confidence intervals for high-dimensional inverse covariance estimation(1403.6752)

Aug. 11, 2015 math.ST, stat.TH, stat.ME
We propose methodology for statistical inference for low-dimensional parameters of sparse precision matrices in a high-dimensional setting. Our method leads to a non-sparse estimator of the precision matrix whose entries have a Gaussian limiting distribution. Asymptotic properties of the novel estimator are analyzed for the case of sub-Gaussian observations under a sparsity assumption on the entries of the true precision matrix and regularity conditions. Thresholding the de-sparsified estimator gives guarantees for edge selection in the associated graphical model. Performance of the proposed method is illustrated in a simulation study.