• ### On the efficiency of the de-biased Lasso(1708.07986)

Aug. 21, 2018 math.ST, stat.TH
We consider the high-dimensional linear regression model $Y = X \beta^0 + \epsilon$ with Gaussian noise $\epsilon$ and Gaussian random design $X$. We assume that $\Sigma:= E X^T X / n$ is non-singular and write its inverse as $\Theta := \Sigma^{-1}$. The parameter of interest is the first component $\beta_1^0$ of $\beta^0$. We show that in the high-dimensional case the asymptotic variance of a debiased Lasso estimator can be smaller than $\Theta_{1,1}$. For some special such cases we establish asymptotic efficiency. The conditions include $\beta^0$ being sparse and the first column $\Theta_1$ of $\Theta$ being not sparse. These conditions depend on whether $\Sigma$ is known or not.
• ### On tight bounds for the Lasso(1804.00989)

April 3, 2018 math.ST, stat.TH
We present upper and lower bounds for the prediction error of the Lasso. For the case of random Gaussian design, we show that under mild conditions the prediction error of the Lasso is up to smaller order terms dominated by the prediction error of its noiseless counterpart. We then provide exact expressions for the prediction error of the latter, in terms of compatibility constants. Here, we assume the active components of the underlying regression function satisfy some "betamin" condition. For the case of fixed design, we provide upper and lower bounds, again in terms of compatibility constants. As an example, we give an up to a logarithmic term tight bound for the least squares estimator with total variation penalty.
• ### Sharp oracle inequalities for stationary points of nonconvex penalized M-estimators(1802.09733)

Feb. 27, 2018 math.ST, stat.TH
Many statistical estimation procedures lead to nonconvex optimization problems. Algorithms to solve these are often guaranteed to output a stationary point of the optimization problem. Oracle inequalities are an important theoretical instrument to asses the statistical performance of an estimator. Oracle results have focused on the theoretical properties of the uncomputable (global) minimum or maximum. In the present work a general framework used for convex optimization problems to derive oracle inequalities for stationary points is extended. A main new ingredient of these oracle inequalities is that they are sharp: they show closeness to the best approximation within the model plus a remainder term. We apply this framework to different estimation problems.
• ### De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices(1801.10567)

Jan. 31, 2018 math.ST, stat.TH, stat.ME
Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix, provided that only a certain number of loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given an independent sample $X^i \in\mathbb R^p, i = 1,...,n,$ generated from an unknown distribution with an unknown covariance matrix $\Sigma_0$, our aim is to estimate the first vector of loadings and the largest eigenvalue of $\Sigma_0$ in a setting where $p\gg n$. Next to the high-dimensionality, another challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that our estimator achieves the minimax optimal rates in $\ell_1$ and $\ell_2$-norm. We identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of the covariance matrix $\Sigma_0$. Our main results provide theoretical guarantees for asymptotic normality of the de-biased estimator. The major conditions we impose are sparsity in the first eigenvector of small order $\sqrt{n}/\log p$ and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.
• ### Inference in high-dimensional graphical models(1801.08512)

Jan. 25, 2018 math.ST, stat.TH, stat.ME
We provide a selected overview of methodology and theory for estimation and inference on the edge weights in high-dimensional directed and undirected Gaussian graphical models. For undirected graphical models, two main explicit constructions are provided: one based on a global method that maximizes the joint likelihood (the graphical Lasso) and one based on a local (nodewise) method that sequentially applies the Lasso to estimate the neighbourhood of each node. The proposed estimators lead to confidence intervals for edge weights and recovery of the edge structure. We evaluate their empirical performance in an extensive simulation study. The theoretical guarantees for the methods are achieved under a sparsity condition relative to the sample size and regularity conditions. For directed acyclic graphs, we apply similar ideas to construct confidence intervals for edge weights, when the directed acyclic graph is identifiable.
• ### Semi-parametric efficiency bounds for high-dimensional models(1601.00815)

Oct. 12, 2017 math.ST, stat.TH, stat.ME
Asymptotic lower bounds for estimation play a fundamental role in assessing the quality of statistical procedures. In this paper we propose a framework for obtaining semi-parametric efficiency bounds for sparse high-dimensional models, where the dimension of the parameter is larger than the sample size. We adopt a semi-parametric point of view: we concentrate on one dimensional functions of a high-dimensional parameter. We follow two different approaches to reach the lower bounds: asymptotic Cram\'er-Rao bounds and Le Cam's type of analysis. Both these approaches allow us to define a class of asymptotically unbiased or "regular" estimators for which a lower bound is derived. Consequently, we show that certain estimators obtained by de-sparsifying (or de-biasing) an $\ell_1$-penalized M-estimator are asymptotically unbiased and achieve the lower bound on the variance: thus in this sense they are asymptotically efficient. The paper discusses in detail the linear regression model and the Gaussian graphical model.
• ### Robust Low-Rank Matrix Estimation(1603.09071)

July 24, 2017 math.ST, stat.TH
Many results have been proved for various nuclear norm penalized estimators of the uniform sampling matrix completion problem. However, most of these estimators are not robust: in most of the cases the quadratic loss function and its modifications are used. We consider robust nuclear norm penalized estimators using two well-known robust loss functions: the absolute value loss and the Huber loss. Under several conditions on the sparsity of the problem (i.e. the rank of the parameter matrix) and on the regularity of the risk function sharp and non-sharp oracle inequalities for these estimators are shown to hold with high probability. As a consequence, the asymptotic behavior of the estimators is derived. Similar error bounds are obtained under the assumption of weak sparsity, i.e. the case where the matrix is assumed to be only approximately low-rank. In all our results we consider a high-dimensional setting. In this case, this means that we assume $n\leq pq$. Finally, various simulations confirm our theoretical results.
• ### Asymptotic Confidence Regions for High-dimensional Structured Sparsity(1706.09231)

June 28, 2017 math.ST, stat.TH, stat.ML
In the setting of high-dimensional linear regression models, we propose two frameworks for constructing pointwise and group confidence sets for penalized estimators which incorporate prior knowledge about the organization of the non-zero coefficients. This is done by desparsifying the estimator as in van de Geer et al. [18] and van de Geer and Stucky [17], then using an appropriate estimator for the precision matrix $\Theta$. In order to estimate the precision matrix a corresponding structured matrix norm penalty has to be introduced. After normalization the result is an asymptotic pivot. The asymptotic behavior is studied and simulations are added to study the differences between the two schemes.
• ### Some exercises with the Lasso and its compatibility constant(1701.03326)

Jan. 12, 2017 math.ST, stat.TH
We consider the Lasso for a noiseless experiment where one has observations $X \beta^0$ and uses the penalized version of basis pursuit. We compute for some special designs the compatibility constant, a quantity closely related to the restricted eigenvalue. We moreover show the dependence of the (penalized) prediction error on this compatibility constant. This exercise illustrates that compatibility is necessarily entering into the bounds for the (penalized) prediction error and that the bounds in the literature therefore are - up to constants - tight. We also give conditions that show that in the noisy case the dominating term for the prediction error is given by the prediction error of the noiseless case.
• ### Concentration behavior of the penalized least squares estimator(1511.08698)

Oct. 19, 2016 math.ST, stat.TH
Consider the standard nonparametric regression model and take as estimator the penalized least squares function. In this article, we study the trade-off between closeness to the true function and complexity penalization of the estimator, where complexity is described by a seminorm on a class of functions. First, we present an exponential concentration inequality revealing the concentration behavior of the trade-off of the penalized least squares estimator around a nonrandom quantity, where such quantity depends on the problem under consideration. Then, under some conditions and for the proper choice of the tuning parameter, we obtain bounds for this nonrandom quantity. We illustrate our results with some examples that include the smoothing splines estimator.
• ### Confidence regions for high-dimensional generalized linear models under sparsity(1610.01353)

Oct. 5, 2016 math.ST, stat.TH, stat.ME
We study asymptotically normal estimation and confidence regions for low-dimensional parameters in high-dimensional sparse models. Our approach is based on the $\ell_1$-penalized M-estimator which is used for construction of a bias corrected estimator. We show that the proposed estimator is asymptotically normal, under a sparsity assumption on the high-dimensional parameter, smoothness conditions on the expected loss and an entropy condition. This leads to uniformly valid confidence regions and hypothesis testing for low-dimensional parameters. The present approach is different in that it allows for treatment of loss functions that we not sufficiently differentiable, such as quantile loss, Huber loss or hinge loss functions. We also provide new results for estimation of the inverse Fisher information matrix, which is necessary for the construction of the proposed estimator. We formulate our results for general models under high-level conditions, but investigate these conditions in detail for generalized linear models and provide mild sufficient conditions. As particular examples, we investigate the case of quantile loss and Huber loss in linear regression and demonstrate the performance of the estimators in a simulation study and on real datasets from genome-wide association studies. We further investigate the case of logistic regression and illustrate the performance of the estimator on simulated and real data.
• ### Honest confidence regions and optimality in high-dimensional precision matrix estimation(1507.02061)

July 20, 2016 math.ST, stat.TH
We propose methodology for estimation of sparse precision matrices and statistical inference for their low-dimensional parameters in a high-dimensional setting where the number of parameters $p$ can be much larger than the sample size. We show that the novel estimator achieves minimax rates in supremum norm and the low-dimensional components of the estimator have a Gaussian limiting distribution. These results hold uniformly over the class of precision matrices with row sparsity of small order $\sqrt{n}/\log p$ and spectrum uniformly bounded, under a sub-Gaussian tail assumption on the margins of the true underlying distribution. Consequently, our results lead to uniformly valid confidence regions for low-dimensional parameters of the precision matrix. Thresholding the estimator leads to variable selection without imposing irrepresentability conditions. The performance of the method is demonstrated in a simulation study and on real data.
• ### Sharp Oracle Inequalities for Square Root Regularization(1509.04093)

June 27, 2016 math.ST, stat.TH
We study a set of regularization methods for high-dimensional linear regression models. These penalized estimators have the square root of the residual sum of squared errors as loss function, and any weakly decomposable norm as penalty function. This fit measure is chosen because of its property that the estimator does not depend on the unknown standard deviation of the noise. On the other hand, a generalized weakly decomposable norm penalty is very useful in being able to deal with different underlying sparsity structures. We can choose a different sparsity inducing norm depending on how we want to interpret the unknown parameter vector $\beta$. Structured sparsity norms, as defined in Micchelli et al. [18], are special cases of weakly decomposable norms, therefore we also include the square root LASSO (Belloni et al. [3]), the group square root LASSO (Bunea et al. [10]) and a new method called the square root SLOPE (in a similar fashion to the SLOPE from Bogdan et al. [6]). For this collection of estimators our results provide sharp oracle inequalities with the Karush-Kuhn-Tucker conditions. We discuss some examples of estimators. Based on a simulation we illustrate some advantages of the square root SLOPE.
• ### On concentration for (regularized) empirical risk minimization(1512.00677)

Jan. 11, 2016 math.ST, stat.TH
Rates of convergence for empirical risk minimizers have been well studied in the literature. In this paper, we aim to provide a complementary set of results, in particular by showing that after normalization, the risk of the empirical minimizer concentrates on a single point. Such results have been established by~\cite{chatterjee2014new} for constrained estimators in the normal sequence model. We first generalize and sharpen this result to regularized least squares with convex penalties, making use of a "direct" argument based on Borell's theorem. We then study generalizations to other loss functions, including the negative log-likelihood for exponential families combined with a strictly convex regularization penalty. The results in this general setting are based on more "indirect" arguments as well as on concentration inequalities for maxima of empirical processes.
• ### $\chi^2$-confidence sets in high-dimensional regression(1502.07131)

Sept. 15, 2015 math.ST, stat.TH
We study a high-dimensional regression model. Aim is to construct a confidence set for a given group of regression coefficients, treating all other regression coefficients as nuisance parameters. We apply a one-step procedure with the square-root Lasso as initial estimator and a multivariate square-root Lasso for constructing a surrogate Fisher information matrix. The multivariate square-root Lasso is based on nuclear norm loss with $\ell_1$-penalty. We show that this procedure leads to an asymptotically $\chi^2$-distributed pivot, with a remainder term depending only on the $\ell_1$-error of the initial estimator. We show that under $\ell_1$-sparsity conditions on the regression coefficients $\beta^0$ the square-root Lasso produces to a consistent estimator of the noise variance and we establish sharp oracle inequalities which show that the remainder term is small under further sparsity conditions on $\beta^0$ and compatibility conditions on the design.
• ### Confidence intervals for high-dimensional inverse covariance estimation(1403.6752)

Aug. 11, 2015 math.ST, stat.TH, stat.ME
We propose methodology for statistical inference for low-dimensional parameters of sparse precision matrices in a high-dimensional setting. Our method leads to a non-sparse estimator of the precision matrix whose entries have a Gaussian limiting distribution. Asymptotic properties of the novel estimator are analyzed for the case of sub-Gaussian observations under a sparsity assumption on the entries of the true precision matrix and regularity conditions. Thresholding the de-sparsified estimator gives guarantees for edge selection in the associated graphical model. Performance of the proposed method is illustrated in a simulation study.
• ### High-dimensional inference in misspecified linear models(1503.06426)

March 22, 2015 stat.ME
We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our results contribute to robustness considerations with respect to model misspecification.
• ### On higher order isotropy conditions and lower bounds for sparse quadratic forms(1405.5995)

Nov. 9, 2014 math.ST, stat.TH
This study aims at contributing to lower bounds for empirical compatibility constants or empirical restricted eigenvalues. This is of importance in compressed sensing and theory for $\ell_1$-regularized estimators. Let $X$ be an $n \times p$ data matrix with rows being independent copies of a $p$-dimensional random variable. Let $\hat \Sigma := X^T X / n$ be the inner product matrix. We show that the quadratic forms $u^T \hat \Sigma u$ are lower bounded by a value converging to one, uniformly over the set of vectors $u$ with $u^T \Sigma_0 u$ equal to one and $\ell_1$-norm at most $M$. Here $\Sigma_0 := {\bf E} \hat \Sigma$ is the theoretical inner product matrix which we assume to exist. The constant $M$ is required to be of small order $\sqrt {n / \log p}$. We assume moreover $m$-th order isotropy for some $m >2$ and sub-exponential tails or moments up to order $\log p$ for the entries in $X$. As a consequence we obtain convergence of the empirical compatibility constant to its theoretical counterpart, and similarly for the empirical restricted eigenvalue. If the data matrix $X$ is first normalized so that its columns all have equal length we obtain lower bounds assuming only isotropy and no further moment conditions on its entries. The isotropy condition is shown to hold for certain martingale situations.
• ### New concentration inequalities for suprema of empirical processes(1111.3486)

Oct. 22, 2014 math.PR, math.ST, stat.TH
While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration.
• ### Statistical Theory for High-Dimensional Models(1409.8557)

Sept. 30, 2014 math.ST, stat.TH
These lecture notes consist of three chapters. In the first chapter we present oracle inequalities for the prediction error of the Lasso and square-root Lasso and briefly describe the scaled Lasso. In the second chapter we establish asymptotic linearity of a de-sparsified Lasso. This implies asymptotic normality under certain conditions and therefore can be used to construct confidence intervals for parameters of interest. A similar line of reasoning can be invoked to derive bounds in sup-norm for the Lasso and asymptotic linearity of de-sparsified estimators of a precision matrix. In the third chapter we consider chaining and the more general generic chaining method developed by Talagrand. This allows one to bound suprema of random processes. Concentration inequalities are refined probability inequalities, mostly again for suprema of random processes. We combine the two. We prove a deviation inequality directly using (generic) chaining.
• ### On asymptotically optimal confidence regions and tests for high-dimensional models(1303.0518)

June 23, 2014 math.ST, stat.TH
We propose a general method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model. It can be easily adjusted for multiplicity taking dependence among tests into account. For linear models, our method is essentially the same as in Zhang and Zhang [J. R. Stat. Soc. Ser. B Stat. Methodol. 76 (2014) 217-242]: we analyze its asymptotic properties and establish its asymptotic optimality in terms of semiparametric efficiency. Our method naturally extends to generalized linear models with convex loss functions. We develop the corresponding theory which includes a careful analysis for Gaussian, sub-Gaussian and bounded correlated designs.
• ### Discussion: "A significance test for the lasso"(1405.6792)

May 27, 2014 math.ST, stat.TH
Discussion of "A significance test for the lasso" by Richard Lockhart, Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161].
• ### The additive model with different smoothness for the components(1405.6584)

May 26, 2014 math.ST, stat.TH
We consider an additive regression model consisting of two components $f^0$ and $g^0$, where the first component $f^0$ is in some sense "smoother" than the second $g^0$. Smoothness is here described in terms of a semi-norm on the class of regression functions. We use a penalized least squares estimator $(\hat f, \hat g)$ of $(f^0, g^0)$ and show that the rate of convergence for $\hat f$ is faster than the rate of convergence for $\hat g$. In fact, both rates are generally as fast as in the case where one of the two components is known. The theory is illustrated by a simulation study. Our proofs rely on recent results from empirical process theory.
• ### Censored linear model in high dimensions(1405.0579)

May 3, 2014 math.ST, stat.TH
Censored data are quite common in statistics and have been studied in depth in the last years. In this paper we consider censored high-dimensional data. High-dimensional models are in some way more complex than their low-dimensional versions, therefore some different techniques are required. For the linear case appropriate estimators based on penalized regression, have been developed in the last years. In particular in sparse contexts the $l_1$-penalised regression (also known as LASSO) performs very well. Only few theoretical work was done in order to analyse censored linear models in a high-dimensional context. We therefore consider a high-dimensional censored linear model, where the response variable is left-censored. We propose a new estimator, which aims to work with high-dimensional linear censored data. Theoretical non-asymptotic oracle inequalities are derived.
• ### Worst possible sub-directions in high-dimensional models(1403.7023)

March 27, 2014 math.ST, stat.TH
We examine the rate of convergence of the Lasso estimator of lower dimensional components of the high-dimensional parameter. Under bounds on the $\ell_1$-norm on the worst possible sub-direction these rates are of order $\sqrt {|J| \log p / n }$ where $p$ is the total number of parameters, $J \subset \{ 1, \ldots, p \}$ represents a subset of the parameters and $n$ is the number of observations. We also derive rates in sup-norm in terms of the rate of convergence in $\ell_1$-norm. The irrepresentable condition on a set $J$ requires that the $\ell_1$-norm of the worst possible sub-direction is sufficiently smaller than one. In that case sharp oracle results can be obtained. Moreover, if the coefficients in $J$ are small enough the Lasso will put these coefficients to zero. This extends known results which say that the irrepresentable condition on the inactive set (the set where coefficients are exactly zero) implies no false positives. We further show that by de-sparsifying one obtains fast rates in supremum norm without conditions on the worst possible sub-direction. The main assumption here is that approximate sparsity is of order $o (\sqrt n / \log p )$. The results are extended to M-estimation with $\ell_1$-penalty for generalized linear models and exponential families for example. For the graphical Lasso this leads to an extension of known results to the case where the precision matrix is only approximately sparse. The bounds we provide are non-asymptotic but we also present asymptotic formulations for ease of interpretation.