• ### Which bridge estimator is optimal for variable selection?(1705.08617)

March 12, 2019 cs.IT, math.IT, math.ST, stat.TH
We study the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations $n$ grows at the same rate as the number of predictors $p$. We consider two-stage variable selection techniques (TVS) in which the first stage uses bridge estimators to obtain an estimate of the regression coefficients, and the second stage simply thresholds this estimate to select the "important" predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated. We prove that for a fixed ATPP, in order to obtain a smaller AFDP, one should pick a bridge estimator with smaller asymptotic mean square error in the first stage of TVS. Based on such principled discovery, we present a sharp comparison of different TVS, via an in-depth investigation of the estimation properties of bridge estimators. Rather than "order-wise" error bounds with loose constants, our analysis focuses on precise error characterization. Various interesting signal-to-noise ratio and sparsity settings are studied. Our results offer new and thorough insights into high-dimensional variable selection. For instance, we prove that a TVS with Ridge in its first stage outperforms TVS with other bridge estimators in large noise settings; two-stage LASSO becomes inferior when the signal is rare and weak. As a by-product, we show that two-stage methods outperform some standard variable selection techniques, such as LASSO and Sure Independence Screening, under certain conditions.
• ### Low noise sensitivity analysis of Lq-minimization in oversampled systems(1705.03533)

Feb. 18, 2018 cs.IT, math.IT, math.ST, stat.TH
The class of Lq-regularized least squares (LQLS) are considered for estimating a p-dimensional vector \b{eta} from its n noisy linear observations y = X\b{eta}+w. The performance of these schemes are studied under the high-dimensional asymptotic setting in which p grows linearly with n. In this asymptotic setting, phase transition diagrams (PT) are often used for comparing the performance of different estimators. Although phase transition analysis is shown to provide useful information for compressed sensing, the fact that it ignores the measurement noise not only limits its applicability in many application areas, but also may lead to misunderstandings. For instance, consider a linear regression problem in which n > p and the signal is not exactly sparse. If the measurement noise is ignored in such systems, regularization techniques, such as LQLS, seem to be irrelevant since even the ordinary least squares (OLS) returns the exact solution. However, it is well-known that if n is not much larger than p then the regularization techniques improve the performance of OLS. In response to this limitation of PT analysis, we consider the low-noise sensitivity analysis. We show that this analysis framework (i) reveals the advantage of LQLS over OLS, (ii) captures the difference between different LQLS estimators even when n > p, and (iii) provides a fair comparison among different estimators in high signal-to-noise ratios. As an application of this framework, we will show that under mild conditions LASSO outperforms other LQLS even when the signal is dense. Finally, by a simple transformation we connect our low-noise sensitivity framework to the classical asymptotic regime in which n/p goes to infinity and characterize how and when regularization techniques offer improvements over ordinary least squares, and which regularizer gives the most improvement when the sample size is large.
• ### Matrix Completion with Nonconvex Regularization: Spectral Operators and Scalable Algorithms(1801.08227)

Jan. 24, 2018 stat.CO, stat.ML
In this paper, we study the popularly dubbed matrix completion problem, where the task is to "fill in" the unobserved entries of a matrix from a small subset of observed entries, under the assumption that the underlying matrix is of low-rank. Our contributions herein, enhance our prior work on nuclear norm regularized problems for matrix completion (Mazumder et al., 2010) by incorporating a continuum of nonconvex penalty functions between the convex nuclear norm and nonconvex rank functions. Inspired by SOFT-IMPUTE (Mazumder et al., 2010; Hastie et al., 2016), we propose NC-IMPUTE- an EM-flavored algorithmic framework for computing a family of nonconvex penalized matrix completion problems with warm-starts. We present a systematic study of the associated spectral thresholding operators, which play an important role in the overall algorithm. We study convergence properties of the algorithm. Using structured low-rank SVD computations, we demonstrate the computational scalability of our proposal for problems up to the Netflix size (approximately, a $500,000 \times 20, 000$ matrix with $10^8$ observed entries). We demonstrate that on a wide range of synthetic and real data instances, our proposed nonconvex regularization framework leads to low-rank solutions with better predictive performance when compared to those obtained from nuclear norm problems. Implementations of algorithms proposed herein, written in the R programming language, are made available on github.
• ### A note on estimation in a simple probit model under dependency(1712.09694)

We consider a probit model without covariates, but the latent Gaussian variables having compound symmetry covariance structure with a single parameter characterizing the common correlation. We study the parameter estimation problem under such one-parameter probit models. As a surprise, we demonstrate that the likelihood function does not yield consistent estimates for the correlation. We then formally prove the parameter's nonestimability by deriving a non-vanishing minimax lower bound. This counter-intuitive phenomenon provides an interesting insight that one bit information of the latent Gaussian variables is not sufficient to consistently recover their correlation. On the other hand, we further show that trinary data generated from the Gaussian variables can consistently estimate the correlation with parametric convergence rate. Hence we reveal a phase transition phenomenon regarding the discretization of latent Gaussian variables while preserving the estimability of the correlation.
• ### Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques(1603.07377)

Oct. 20, 2017 cs.IT, math.IT, math.ST, stat.TH
We study the problem of estimating $\beta \in \mathbb{R}^p$ from its noisy linear observations $y= X\beta+ w$, where $w \sim N(0, \sigma_w^2 I_{n\times n})$, under the following high-dimensional asymptotic regime: given a fixed number $\delta$, $p \rightarrow \infty$, while $n/p \rightarrow \delta$. We consider the popular class of $\ell_q$-regularized least squares (LQLS) estimators, a.k.a. bridge, given by the optimization problem: \begin{equation*} \hat{\beta} (\lambda, q ) \in \arg\min_\beta \frac{1}{2} \|y-X\beta\|_2^2+ \lambda \|\beta\|_q^q, \end{equation*} and characterize the almost sure limit of $\frac{1}{p} \|\hat{\beta} (\lambda, q )- \beta\|_2^2$. The expression we derive for this limit does not have explicit forms and hence are not useful in comparing different algorithms, or providing information in evaluating the effect of $\delta$ or sparsity level of $\beta$. To simplify the expressions, researchers have considered the ideal "no-noise" regime and have characterized the values of $\delta$ for which the almost sure limit is zero. This is known as the phase transition analysis. In this paper, we first perform the phase transition analysis of LQLS. Our results reveal some of the limitations and misleading features of the phase transition analysis. To overcome these limitations, we propose the study of these algorithms under the low noise regime. Our new analysis framework not only sheds light on the results of the phase transition analysis, but also makes an accurate comparison of different regularizers possible.
• ### Regularization after retention in ultrahigh dimensional linear regression models(1311.5625)

Aug. 10, 2017 stat.ME
In ultrahigh dimensional setting, independence screening has been both theoretically and empirically proved a useful variable selection framework with low computation cost. In this work, we propose a two-step framework by using marginal information in a different perspective from independence screening. In particular, we retain significant variables rather than screening out irrelevant ones. The new method is shown to be model selection consistent in the ultrahigh dimensional linear regression model. To improve the finite sample performance, we then introduce a three-step version and characterize its asymptotic behavior. Simulations and real data analysis show advantages of our method over independence screening and its iterative variants in certain regimes.
• ### Community detection with nodal information(1610.09735)

Dec. 10, 2016 math.ST, stat.TH, stat.ME
Community detection is one of the fundamental problems in the study of network data. Most existing community detection approaches only consider edge information as inputs, and the output could be suboptimal when nodal information is available. In such cases, it is desirable to leverage nodal information for the improvement of community detection accuracy. Towards this goal, we propose a flexible network model incorporating nodal information, and develop likelihood-based inference methods. For the proposed methods, we establish favorable asymptotic properties as well as efficient algorithms for computation. Numerical experiments show the effectiveness of our methods in utilizing nodal information across a variety of simulated and real network data sets.
• ### Does $\ell_p$-minimization outperform $\ell_1$-minimization?(1501.03704)

June 10, 2016 cs.IT, math.IT, math.ST, stat.TH
In many application areas we are faced with the following question: Can we recover a sparse vector $x_o \in \mathbb{R}^N$ from its undersampled set of noisy observations $y \in \mathbb{R}^n$, $y=A x_o+w$. The last decade has witnessed a surge of algorithms and theoretical results addressing this question. One of the most popular algorithms is the $\ell_p$-regularized least squares (LPLS) given by the following formulation: $\hat{x}(\gamma,p )\in \arg\min_x \frac{1}{2}\|y - Ax\|_2^2+\gamma\|x\|_p^p,$ where $p \in [0,1]$. Despite the non-convexity of these problems for $p<1$, they are still appealing because of the following folklores in compressed sensing: (i) $\hat{x}(\gamma,p )$ is closer to $x_o$ than $\hat{x}(\gamma,1)$. (ii) If we employ iterative methods that aim to converge to a local minima of LPLS, then under good initialization these algorithms converge to a solution that is closer to $x_o$ than $\hat{x}(\gamma,1)$. In spite of the existence of plenty of empirical results that support these folklore theorems, the theoretical progress to establish them has been very limited. This paper aims to study the above folklore theorems and establish their scope of validity. Starting with approximate message passing algorithm as a heuristic method for solving LPLS, we study the impact of initialization on the performance of AMP. Then, we employ the replica analysis to show the connection between the solution of AMP and $\hat{x}(\gamma, p)$ in the asymptotic settings. This enables us to compare the accuracy of $\hat{x}(\gamma,p)$ for $p \in [0,1]$. In particular, we will characterize the phase transition and noise sensitivity of LPLS for every $0\leq p\leq 1$ accurately. Our results in the noiseless setting confirm that LPLS exhibits the same phase transition for every $0\leq p <1$ and this phase transition is much higher than that of LASSO.