
We study the problem of variable selection for linear models under the
highdimensional asymptotic setting, where the number of observations $n$ grows
at the same rate as the number of predictors $p$. We consider twostage
variable selection techniques (TVS) in which the first stage uses bridge
estimators to obtain an estimate of the regression coefficients, and the second
stage simply thresholds this estimate to select the "important" predictors. The
asymptotic false discovery proportion (AFDP) and true positive proportion
(ATPP) of these TVS are evaluated. We prove that for a fixed ATPP, in order to
obtain a smaller AFDP, one should pick a bridge estimator with smaller
asymptotic mean square error in the first stage of TVS. Based on such
principled discovery, we present a sharp comparison of different TVS, via an
indepth investigation of the estimation properties of bridge estimators.
Rather than "orderwise" error bounds with loose constants, our analysis
focuses on precise error characterization. Various interesting signaltonoise
ratio and sparsity settings are studied. Our results offer new and thorough
insights into highdimensional variable selection. For instance, we prove that
a TVS with Ridge in its first stage outperforms TVS with other bridge
estimators in large noise settings; twostage LASSO becomes inferior when the
signal is rare and weak. As a byproduct, we show that twostage methods
outperform some standard variable selection techniques, such as LASSO and Sure
Independence Screening, under certain conditions.

The class of Lqregularized least squares (LQLS) are considered for
estimating a pdimensional vector \b{eta} from its n noisy linear observations
y = X\b{eta}+w. The performance of these schemes are studied under the
highdimensional asymptotic setting in which p grows linearly with n. In this
asymptotic setting, phase transition diagrams (PT) are often used for comparing
the performance of different estimators. Although phase transition analysis is
shown to provide useful information for compressed sensing, the fact that it
ignores the measurement noise not only limits its applicability in many
application areas, but also may lead to misunderstandings. For instance,
consider a linear regression problem in which n > p and the signal is not
exactly sparse. If the measurement noise is ignored in such systems,
regularization techniques, such as LQLS, seem to be irrelevant since even the
ordinary least squares (OLS) returns the exact solution. However, it is
wellknown that if n is not much larger than p then the regularization
techniques improve the performance of OLS. In response to this limitation of PT
analysis, we consider the lownoise sensitivity analysis. We show that this
analysis framework (i) reveals the advantage of LQLS over OLS, (ii) captures
the difference between different LQLS estimators even when n > p, and (iii)
provides a fair comparison among different estimators in high signaltonoise
ratios. As an application of this framework, we will show that under mild
conditions LASSO outperforms other LQLS even when the signal is dense. Finally,
by a simple transformation we connect our lownoise sensitivity framework to
the classical asymptotic regime in which n/p goes to infinity and characterize
how and when regularization techniques offer improvements over ordinary least
squares, and which regularizer gives the most improvement when the sample size
is large.

In this paper, we study the popularly dubbed matrix completion problem, where
the task is to "fill in" the unobserved entries of a matrix from a small subset
of observed entries, under the assumption that the underlying matrix is of
lowrank. Our contributions herein, enhance our prior work on nuclear norm
regularized problems for matrix completion (Mazumder et al., 2010) by
incorporating a continuum of nonconvex penalty functions between the convex
nuclear norm and nonconvex rank functions. Inspired by SOFTIMPUTE (Mazumder et
al., 2010; Hastie et al., 2016), we propose NCIMPUTE an EMflavored
algorithmic framework for computing a family of nonconvex penalized matrix
completion problems with warmstarts. We present a systematic study of the
associated spectral thresholding operators, which play an important role in the
overall algorithm. We study convergence properties of the algorithm. Using
structured lowrank SVD computations, we demonstrate the computational
scalability of our proposal for problems up to the Netflix size (approximately,
a $500,000 \times 20, 000$ matrix with $10^8$ observed entries). We demonstrate
that on a wide range of synthetic and real data instances, our proposed
nonconvex regularization framework leads to lowrank solutions with better
predictive performance when compared to those obtained from nuclear norm
problems. Implementations of algorithms proposed herein, written in the R
programming language, are made available on github.

We consider a probit model without covariates, but the latent Gaussian
variables having compound symmetry covariance structure with a single parameter
characterizing the common correlation. We study the parameter estimation
problem under such oneparameter probit models. As a surprise, we demonstrate
that the likelihood function does not yield consistent estimates for the
correlation. We then formally prove the parameter's nonestimability by deriving
a nonvanishing minimax lower bound. This counterintuitive phenomenon provides
an interesting insight that one bit information of the latent Gaussian
variables is not sufficient to consistently recover their correlation. On the
other hand, we further show that trinary data generated from the Gaussian
variables can consistently estimate the correlation with parametric convergence
rate. Hence we reveal a phase transition phenomenon regarding the
discretization of latent Gaussian variables while preserving the estimability
of the correlation.

We study the problem of estimating $\beta \in \mathbb{R}^p$ from its noisy
linear observations $y= X\beta+ w$, where $w \sim N(0, \sigma_w^2 I_{n\times
n})$, under the following highdimensional asymptotic regime: given a fixed
number $\delta$, $p \rightarrow \infty$, while $n/p \rightarrow \delta$. We
consider the popular class of $\ell_q$regularized least squares (LQLS)
estimators, a.k.a. bridge, given by the optimization problem: \begin{equation*}
\hat{\beta} (\lambda, q ) \in \arg\min_\beta \frac{1}{2} \yX\beta\_2^2+
\lambda \\beta\_q^q, \end{equation*} and characterize the almost sure limit
of $\frac{1}{p} \\hat{\beta} (\lambda, q ) \beta\_2^2$. The expression we
derive for this limit does not have explicit forms and hence are not useful in
comparing different algorithms, or providing information in evaluating the
effect of $\delta$ or sparsity level of $\beta$. To simplify the expressions,
researchers have considered the ideal "nonoise" regime and have characterized
the values of $\delta$ for which the almost sure limit is zero. This is known
as the phase transition analysis.
In this paper, we first perform the phase transition analysis of LQLS. Our
results reveal some of the limitations and misleading features of the phase
transition analysis. To overcome these limitations, we propose the study of
these algorithms under the low noise regime. Our new analysis framework not
only sheds light on the results of the phase transition analysis, but also
makes an accurate comparison of different regularizers possible.

In ultrahigh dimensional setting, independence screening has been both
theoretically and empirically proved a useful variable selection framework with
low computation cost. In this work, we propose a twostep framework by using
marginal information in a different perspective from independence screening. In
particular, we retain significant variables rather than screening out
irrelevant ones. The new method is shown to be model selection consistent in
the ultrahigh dimensional linear regression model. To improve the finite sample
performance, we then introduce a threestep version and characterize its
asymptotic behavior. Simulations and real data analysis show advantages of our
method over independence screening and its iterative variants in certain
regimes.

Community detection is one of the fundamental problems in the study of
network data. Most existing community detection approaches only consider edge
information as inputs, and the output could be suboptimal when nodal
information is available. In such cases, it is desirable to leverage nodal
information for the improvement of community detection accuracy. Towards this
goal, we propose a flexible network model incorporating nodal information, and
develop likelihoodbased inference methods. For the proposed methods, we
establish favorable asymptotic properties as well as efficient algorithms for
computation. Numerical experiments show the effectiveness of our methods in
utilizing nodal information across a variety of simulated and real network data
sets.

In many application areas we are faced with the following question: Can we
recover a sparse vector $x_o \in \mathbb{R}^N$ from its undersampled set of
noisy observations $y \in \mathbb{R}^n$, $y=A x_o+w$. The last decade has
witnessed a surge of algorithms and theoretical results addressing this
question. One of the most popular algorithms is the $\ell_p$regularized least
squares (LPLS) given by the following formulation: \[ \hat{x}(\gamma,p )\in
\arg\min_x \frac{1}{2}\y  Ax\_2^2+\gamma\x\_p^p, \] where $p \in [0,1]$.
Despite the nonconvexity of these problems for $p<1$, they are still appealing
because of the following folklores in compressed sensing: (i) $\hat{x}(\gamma,p
)$ is closer to $x_o$ than $\hat{x}(\gamma,1)$. (ii) If we employ iterative
methods that aim to converge to a local minima of LPLS, then under good
initialization these algorithms converge to a solution that is closer to $x_o$
than $\hat{x}(\gamma,1)$. In spite of the existence of plenty of empirical
results that support these folklore theorems, the theoretical progress to
establish them has been very limited.
This paper aims to study the above folklore theorems and establish their
scope of validity. Starting with approximate message passing algorithm as a
heuristic method for solving LPLS, we study the impact of initialization on the
performance of AMP. Then, we employ the replica analysis to show the connection
between the solution of AMP and $\hat{x}(\gamma, p)$ in the asymptotic
settings. This enables us to compare the accuracy of $\hat{x}(\gamma,p)$ for $p
\in [0,1]$. In particular, we will characterize the phase transition and noise
sensitivity of LPLS for every $0\leq p\leq 1$ accurately. Our results in the
noiseless setting confirm that LPLS exhibits the same phase transition for
every $0\leq p <1$ and this phase transition is much higher than that of LASSO.