
We study the problem of variable selection for linear models under the
highdimensional asymptotic setting, where the number of observations $n$ grows
at the same rate as the number of predictors $p$. We consider twostage
variable selection techniques (TVS) in which the first stage uses bridge
estimators to obtain an estimate of the regression coefficients, and the second
stage simply thresholds this estimate to select the "important" predictors. The
asymptotic false discovery proportion (AFDP) and true positive proportion
(ATPP) of these TVS are evaluated. We prove that for a fixed ATPP, in order to
obtain a smaller AFDP, one should pick a bridge estimator with smaller
asymptotic mean square error in the first stage of TVS. Based on such
principled discovery, we present a sharp comparison of different TVS, via an
indepth investigation of the estimation properties of bridge estimators.
Rather than "orderwise" error bounds with loose constants, our analysis
focuses on precise error characterization. Various interesting signaltonoise
ratio and sparsity settings are studied. Our results offer new and thorough
insights into highdimensional variable selection. For instance, we prove that
a TVS with Ridge in its first stage outperforms TVS with other bridge
estimators in large noise settings; twostage LASSO becomes inferior when the
signal is rare and weak. As a byproduct, we show that twostage methods
outperform some standard variable selection techniques, such as LASSO and Sure
Independence Screening, under certain conditions.

We consider an $\ell_2$regularized nonconvex optimization problem for
recovering signals from their noisy phaseless observations. We design and study
the performance of a message passing algorithm that aims to solve this
optimization problem. We consider the asymptotic setting $m,n \rightarrow
\infty$, $m/n \rightarrow \delta$ and obtain sharp performance bounds, where
$m$ is the number of measurements and $n$ is the signal dimension. We show that
for complex signals the algorithm can perform accurate recovery with only $m=
\left(\frac{64}{\pi^2}4\right)n \approx 2.5n$ measurements. Also, we provide
sharp analysis on the sensitivity of the algorithm to noise. We highlight the
following facts about our message passing algorithm: (i) Adding $\ell_2$
regularization to the nonconvex loss function can be beneficial. (ii) Spectral
initialization has marginal impact on the performance of the algorithm. The
sharp analyses in this paper, not only enable us to compare the performance of
our method with other phase recovery schemes, but also shed light on designing
better iterative algorithms for other nonconvex optimization problems.

The class of Lqregularized least squares (LQLS) are considered for
estimating a pdimensional vector \b{eta} from its n noisy linear observations
y = X\b{eta}+w. The performance of these schemes are studied under the
highdimensional asymptotic setting in which p grows linearly with n. In this
asymptotic setting, phase transition diagrams (PT) are often used for comparing
the performance of different estimators. Although phase transition analysis is
shown to provide useful information for compressed sensing, the fact that it
ignores the measurement noise not only limits its applicability in many
application areas, but also may lead to misunderstandings. For instance,
consider a linear regression problem in which n > p and the signal is not
exactly sparse. If the measurement noise is ignored in such systems,
regularization techniques, such as LQLS, seem to be irrelevant since even the
ordinary least squares (OLS) returns the exact solution. However, it is
wellknown that if n is not much larger than p then the regularization
techniques improve the performance of OLS. In response to this limitation of PT
analysis, we consider the lownoise sensitivity analysis. We show that this
analysis framework (i) reveals the advantage of LQLS over OLS, (ii) captures
the difference between different LQLS estimators even when n > p, and (iii)
provides a fair comparison among different estimators in high signaltonoise
ratios. As an application of this framework, we will show that under mild
conditions LASSO outperforms other LQLS even when the signal is dense. Finally,
by a simple transformation we connect our lownoise sensitivity framework to
the classical asymptotic regime in which n/p goes to infinity and characterize
how and when regularization techniques offer improvements over ordinary least
squares, and which regularizer gives the most improvement when the sample size
is large.

We propose a scalable closedform formula ($\text{ALO}_{\lambda}$) to
estimate the extrasample prediction error of regularized estimators. Our
approach employs existing heuristic arguments to approximate the leaveoneout
perturbations. We theoretically prove the accuracy of $\text{ALO}_{\lambda}$ in
the highdimensional setting where the number of predictors is proportional to
the number of observations. We show how this approach can be applied to popular
nondifferentiable regularizers, such as LASSO, and compare its results with
other popular risk estimation techniques, such as Stein's unbiased risk
estimate (SURE). Our theoretical findings are illustrated using simulations and
real recordings from spatially sensitive neurons (grid cells) in the medial
entorhinal cortex of a rat.

We study the problem of estimating $\beta \in \mathbb{R}^p$ from its noisy
linear observations $y= X\beta+ w$, where $w \sim N(0, \sigma_w^2 I_{n\times
n})$, under the following highdimensional asymptotic regime: given a fixed
number $\delta$, $p \rightarrow \infty$, while $n/p \rightarrow \delta$. We
consider the popular class of $\ell_q$regularized least squares (LQLS)
estimators, a.k.a. bridge, given by the optimization problem: \begin{equation*}
\hat{\beta} (\lambda, q ) \in \arg\min_\beta \frac{1}{2} \yX\beta\_2^2+
\lambda \\beta\_q^q, \end{equation*} and characterize the almost sure limit
of $\frac{1}{p} \\hat{\beta} (\lambda, q ) \beta\_2^2$. The expression we
derive for this limit does not have explicit forms and hence are not useful in
comparing different algorithms, or providing information in evaluating the
effect of $\delta$ or sparsity level of $\beta$. To simplify the expressions,
researchers have considered the ideal "nonoise" regime and have characterized
the values of $\delta$ for which the almost sure limit is zero. This is known
as the phase transition analysis.
In this paper, we first perform the phase transition analysis of LQLS. Our
results reveal some of the limitations and misleading features of the phase
transition analysis. To overcome these limitations, we propose the study of
these algorithms under the low noise regime. Our new analysis framework not
only sheds light on the results of the phase transition analysis, but also
makes an accurate comparison of different regularizers possible.

This paper deals with linear equalization in massive multiuser
multipleinput multipleoutput (MUMIMO) wireless systems. We first provide
simple conditions on the antenna configuration for which the wellknown linear
minimum meansquare error (LMMSE) equalizer provides nearoptimal spectral
efficiency, and we analyze its performance in the presence of parameter
mismatches in the signal and/or noise powers. We then propose a novel,
optimallytuned NOnParametric Equalizer (NOPE) for massive MUMIMO systems,
which avoids knowledge of the transmit signal and noise powers altogether. We
show that NOPE achieves the same performance as that of the LMMSE equalizer in
the largeantenna limit, and we demonstrate its efficacy in realistic,
finitedimensional systems. From a practical perspective, NOPE is
computationally efficient and avoids dedicated training that is typically
required for parameter estimation

Modern image and video compression codes employ elaborate structures existing
in such signals to encode them into few number of bits. Compressed sensing
recovery algorithms on the other hand use such signals' structures to recover
them from few linear observations. Despite the steady progress in the field of
compressed sensing, structures that are often used for signal recovery are
still much simpler than those employed by stateoftheart compression codes.
The main goal of this paper is to bridge this gap through answering the
following question: Can one employ a given compression code to build an
efficient (polynomial time) compressed sensing recovery algorithm? In response
to this question, the compressionbased gradient descent (CGD) algorithm is
proposed. CGD, which is a lowcomplexity iterative algorithm, is able to
employ a generic compression code for compressed sensing and therefore elevates
the scope of structures used in compressed sensing to those used by compression
codes. The convergence performance of CGD and its required number of
measurements in terms of the ratedistortion performance of the compression
code are theoretically analyzed. It is also shown that CGD is robust to
additive white Gaussian noise. Finally, the presented simulation results show
that combining CGD with commercial image compression codes such as JPEG2000
yields stateoftheart performance in imaging applications.

Consider the problem of estimating parameters $X^n \in \mathbb{R}^n $,
generated by a stationary process, from $m$ response variables $Y^m =
AX^n+Z^m$, under the assumption that the distribution of $X^n$ is known. This
is the most general version of the Bayesian linear regression problem. The lack
of computationally feasible algorithms that can employ generic prior
distributions and provide a good estimate of $X^n$ has limited the set of
distributions researchers use to model the data. In this paper, a new scheme
called QMAP is proposed. The new method has the following properties: (i) It
has similarities to the popular MAP estimation under the noiseless setting.
(ii) In the noiseless setting, it achieves the "asymptotically optimal
performance" when $X^n$ has independent and identically distributed components.
(iii) It scales favorably with the dimensions of the problem and therefore is
applicable to highdimensional setups. (iv) The solution of the QMAP
optimization can be found via a proposed iterative algorithm which is provably
robust to the error (noise) in the response variables.

Let $ K(X_1, \ldots, X_n)$ and $H(X_n  X_{n1}, \ldots, X_1)$ denote the
Kolmogorov complexity and Shannon's entropy rate of a stationary and ergodic
process $\{X_i\}_{i=\infty}^\infty$. It has been proved that \[ \frac{K(X_1,
\ldots, X_n)}{n}  H(X_n  X_{n1}, \ldots, X_1) \rightarrow 0, \] almost
surely. This paper studies the convergence rate of this asymptotic result. In
particular, we show that if the process satisfies certain mixing conditions,
then there exists $\sigma<\infty$ such that
$$\sqrt{n}\left(\frac{K(X_{1:n})}{n} H(X_0X_1,\dots,X_{\infty})\right)
\rightarrow_d N(0,\sigma^2).$$ Furthermore, we show that under slightly
stronger mixing conditions one may obtain nonasymptotic concentration bounds
for the Kolmogorov complexity.

Expectation Maximization (EM) is among the most popular algorithms for
estimating parameters of statistical models. However, EM, which is an iterative
algorithm based on the maximum likelihood principle, is generally only
guaranteed to find stationary points of the likelihood objective, and these
points may be far from any maximizer. This article addresses this disconnect
between the statistical principles behind EM and its algorithmic properties.
Specifically, it provides a global analysis of EM for specific models in which
the observations comprise an i.i.d. sample from a mixture of two Gaussians.
This is achieved by (i) studying the sequence of parameters from idealized
execution of EM in the infinite sample limit, and fully characterizing the
limit points of the sequence in terms of the initial parameters; and then (ii)
based on this convergence analysis, establishing statistical consistency (or
lack thereof) for the actual sequence of parameters produced by EM.

We investigate the performance of mismatched data detection in large
multipleinput multipleoutput (MIMO) systems, where the prior distribution of
the transmit signal used in the data detector differs from the true prior. To
minimize the performance loss caused by this prior mismatch, we include a
tuning stage into our recentlyproposed large MIMO approximate message passing
(LAMA) algorithm, which allows us to develop mismatched LAMA algorithms with
optimal as well as suboptimal tuning. We show that carefullyselected priors
often enable simpler and computationally more efficient algorithms compared to
LAMA with the true prior while achieving nearoptimal performance. A
performance analysis of our algorithms for a Gaussian prior and a uniform prior
within a hypercube covering the QAM constellation recovers classical and recent
results on linear and nonlinear MIMO data detection, respectively.

In many application areas we are faced with the following question: Can we
recover a sparse vector $x_o \in \mathbb{R}^N$ from its undersampled set of
noisy observations $y \in \mathbb{R}^n$, $y=A x_o+w$. The last decade has
witnessed a surge of algorithms and theoretical results addressing this
question. One of the most popular algorithms is the $\ell_p$regularized least
squares (LPLS) given by the following formulation: \[ \hat{x}(\gamma,p )\in
\arg\min_x \frac{1}{2}\y  Ax\_2^2+\gamma\x\_p^p, \] where $p \in [0,1]$.
Despite the nonconvexity of these problems for $p<1$, they are still appealing
because of the following folklores in compressed sensing: (i) $\hat{x}(\gamma,p
)$ is closer to $x_o$ than $\hat{x}(\gamma,1)$. (ii) If we employ iterative
methods that aim to converge to a local minima of LPLS, then under good
initialization these algorithms converge to a solution that is closer to $x_o$
than $\hat{x}(\gamma,1)$. In spite of the existence of plenty of empirical
results that support these folklore theorems, the theoretical progress to
establish them has been very limited.
This paper aims to study the above folklore theorems and establish their
scope of validity. Starting with approximate message passing algorithm as a
heuristic method for solving LPLS, we study the impact of initialization on the
performance of AMP. Then, we employ the replica analysis to show the connection
between the solution of AMP and $\hat{x}(\gamma, p)$ in the asymptotic
settings. This enables us to compare the accuracy of $\hat{x}(\gamma,p)$ for $p
\in [0,1]$. In particular, we will characterize the phase transition and noise
sensitivity of LPLS for every $0\leq p\leq 1$ accurately. Our results in the
noiseless setting confirm that LPLS exhibits the same phase transition for
every $0\leq p <1$ and this phase transition is much higher than that of LASSO.

A denoising algorithm seeks to remove noise, errors, or perturbations from a
signal. Extensive research has been devoted to this arena over the last several
decades, and as a result, today's denoisers can effectively remove large
amounts of additive white Gaussian noise. A compressed sensing (CS)
reconstruction algorithm seeks to recover a structured signal acquired using a
small number of randomized measurements. Typical CS reconstruction algorithms
can be cast as iteratively estimating a signal from a perturbed observation.
This paper answers a natural question: How can one effectively employ a generic
denoiser in a CS reconstruction algorithm? In response, we develop an extension
of the approximate message passing (AMP) framework, called Denoisingbased AMP
(DAMP), that can integrate a wide class of denoisers within its iterations. We
demonstrate that, when used with a high performance denoiser for natural
images, DAMP offers stateoftheart CS recovery performance while operating
tens of times faster than competing methods. We explain the exceptional
performance of DAMP by analyzing some of its theoretical features. A key
element in DAMP is the use of an appropriate Onsager correction term in its
iterations, which coerces the signal perturbation at each iteration to be very
close to the white Gaussian noise that denoisers are typically designed to
remove.

We consider the problem of recovering a vector $\beta_o \in \mathbb{R}^p$
from $n$ random and noisy linear observations $y= X\beta_o + w$, where $X$ is
the measurement matrix and $w$ is noise. The LASSO estimate is given by the
solution to the optimization problem $\hat{\beta}_{\lambda} = \arg \min_{\beta}
\frac{1}{2} \yX\beta\_2^2 + \lambda \ \beta \_1$. Among the iterative
algorithms that have been proposed for solving this optimization problem,
approximate message passing (AMP) has attracted attention for its fast
convergence. Despite significant progress in the theoretical analysis of the
estimates of LASSO and AMP, little is known about their behavior as a function
of the regularization parameter $\lambda$, or the thereshold parameters
$\tau^t$. For instance the following basic questions have not yet been studied
in the literature: (i) How does the size of the active set
$\\hat{\beta}^\lambda\_0/p$ behave as a function of $\lambda$? (ii) How does
the mean square error $\\hat{\beta}_{\lambda}  \beta_o\_2^2/p$ behave as a
function of $\lambda$? (iii) How does $\\beta^t  \beta_o \_2^2/p$ behave as
a function of $\tau^1, \ldots, \tau^{t1}$? Answering these questions will help
in addressing practical challenges regarding the optimal tuning of $\lambda$ or
$\tau^1, \tau^2, \ldots$. This paper answers these questions in the asymptotic
setting and shows how these results can be employed in deriving simple and
theoretically optimal approaches for tuning the parameters $\tau^1, \ldots,
\tau^t$ for AMP or $\lambda$ for LASSO. It also explores the connection between
the optimal tuning of the parameters of AMP and the optimal tuning of LASSO.

Optimal data detection in multipleinput multipleoutput (MIMO) communication
systems with a large number of antennas at both ends of the wireless link
entails prohibitive computational complexity. In order to reduce the
computational complexity, a variety of suboptimal detection algorithms have
been proposed in the literature. In this paper, we analyze the optimality of a
novel datadetection method for large MIMO systems that relies on approximate
message passing (AMP). We show that our algorithm, referred to as
individuallyoptimal (IO) largeMIMO AMP (short IOLAMA), is able to perform IO
data detection given certain conditions on the MIMO system and the
constellation set (e.g., QAM or PSK) are met.

Realworld transceiver designs for multipleinput multipleoutput (MIMO)
wireless communication systems are affected by a number of hardware impairments
that already appear at the transmit side, such as amplifier nonlinearities,
quantization artifacts, and phase noise. While such transmitside impairments
are routinely ignored in the datadetection literature, they often limit
reliable communication in practical systems. In this paper, we present a novel
datadetection algorithm, referred to as largeMIMO approximate message passing
with transmit impairments (short LAMAI), which takes into account a broad
range of transmitside impairments in wireless systems with a large number of
transmit and receive antennas. We provide conditions in the largesystem limit
for which LAMAI achieves the errorrate performance of the
individuallyoptimal (IO) data detector. We furthermore demonstrate that LAMAI
achieves nearIO performance at low computational complexity in realistic,
finite dimensional largeMIMO systems.

Iterative thresholding algorithms are wellsuited for highdimensional
problems in sparse recovery and compressive sensing. The performance of this
class of algorithms depends heavily on the tuning of certain threshold
parameters. In particular, both the final reconstruction error and the
convergence rate of the algorithm crucially rely on how the threshold parameter
is set at each step of the algorithm. In this paper, we propose a
parameterfree approximate message passing (AMP) algorithm that sets the
threshold parameter at each iteration in a fully automatic way without either
having an information about the signal to be reconstructed or needing any
tuning from the user. We show that the proposed method attains both the minimum
reconstruction error and the highest convergence rate. Our method is based on
applying the Stein unbiased risk estimate (SURE) along with a modified gradient
descent to find the optimal threshold in each iteration. Motivated by the
connections between AMP and LASSO, it could be employed to find the solution of
the LASSO for the optimal regularization parameter. To the best of our
knowledge, this is the first work concerning parameter tuning that obtains the
fastest convergence rate with theoretical guarantees.

This paper concerns the performance of the LASSO (also knows as basis pursuit
denoising) for recovering sparse signals from undersampled, randomized, noisy
measurements. We consider the recovery of the signal $x_o \in \mathbb{R}^N$
from $n$ random and noisy linear observations $y= Ax_o + w$, where $A$ is the
measurement matrix and $w$ is the noise. The LASSO estimate is given by the
solution to the optimization problem $x_o$ with $\hat{x}_{\lambda} = \arg
\min_x \frac{1}{2} \yAx\_2^2 + \lambda \x\_1$. Despite major progress in
the theoretical analysis of the LASSO solution, little is known about its
behavior as a function of the regularization parameter $\lambda$. In this paper
we study two questions in the asymptotic setting (i.e., where $N \rightarrow
\infty$, $n \rightarrow \infty$ while the ratio $n/N$ converges to a fixed
number in $(0,1)$): (i) How does the size of the active set
$\\hat{x}_\lambda\_0/N$ behave as a function of $\lambda$, and (ii) How does
the mean square error $\\hat{x}_{\lambda}  x_o\_2^2/N$ behave as a function
of $\lambda$? We then employ these results in a new, reliable algorithm for
solving LASSO based on approximate message passing (AMP).

Can compression algorithms be employed for recovering signals from their
underdetermined set of linear measurements? Addressing this question is the
first step towards applying compression algorithms for compressed sensing (CS).
In this paper, we consider a family of compression algorithms $\mathcal{C}_r$,
parametrized by rate $r$, for a compact class of signals $\mathcal{Q} \subset
\mathds{R}^n$. The set of natural images and JPEG at different rates are
examples of $\mathcal{Q}$ and $\mathcal{C}_r$, respectively. We establish a
connection between the ratedistortion performance of $\mathcal{C}_r$, and the
number of linear measurements required for successful recovery in CS. We then
propose compressible signal pursuit (CSP) algorithm and prove that, with high
probability, it accurately and robustly recovers signals from an
underdetermined set of linear measurements. We also explore the performance of
CSP in the recovery of infinite dimensional signals.

The nascent field of compressed sensing is founded on the fact that
highdimensional signals with "simple structure" can be recovered accurately
from just a small number of randomized samples. Several specific kinds of
structures have been explored in the literature, from sparsity and group
sparsity to lowrankness. However, two fundamental questions have been left
unanswered, namely: What are the general abstract meanings of "structure" and
"simplicity"? And do there exist universal algorithms for recovering such
simple structured objects from fewer samples than their ambient dimension? In
this paper, we address these two questions. Using algorithmic information
theory tools such as the Kolmogorov complexity, we provide a unified definition
of structure and simplicity. Leveraging this new definition, we develop and
analyze an abstract algorithm for signal recovery motivated by Occam's
Razor.Minimum complexity pursuit (MCP) requires just O(3\kappa) randomized
samples to recover a signal of complexity \kappa and ambient dimension n. We
also discuss the performance of MCP in the presence of measurement noise and
with approximately simple signals.

We consider the problem of recovering a block (or group) sparse signal from
an underdetermined set of random linear measurements, which appear in
compressed sensing applications such as radar and imaging. Recent results of
Donoho, Johnstone, and Montanari have shown that approximate message passing
(AMP) in combination with Stein's shrinkage outperforms group LASSO for large
block sizes. In this paper, we prove that, for a fixed block size and in the
strong undersampling regime (i.e., having very few measurements compared to the
ambient dimension), AMP cannot improve upon group LASSO, thereby complementing
the results of Donoho et al.

Recovering a sparse signal from an undersampled set of random linear
measurements is the main problem of interest in compressed sensing. In this
paper, we consider the case where both the signal and the measurements are
complex. We study the popular reconstruction method of $\ell_1$regularized
least squares or LASSO. While several studies have shown that the LASSO
algorithm offers desirable solutions under certain conditions, the precise
asymptotic performance of this algorithm in the complex setting is not yet
known. In this paper, we extend the approximate message passing (AMP) algorithm
to the complex signals and measurements and obtain the complex approximate
message passing algorithm (CAMP). We then generalize the state evolution
framework recently introduced for the analysis of AMP, to the complex setting.
Using the state evolution, we derive accurate formulas for the phase transition
and noise sensitivity of both LASSO and CAMP.

It has recently been proved that the popular nonlocal means (NLM) denoising
algorithm does not optimally denoise images with sharp edges. Its weakness lies
in the isotropic nature of the neighborhoods it uses to set its smoothing
weights. In response, in this paper we introduce several theoretical and
practical anisotropic nonlocal means (ANLM) algorithms and prove that they are
near minimax optimal for edgedominated images from the Horizon class. On
realworld test images, an ANLM algorithm that adapts to the underlying image
gradients outperforms NLM by a significant margin.

The L1regularized maximum likelihood estimation problem has recently become
a topic of great interest within the machine learning, statistics, and
optimization communities as a method for producing sparse inverse covariance
estimators. In this paper, a proximal gradient method (GISTA) for performing
L1regularized covariance matrix estimation is presented. Although numerous
algorithms have been proposed for solving this problem, this simple proximal
gradient method is found to have attractive theoretical and numerical
properties. GISTA has a linear rate of convergence, resulting in an O(log e)
iteration complexity to reach a tolerance of e. This paper gives eigenvalue
bounds for the GISTA iterates, providing a closedform linear convergence
rate. The rate is shown to be closely related to the condition number of the
optimal point. Numerical convergence results and timing comparisons for the
proposed method are presented. GISTA is shown to perform very well, especially
when the optimal point is wellconditioned.

A host of problems involve the recovery of structured signals from a
dimensionality reduced representation such as a random projection; examples
include sparse signals (compressive sensing) and lowrank matrices (matrix
completion). Given the wide range of different recovery algorithms developed to
date, it is natural to ask whether there exist "universal" algorithms for
recovering "structured" signals from their linear projections. We recently
answered this question in the affirmative in the noisefree setting. In this
paper, we extend our results to the case of noisy measurements.