
Lowrank modeling plays a pivotal role in signal processing and machine
learning, with applications ranging from collaborative filtering, video
surveillance, medical imaging, to dimensionality reduction and adaptive
filtering. Many modern highdimensional data and interactions thereof can be
modeled as lying approximately in a lowdimensional subspace or manifold,
possibly with additional structures, and its proper exploitations lead to
significant reduction of costs in sensing, computation and storage. In recent
years, there is a plethora of progress in understanding how to exploit lowrank
structures using computationally efficient procedures in a provable manner,
including both convex and nonconvex approaches. On one side, convex relaxations
such as nuclear norm minimization often lead to statistically optimal
procedures for estimating lowrank matrices, where firstorder methods are
developed to address the computational challenges; on the other side, there is
emerging evidence that properly designed nonconvex procedures, such as
projected gradient descent, often provide globally optimal solutions with a
much lower computational cost in many problems. This survey article will
provide a unified overview of these recent advances on lowrank matrix
estimation from incomplete measurements. Attention is paid to rigorous
characterization of the performance of these algorithms, and to problems where
the lowrank matrix have additional structural properties that require new
algorithmic designs and theoretical analysis.

Efficient estimation of wideband spectrum is of great importance for
applications such as cognitive radio. Recently, subNyquist sampling schemes
based on compressed sensing have been proposed to greatly reduce the sampling
rate. However, the important issue of quantization has not been fully
addressed, particularly for highresolution spectrum and parameter estimation.
In this paper, we aim to recover spectrallysparse signals and the
corresponding parameters, such as frequency and amplitudes, from heavy
quantizations of their noisy complexvalued random linear measurements, e.g.
only the quadrant information. We first characterize the CramerRao bound under
Gaussian noise, which highlights the tradeoff between sample complexity and
bit depth under different signaltonoise ratios for a fixed budget of bits.
Next, we propose a new algorithm based on atomic norm soft thresholding for
signal recovery, which is equivalent to proximal mapping of properly designed
surrogate signals with respect to the atomic norm that motivates spectral
sparsity. The proposed algorithm can be applied to both the single measurement
vector case, as well as the multiple measurement vector case. It is shown that
under the Gaussian measurement model, the spectral signals can be reconstructed
accurately with high probability, as soon as the number of quantized
measurements exceeds the order of K log n, where K is the level of spectral
sparsity and $n$ is the signal dimension. Finally, numerical simulations are
provided to validate the proposed approaches.

This paper considers the problem of solving systems of quadratic equations,
namely, recovering an object of interest
$\mathbf{x}^{\natural}\in\mathbb{R}^{n}$ from $m$ quadratic equations/samples
$y_{i}=(\mathbf{a}_{i}^{\top}\mathbf{x}^{\natural})^{2}$, $1\leq i\leq m$. This
problem, also dubbed as phase retrieval, spans multiple domains including
physical sciences and machine learning.
We investigate the efficiency of gradient descent (or Wirtinger flow)
designed for the nonconvex least squares problem. We prove that under Gaussian
designs, gradient descent  when randomly initialized  yields an
$\epsilon$accurate solution in $O\big(\log n+\log(1/\epsilon)\big)$ iterations
given nearly minimal samples, thus achieving nearoptimal computational and
sample complexities at once. This provides the first global convergence
guarantee concerning vanilla gradient descent for phase retrieval, without the
need of (i) carefullydesigned initialization, (ii) sample splitting, or (iii)
sophisticated saddlepoint escaping schemes. All of these are achieved by
exploiting the statistical models in analyzing optimization algorithms, via a
leaveoneout approach that enables the decoupling of certain statistical
dependency between the gradient descent iterates and the data.

We study the local geometry of a onehiddenlayer fullyconnected neural
network where the training samples are generated from a multineuron logistic
regression model. We prove that under Gaussian input, the empirical risk
function employing quadratic loss exhibits strong convexity and smoothness
uniformly in a local neighborhood of the ground truth, for a class of smooth
activation functions satisfying certain properties, including sigmoid and tanh,
as soon as the sample complexity is sufficiently large. This implies that if
initialized in this neighborhood, gradient descent converges linearly to a
critical point that is provably close to the ground truth without requiring a
fresh set of samples at each iteration. This significantly improves upon prior
results on learning shallow neural networks with multiple neurons. To the best
of our knowledge, this is the first global convergence guarantee for
onehiddenlayer neural networks using gradient descent over the empirical risk
function without resampling at the nearoptimal sampling and computational
complexity.

We consider the problem of recovering lowrank matrices from random rankone
measurements, which spans numerous applications including covariance sketching,
phase retrieval, quantum state tomography, and learning shallow polynomial
neural networks, among others. Our approach is to directly estimate the
lowrank factor by minimizing a nonconvex quadratic loss function via vanilla
gradient descent, following a tailored spectral initialization. When the true
rank is small, this algorithm is guaranteed to converge to the ground truth (up
to global ambiguity) with nearoptimal sample complexity and computational
complexity. To the best of our knowledge, this is the first guarantee that
achieves nearoptimality in both metrics. In particular, the key enabler of
nearoptimal computational guarantees is an implicit regularization phenomenon:
without explicit regularization, both spectral initialization and the gradient
descent iterates automatically stay within a region incoherent with the
measurement vectors. This feature allows one to employ much more aggressive
step sizes compared with the ones suggested in prior literature, without the
need of sample splitting.

Lowrank matrix completion (MC) has achieved great success in many realworld
data applications. A latent feature model formulation is usually employed and,
to improve prediction performance, the similarities between latent variables
can be exploited by pairwise learning, e.g., the graph regularized matrix
factorization (GRMF) method. However, existing GRMF approaches often use a
squared L2 norm to measure the pairwise difference, which may be overly
influenced by dissimilar pairs and lead to inferior prediction. To fully
empower pairwise learning for matrix completion, we propose a general
optimization framework that allows a rich class of (non)convex pairwise
penalty functions. A new and efficient algorithm is further developed to
uniformly solve the optimization problem, with a theoretical convergence
guarantee. In an important situation where the latent variables form a small
number of subgroups, its statistical guarantee is also fully characterized. In
particular, we theoretically characterize the complexityregularized maximum
likelihood estimator, as a special case of our framework. It has a better error
bound when compared to the standard tracenorm regularized matrix completion.
We conduct extensive experiments on both synthetic and real datasets to
demonstrate the superior performance of this general framework.

Recent work has demonstrated the effectiveness of gradient descent for
directly recovering the factors of lowrank matrices from random linear
measurements in a globally convergent manner when initialized properly.
However, the performance of existing algorithms is highly sensitive in the
presence of outliers that may take arbitrary values. In this paper, we propose
a truncated gradient descent algorithm to improve the robustness against
outliers, where the truncation is performed to rule out the contributions of
samples that deviate significantly from the {\em sample median} of measurement
residuals adaptively in each iteration. We demonstrate that, when initialized
in a basin of attraction close to the ground truth, the proposed algorithm
converges to the ground truth at a linear rate for the Gaussian measurement
model with a nearoptimal number of measurements, even when a constant fraction
of the measurements are arbitrarily corrupted. In addition, we propose a new
truncated spectral method that ensures an initialization in the basin of
attraction at slightly higher requirements. We finally provide numerical
experiments to validate the superior performance of the proposed approach.

This paper investigates the phase retrieval problem, which aims to recover a
signal from the magnitudes of its linear measurements. We develop statistically
and computationally efficient algorithms for the situation when the
measurements are corrupted by sparse outliers that can take arbitrary values.
We propose a novel approach to robustify the gradient descent algorithm by
using the sample median as a guide for pruning spurious samples in
initialization and local search. Adopting the Poisson loss and the reshaped
quadratic loss respectively, we obtain two algorithms termed medianTWF and
medianRWF, both of which provably recover the signal from a nearoptimal
number of measurements when the measurement vectors are composed of i.i.d.
Gaussian entries, up to a logarithmic factor, even when a constant fraction of
the measurements are adversarially corrupted. We further show that both
algorithms are stable in the presence of additional dense bounded noise. Our
analysis is accomplished by developing nontrivial concentration results of
medianrelated quantities, which may be of independent interest. We provide
numerical experiments to demonstrate the effectiveness of our approach.

We consider simultaneously identifying the membership and locations of point
sources that are convolved with different bandlimited point spread functions,
from the observation of their superpositions. This problem arises in
threedimensional superresolution singlemolecule imaging, neural spike
sorting, multiuser channel identification, among other applications. We
propose a novel algorithm, based on convex programming, and establish its
nearoptimal performance guarantee for exact recovery in the noisefree setting
by exploiting the spectral sparsity of the point source models as well as the
incoherence between point spread functions. Furthermore, robustness of the
recovery algorithm in the presence of bounded noise is also established.
Numerical examples are provided to demonstrate the effectiveness of the
proposed approach.

Networked sensing, where the goal is to perform complex inference using a
large number of inexpensive and decentralized sensors, has become an
increasingly attractive research topic due to its applications in wireless
sensor networks and internetofthings. To reduce the communication, sensing
and storage complexity, this paper proposes a simple sensing and estimation
framework to faithfully recover the principal subspace of highdimensional data
streams using a collection of binary measurements from distributed sensors,
without transmitting the whole data. The binary measurements are designed to
indicate comparison outcomes of aggregated energy projections of the data
samples over pairs of randomly selected directions. When the covariance matrix
is a lowrank matrix, we propose a spectral estimator that recovers the
principal subspace of the covariance matrix as the subspace spanned by the top
eigenvectors of a properly designed surrogate matrix, which is provably
accurate as soon as the number of binary measurements is sufficiently large. An
adaptive rank selection strategy based on soft thresholding is also presented.
Furthermore, we propose a tailored spectral estimator when the covariance
matrix is additionally Toeplitz, and show reliable estimation can be obtained
from a substantially smaller number of binary measurements. Our results hold
even when a constant fraction of the binary measurements is randomly flipped.
Finally, we develop a lowcomplexity online algorithm to track the principal
subspace when new measurements arrive sequentially. Numerical examples are
provided to validate the proposed approach.

We study the phase retrieval problem, which solves quadratic system of
equations, i.e., recovers a vector $\boldsymbol{x}\in \mathbb{R}^n$ from its
magnitude measurements $y_i=\langle \boldsymbol{a}_i, \boldsymbol{x}\rangle,
i=1,..., m$. We develop a gradientlike algorithm (referred to as RWF
representing reshaped Wirtinger flow) by minimizing a nonconvex nonsmooth loss
function. In comparison with existing nonconvex Wirtinger flow (WF) algorithm
\cite{candes2015phase}, although the loss function becomes nonsmooth, it
involves only the second power of variable and hence reduces the complexity. We
show that for random Gaussian measurements, RWF enjoys geometric convergence to
a global optimal point as long as the number $m$ of measurements is on the
order of $n$, the dimension of the unknown $\boldsymbol{x}$. This improves the
sample complexity of WF, and achieves the same sample complexity as truncated
Wirtinger flow (TWF) \cite{chen2015solving}, but without truncation in gradient
loop. Furthermore, RWF costs less computationally than WF, and runs faster
numerically than both WF and TWF. We further develop the incremental
(stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges
linearly to the true signal. We further establish performance guarantee of an
existing Kaczmarz method for the phase retrieval problem based on its
connection to IRWF. We also empirically demonstrate that IRWF outperforms
existing ITWF algorithm (stochastic version of TWF) as well as other batch
algorithms.

We study the problem of estimating a lowrank positive semidefinite (PSD)
matrix from a set of rankone measurements using sensing vectors composed of
i.i.d. standard Gaussian entries, which are possibly corrupted by arbitrary
outliers. This problem arises from applications such as phase retrieval,
covariance sketching, quantum space tomography, and power spectrum estimation.
We first propose a convex optimization algorithm that seeks the PSD matrix with
the minimum $\ell_1$norm of the observation residual. The advantage of our
algorithm is that it is free of parameters, therefore eliminating the need for
tuning parameters and allowing easy implementations. We establish that with
high probability, a lowrank PSD matrix can be exactly recovered as soon as the
number of measurements is large enough, even when a fraction of the
measurements are corrupted by outliers with arbitrary magnitudes. Moreover, the
recovery is also stable against bounded noise. With the additional information
of an upper bound of the rank of the PSD matrix, we propose another nonconvex
algorithm based on subgradient descent that demonstrates excellent empirical
performance in terms of computational efficiency and accuracy.

Neural recordings, returns from radars and sonars, images in astronomy and
singlemolecule microscopy can be modeled as a linear superposition of a small
number of scaled and delayed copies of a bandlimited or diffractionlimited
point spread function, which is either determined by the nature or designed by
the users; in other words, we observe the convolution between a point spread
function and a sparse spike signal with unknown amplitudes and delays. While it
is of great interest to accurately resolve the spike signal from as few samples
as possible, however, when the point spread function is not known a priori,
this problem is terribly illposed. This paper proposes a convex optimization
framework to simultaneously estimate the point spread function as well as the
spike signal, by mildly constraining the point spread function to lie in a
known lowdimensional subspace. By applying the lifting trick, we obtain an
underdetermined linear system of an ensemble of signals with joint spectral
sparsity, to which atomic norm minimization is applied. Under mild randomness
assumptions of the lowdimensional subspace as well as a separation condition
of the spike signal, we prove the proposed algorithm, dubbed as AtomicLift, is
guaranteed to recover the spike signal up to a scaling factor as soon as the
number of samples is large enough. The extension of AtomicLift to handle noisy
measurements is also discussed. Numerical examples are provided to validate the
effectiveness of the proposed approaches.

Compressed Sensing suggests that the required number of samples for
reconstructing a signal can be greatly reduced if it is sparse in a known
discrete basis, yet many realworld signals are sparse in a continuous
dictionary. One example is the spectrallysparse signal, which is composed of a
small number of spectral atoms with arbitrary frequencies on the unit interval.
In this paper we study the problem of line spectrum denoising and estimation
with an ensemble of spectrallysparse signals composed of the same set of
continuousvalued frequencies from their partial and noisy observations. Two
approaches are developed based on atomic norm minimization and structured
covariance estimation, both of which can be solved efficiently via semidefinite
programming. The first approach aims to estimate and denoise the set of signals
from their partial and noisy observations via atomic norm minimization, and
recover the frequencies via examining the dual polynomial of the convex
program. We characterize the optimality condition of the proposed algorithm and
derive the expected convergence rate for denoising, demonstrating the benefit
of including multiple measurement vectors. The second approach aims to recover
the population covariance matrix from the partially observed sample covariance
matrix by motivating its lowrank Toeplitz structure without recovering the
signal ensemble. Performance guarantee is derived with a finite number of
measurement vectors. The frequencies can be recovered via conventional spectrum
estimation methods such as MUSIC from the estimated covariance matrix. Finally,
numerical examples are provided to validate the favorable performance of the
proposed algorithms, with comparisons against several existing approaches.

We consider simultaneously identifying the membership and locations of point
sources that are convolved with different lowpass point spread functions, from
the observation of their superpositions. This problem arises in
threedimensional superresolution singlemolecule imaging, neural spike
sorting, multiuser channel identification, among others. We propose a novel
algorithm, based on convex programming, and establish its nearoptimal
performance guarantee for exact recovery by exploiting the sparsity of the
point source model as well as incoherence between the point spread functions.
Numerical examples are provided to demonstrate the effectiveness of the
proposed approach.

Statistical inference and information processing of highdimensional data
often require efficient and accurate estimation of their secondorder
statistics. With rapidly changing data, limited processing power and storage at
the acquisition devices, it is desirable to extract the covariance structure
from a single pass over the data and a small number of stored measurements. In
this paper, we explore a quadratic (or rankone) measurement model which
imposes minimal memory requirements and low computational complexity during the
sampling process, and is shown to be optimal in preserving various
lowdimensional covariance structures. Specifically, four popular structural
assumptions of covariance matrices, namely low rank, Toeplitz low rank,
sparsity, jointly rankone and sparse structure, are investigated, while
recovery is achieved via convex relaxation paradigms for the respective
structure.
The proposed quadratic sampling framework has a variety of potential
applications including streaming data processing, highfrequency wireless
communication, phase space tomography and phase retrieval in optics, and
noncoherent subspace detection. Our method admits universally accurate
covariance estimation in the absence of noise, as soon as the number of
measurements exceeds the information theoretic limits. We also demonstrate the
robustness of this approach against noise and imperfect structural assumptions.
Our analysis is established upon a novel notion called the mixednorm
restricted isometry property (RIP$\ell_{2}/\ell_{1}$), as well as the
conventional RIP$\ell_{2}/\ell_{2}$ for nearisotropic and bounded
measurements. In addition, our results improve upon the bestknown phase
retrieval (including both dense and sparse signals) guarantees using PhaseLift
with a significantly simpler approach.

The paper explores the problem of \emph{spectral compressed sensing}, which
aims to recover a spectrally sparse signal from a small random subset of its
$n$ time domain samples. The signal of interest is assumed to be a
superposition of $r$ multidimensional complex sinusoids, while the underlying
frequencies can assume any \emph{continuous} values in the normalized frequency
domain. Conventional compressed sensing paradigms suffer from the basis
mismatch issue when imposing a discrete dictionary on the Fourier
representation. To address this issue, we develop a novel algorithm, called
\emph{Enhanced Matrix Completion (EMaC)}, based on structured matrix completion
that does not require prior knowledge of the model order. The algorithm starts
by arranging the data into a lowrank enhanced form exhibiting multifold
Hankel structure, and then attempts recovery via nuclear norm minimization.
Under mild incoherence conditions, EMaC allows perfect recovery as soon as the
number of samples exceeds the order of $r\log^{4}n$, and is stable against
bounded noise. Even if a constant portion of samples are corrupted with
arbitrary magnitude, EMaC still allows exact recovery, provided that the sample
complexity exceeds the order of $r^{2}\log^{3}n$. Along the way, our results
demonstrate the power of convex relaxation in completing a lowrank multifold
Hankel or Toeplitz matrix from minimal observed entries. The performance of our
algorithm and its applicability to super resolution are further validated by
numerical experiments.

Compressed Sensing (CS) is an effective approach to reduce the required
number of samples for reconstructing a sparse signal in an a priori basis, but
may suffer severely from the issue of basis mismatch. In this paper we study
the problem of simultaneously recovering multiple spectrallysparse signals
that are supported on the same frequencies lying arbitrarily on the unit
circle. We propose an atomic norm minimization problem, which can be regarded
as a continuous counterpart of the discrete CS formulation and be solved
efficiently via semidefinite programming. Through numerical experiments, we
show that the number of samples per signal may be further reduced by harnessing
the joint sparsity pattern of multiple signals.

The paper studies the problem of recovering a spectrally sparse object from a
small number of time domain samples. Specifically, the object of interest with
ambient dimension $n$ is assumed to be a mixture of $r$ complex
multidimensional sinusoids, while the underlying frequencies can assume any
value in the unit disk. Conventional compressed sensing paradigms suffer from
the {\em basis mismatch} issue when imposing a discrete dictionary on the
Fourier representation. To address this problem, we develop a novel
nonparametric algorithm, called enhanced matrix completion (EMaC), based on
structured matrix completion. The algorithm starts by arranging the data into a
lowrank enhanced form with multifold Hankel structure, then attempts recovery
via nuclear norm minimization. Under mild incoherence conditions, EMaC allows
perfect recovery as soon as the number of samples exceeds the order of
$\mathcal{O}(r\log^{2} n)$. We also show that, in many instances, accurate
completion of a lowrank multifold Hankel matrix is possible when the number
of observed entries is proportional to the information theoretical limits
(except for a logarithmic gap). The robustness of EMaC against bounded noise
and its applicability to super resolution are further demonstrated by numerical
experiments.

MultiUser Detection is fundamental not only to cellular wireless
communication but also to RadioFrequency Identification (RFID) technology that
supports supply chain management. The challenge of Multiuser Detection (MUD)
is that of demodulating mutually interfering signals, and the two biggest
impediments are the asynchronous character of random access and the lack of
channel state information. Given that at any time instant the number of active
users is typically small, the promise of Compressive Sensing (CS) is the
demodulation of sparse superpositions of signature waveforms from very few
measurements. This paper begins by unifying two frontend architectures
proposed for MUD by showing that both lead to the same discrete signal model.
Algorithms are presented for coherent and noncoherent detection that are based
on iterative matching pursuit. Noncoherent detection is all that is needed in
the application to RFID technology where it is only the identity of the active
users that is required. The coherent detector is also able to recover the
transmitted symbols. It is shown that compressive demodulation requires
$\mathcal{O}(K\log N(\tau+1))$ samples to recover $K$ active users whereas
standard MUD requires $N(\tau+1)$ samples to process $N$ total users with a
maximal delay $\tau$. Performance guarantees are derived for both coherent and
noncoherent detection that are identical in the way they scale with number of
active users. The power profile of the active users is shown to be less
important than the SNR of the weakest user. Gabor frames and Kerdock codes are
proposed as signature waveforms and numerical examples demonstrate the superior
performance of Kerdock codes  the same probability of error with less than
half the samples.

Many real world data sets exhibit an embedding of lowdimensional structure
in a highdimensional manifold. Examples include images, videos and internet
traffic data. It is of great significance to reduce the storage requirements
and computational complexity when the data dimension is high. Therefore we
consider the problem of reconstructing a data stream from a small subset of its
entries, where the data is assumed to lie in a lowdimensional linear subspace,
possibly corrupted by noise. We further consider tracking the change of the
underlying subspace, which can be applied to applications such as video
denoising, network monitoring and anomaly detection. Our problem can be viewed
as a sequential lowrank matrix completion problem in which the subspace is
learned in an online fashion. The proposed algorithm, dubbed Parallel
Estimation and Tracking by REcursive Least Squares (PETRELS), first identifies
the underlying lowdimensional subspace via a recursive procedure for each row
of the subspace matrix in parallel with discounting for previous observations,
and then reconstructs the missing entries via leastsquares estimation if
required. Numerical examples are provided for directionofarrival estimation
and matrix completion, comparing PETRELS with state of the art batch
algorithms.

In this paper, we present coherencebased performance guarantees of
Orthogonal Matching Pursuit (OMP) for both support recovery and signal
reconstruction of sparse signals when the measurements are corrupted by noise.
In particular, two variants of OMP either with known sparsity level or with a
stopping rule are analyzed. It is shown that if the measurement matrix
$X\in\mathbb{C}^{n\times p}$ satisfies the strong coherence property, then with
$n\gtrsim\mathcal{O}(k\log p)$, OMP will recover a $k$sparse signal with high
probability. In particular, the performance guarantees obtained here separate
the properties required of the measurement matrix from the properties required
of the signal, which depends critically on the minimum signal to noise ratio
rather than the power profiles of the signal. We also provide performance
guarantees for partial support recovery. Comparisons are given with other
performance guarantees for OMP using worstcase analysis and the sorted one
step thresholding algorithm.