
This work considers two popular minimization problems: (i) the minimization
of a general convex function $f(\mathbf{X})$ with the domain being positive
semidefinite matrices; (ii) the minimization of a general convex function
$f(\mathbf{X})$ regularized by the matrix nuclear norm $\\mathbf{X}\_*$ with
the domain being general matrices. Despite their optimal statistical
performance in the literature, these two optimization problems have a high
computational complexity even when solved using tailored fast convex solvers.
To develop faster and more scalable algorithms, we follow the proposal of Burer
and Monteiro to factor the lowrank variable $\mathbf{X} =
\mathbf{U}\mathbf{U}^\top $ (for semidefinite matrices) or
$\mathbf{X}=\mathbf{U}\mathbf{V}^\top $ (for general matrices) and also replace
the nuclear norm $\\mathbf{X}\_*$ with
$(\\mathbf{U}\_F^2+\\mathbf{V}\_F^2)/2$. In spite of the nonconvexity of
the resulting factored formulations, we prove that each critical point either
corresponds to the global optimum of the original convex problems or is a
strict saddle where the Hessian matrix has a strictly negative eigenvalue. Such
a nice geometric structure of the factored formulations allows many local
search algorithms to find a global optimizer even with random initializations.

We consider designing a robust structured sparse sensing matrix consisting of
a sparse matrix with a few nonzero entries per row and a dense base matrix for
capturing signals efficiently We design the robust structured sparse sensing
matrix through minimizing the distance between the Gram matrix of the
equivalent dictionary and the target Gram of matrix holding small mutual
coherence. Moreover, a regularization is added to enforce the robustness of the
optimized structured sparse sensing matrix to the sparse representation error
(SRE) of signals of interests. An alternating minimization algorithm with
global sequence convergence is proposed for solving the corresponding
optimization problem. Numerical experiments on synthetic data and natural
images show that the obtained structured sensing matrix results in a higher
signal reconstruction than a random dense sensing matrix.

This paper considers the problem of simultaneously learning the Sensing
Matrix and Sparsifying Dictionary (SMSD) on a large training dataset. To
address the formulated joint learning problem, we propose an online algorithm
that consists of a closedform solution for optimizing the sensing matrix with
a fixed sparsifying dictionary and a stochastic method for learning the
sparsifying dictionary on a large dataset when the sensing matrix is given.
Benefiting from training on a large dataset, the obtained compressive sensing
(CS) system by the proposed algorithm yields a much better performance in terms
of signal recovery accuracy than the existing ones. The simulation results on
natural images demonstrate the effectiveness of the suggested online algorithm
compared with the existing methods.

Based on the maximum likelihood estimation principle, we derive a
collaborative estimation framework that fuses several different estimators and
yields a better estimate. Applying it to compressive sensing (CS), we propose a
collaborative CS (CCS) scheme consisting of a bank of $K$ CS systems that share
the same sensing matrix but have different sparsifying dictionaries. This CCS
system is expected to yield better performance than each individual CS system,
while requiring the same time as that needed for each individual CS system when
a parallel computing strategy is used. We then provide an approach to designing
optimal CCS systems by utilizing a measure that involves both the sensing
matrix and dictionaries and hence allows us to simultaneously optimize the
sensing matrix and all the $K$ dictionaries. An alternating minimizationbased
algorithm is derived for solving the corresponding optimal design problem. With
a rigorous convergence analysis, we show that the proposed algorithm is
convergent. Experiments are carried out to confirm the theoretical results and
show that the proposed CCS system yields significant improvements over the
existing CS systems in terms of the signal recovery accuracy.

This paper considers the minimization of a general objective function $f(X)$
over the set of rectangular $n\times m$ matrices that have rank at most $r$. To
reduce the computational burden, we factorize the variable $X$ into a product
of two smaller matrices and optimize over these two matrices instead of $X$.
Despite the resulting nonconvexity, recent studies in matrix completion and
sensing have shown that the factored problem has no spurious local minima and
obeys the socalled strict saddle property (the function has a directional
negative curvature at all critical points but local minima). We analyze the
global geometry for a general and yet wellconditioned objective function
$f(X)$ whose restricted strong convexity and restricted strong smoothness
constants are comparable. In particular, we show that the reformulated
objective function has no spurious local minima and obeys the strict saddle
property. These geometric properties imply that a number of iterative
optimization algorithms (such as gradient descent) can provably solve the
factored problem with global convergence.

We consider the convergence properties for alternating projection algorithm
(a.k.a alternating projections) which has been widely utilized to solve many
practical problems in machine learning, signal and image processing,
communication and statistics. In this paper, we formalize two properties of
proper, lower semicontinuous and semialgebraic sets: the three point property
for all possible iterates and the local contraction prop erty that serves as
the nonexpensiveness property of the projector, but only for the iterates that
are closed enough to each other. Then by exploiting the geometric properties of
the objective function around its critical point,i.e. the
KurdykaLojasiewicz(KL)property, we establish a new convergence analysis
framework to show that if one set satisfies the three point property and the
other one obeys the local contraction property, the iterates generated by
alternating projections is a convergent sequence and converges to a critical
point. We complete this study by providing convergence rate which depends on
the explicit expression of the KL exponent. As a byproduct, we use our new
analysis framework to recover the linear convergence rate of alternating
projections onto closed convex sets. To illustrate the power of our new
framework, we provide new convergence result for a class of concrete
applications: alternating projections for designing structured tight frames
that are widely used in sparse representation, compressed sensing and
communication. We believe that our new analysis framework can be applied to
guarantee the convergence of alternating projections when utilized for many
other nonconvex and nonsmooth sets.

This paper considers general rankconstrained optimization problems that
minimize a general objective function $f(X)$ over the set of rectangular
$n\times m$ matrices that have rank at most $r$. To tackle the rank constraint
and also to reduce the computational burden, we factorize $X$ into $UV^T$ where
$U$ and $V$ are $n\times r$ and $m\times r$ matrices, respectively, and then
optimize over the small matrices $U$ and $V$. We characterize the global
optimization geometry of the nonconvex factored problem and show that the
corresponding objective function satisfies the robust strict saddle property as
long as the original objective function $f$ satisfies restricted strong
convexity and smoothness properties, ensuring global convergence of many local
search algorithms (such as noisy gradient descent) in polynomial time for
solving the factored problem. We also provide a comprehensive analysis for the
optimization geometry of a matrix factorization problem where we aim to find
$n\times r$ and $m\times r$ matrices $U$ and $V$ such that $UV^T$ approximates
a given matrix $X^\star$. Aside from the robust strict saddle property, we show
that the objective function of the matrix factorization problem has no spurious
local minima and obeys the strict saddle property not only for the
exactparameterization case where $rank(X^\star) = r$, but also for the
overparameterization case where $rank(X^\star) < r$ and the
underparameterization case where $rank(X^\star) > r$. These geometric
properties imply that a number of iterative optimization algorithms (such as
gradient descent) converge to a global solution with random initialization.

Our objective is to efficiently design a robust projection matrix $\Phi$ for
the Compressive Sensing (CS) systems when applied to the signals that are not
exactly sparse. The optimal projection matrix is obtained by mainly minimizing
the average coherence of the equivalent dictionary. In order to drop the
requirement of the sparse representation error (SRE) for a set of training data
as in [15] [16], we introduce a novel penalty function independent of a
particular SRE matrix. Without requiring of training data, we can efficiently
design the robust projection matrix and apply it for most of CS systems, like a
CS system for image processing with a conventional wavelet dictionary in which
the SRE matrix is generally not available. Simulation results demonstrate the
efficiency and effectiveness of the proposed approach compared with the
stateoftheart methods. In addition, we experimentally demonstrate with
natural images that under similar compression rate, a CS system with a learned
dictionary in high dimensions outperforms the one in low dimensions in terms of
reconstruction accuracy. This together with the fact that our proposed method
can efficiently work in high dimension suggests that a CS system can be
potentially implemented beyond the small patches in sparsitybased image
processing.

The discrete prolate spheroidal sequences (DPSS's) provide an efficient
representation for discrete signals that are perfectly timelimited and nearly
bandlimited. Due to the high computational complexity of projecting onto the
DPSS basis  also known as the Slepian basis  this representation is often
overlooked in favor of the fast Fourier transform (FFT). We show that there
exist fast constructions for computing approximate projections onto the leading
Slepian basis elements. The complexity of the resulting algorithms is
comparable to the FFT, and scales favorably as the quality of the desired
approximation is increased. In the process of bounding the complexity of these
algorithms, we also establish new nonasymptotic results on the eigenvalue
distribution of discrete timefrequency localization operators. We then
demonstrate how these algorithms allow us to efficiently compute the solution
to certain leastsquares problems that arise in signal processing. We also
provide simulations comparing these fast, approximate Slepian methods to exact
Slepian methods as well as the traditional FFT based methods.

Bandlimiting and timelimiting operators play a fundamental role in analyzing
bandlimited signals that are approximately timelimited (or vice versa). In this
paper, we consider a timefrequency (in the discrete Fourier transform (DFT)
domain) limiting operator whose eigenvectors are known as the periodic discrete
prolate spheroidal sequences (PDPSSs). We establish new nonasymptotic results
on the eigenvalue distribution of this operator. As a byproduct, we also
characterize the eigenvalue distribution of a set of submatrices of the DFT
matrix, which is of independent interest.

This work investigates the geometry of a nonconvex reformulation of
minimizing a general convex loss function $f(X)$ regularized by the matrix
nuclear norm $\X\_*$. Nuclearnorm regularized matrix inverse problems are at
the heart of many applications in machine learning, signal processing, and
control. The statistical performance of nuclear norm regularization has been
studied extensively in literature using convex analysis techniques. Despite its
optimal performance, the resulting optimization has high computational
complexity when solved using standard or even tailored fast convex solvers. To
develop faster and more scalable algorithms, we follow the proposal of
BurerMonteiro to factor the matrix variable $X$ into the product of two
smaller rectangular matrices $X=UV^T$ and also replace the nuclear norm
$\X\_*$ with $(\U\_F^2+\V\_F^2)/2$. In spite of the nonconvexity of the
factored formulation, we prove that when the convex loss function $f(X)$ is
$(2r,4r)$restricted wellconditioned, each critical point of the factored
problem either corresponds to the optimal solution $X^\star$ of the original
convex optimization or is a strict saddle point where the Hessian matrix has a
strictly negative eigenvalue. Such a geometric structure of the factored
formulation allows many local search algorithms to converge to the global
optimum with random initializations.

Any sequence of uniformly bounded $N\times N$ Hermitian Toeplitz matrices
$\{\boldsymbol{H}_N\}$ is asymptotically equivalent to a certain sequence of
$N\times N$ circulant matrices $\{\boldsymbol{C}_N\}$ derived from the Toeplitz
matrices in the sense that $\left\ \boldsymbol{H}_N  \boldsymbol{C}_N
\right\_F = o(\sqrt{N})$ as $N\rightarrow \infty$. This implies that certain
collective behaviors of the eigenvalues of each Toeplitz matrix are reflected
in those of the corresponding circulant matrix and supports the utilization of
the computationally efficient fast Fourier transform (instead of the
KarhunenLo\`{e}ve transform) in applications like coding and filtering. In
this paper, we study the asymptotic performance of the individual eigenvalue
estimates. We show that the asymptotic equivalence of the circulant and
Toeplitz matrices implies the individual asymptotic convergence of the
eigenvalues for certain types of Toeplitz matrices. We also show that these
estimates asymptotically approximate the largest and smallest eigenvalues for
more general classes of Toeplitz matrices.

Many signal processing problemssuch as analysis, compression, denoising,
and reconstructioncan be facilitated by expressing the signal as a linear
combination of atoms from a wellchosen dictionary. In this paper, we study
possible dictionaries for representing the discrete vector one obtains when
collecting a finite set of uniform samples from a multiband analog signal. By
analyzing the spectrum of combined time and multibandlimiting operations in
the discretetime domain, we conclude that the information level of the sampled
multiband vectors is essentially equal to the timefrequency area. For
representing these vectors, we consider a dictionary formed by concatenating a
collection of modulated Discrete Prolate Spheroidal Sequences (DPSS's). We
study the angle between the subspaces spanned by this dictionary and an optimal
dictionary, and we conclude that the multiband modulated DPSS dictionarywhich
is simple to construct and more flexible than the optimal dictionary in
practical applicationsis nearly optimal for representing multiband sample
vectors. We also show that the multiband modulated DPSS dictionary not only
provides a very high degree of approximation accuracy in an MSE sense for
multiband sample vectors (using a number of atoms comparable to the information
level), but also that it can provide highquality approximations of all sampled
sinusoids within the bands of interest.