• This work considers two popular minimization problems: (i) the minimization of a general convex function $f(\mathbf{X})$ with the domain being positive semi-definite matrices; (ii) the minimization of a general convex function $f(\mathbf{X})$ regularized by the matrix nuclear norm $\|\mathbf{X}\|_*$ with the domain being general matrices. Despite their optimal statistical performance in the literature, these two optimization problems have a high computational complexity even when solved using tailored fast convex solvers. To develop faster and more scalable algorithms, we follow the proposal of Burer and Monteiro to factor the low-rank variable $\mathbf{X} = \mathbf{U}\mathbf{U}^\top $ (for semi-definite matrices) or $\mathbf{X}=\mathbf{U}\mathbf{V}^\top $ (for general matrices) and also replace the nuclear norm $\|\mathbf{X}\|_*$ with $(\|\mathbf{U}\|_F^2+\|\mathbf{V}\|_F^2)/2$. In spite of the non-convexity of the resulting factored formulations, we prove that each critical point either corresponds to the global optimum of the original convex problems or is a strict saddle where the Hessian matrix has a strictly negative eigenvalue. Such a nice geometric structure of the factored formulations allows many local search algorithms to find a global optimizer even with random initializations.
  • We consider designing a robust structured sparse sensing matrix consisting of a sparse matrix with a few non-zero entries per row and a dense base matrix for capturing signals efficiently We design the robust structured sparse sensing matrix through minimizing the distance between the Gram matrix of the equivalent dictionary and the target Gram of matrix holding small mutual coherence. Moreover, a regularization is added to enforce the robustness of the optimized structured sparse sensing matrix to the sparse representation error (SRE) of signals of interests. An alternating minimization algorithm with global sequence convergence is proposed for solving the corresponding optimization problem. Numerical experiments on synthetic data and natural images show that the obtained structured sensing matrix results in a higher signal reconstruction than a random dense sensing matrix.
  • This paper considers the problem of simultaneously learning the Sensing Matrix and Sparsifying Dictionary (SMSD) on a large training dataset. To address the formulated joint learning problem, we propose an online algorithm that consists of a closed-form solution for optimizing the sensing matrix with a fixed sparsifying dictionary and a stochastic method for learning the sparsifying dictionary on a large dataset when the sensing matrix is given. Benefiting from training on a large dataset, the obtained compressive sensing (CS) system by the proposed algorithm yields a much better performance in terms of signal recovery accuracy than the existing ones. The simulation results on natural images demonstrate the effectiveness of the suggested online algorithm compared with the existing methods.
  • Based on the maximum likelihood estimation principle, we derive a collaborative estimation framework that fuses several different estimators and yields a better estimate. Applying it to compressive sensing (CS), we propose a collaborative CS (CCS) scheme consisting of a bank of $K$ CS systems that share the same sensing matrix but have different sparsifying dictionaries. This CCS system is expected to yield better performance than each individual CS system, while requiring the same time as that needed for each individual CS system when a parallel computing strategy is used. We then provide an approach to designing optimal CCS systems by utilizing a measure that involves both the sensing matrix and dictionaries and hence allows us to simultaneously optimize the sensing matrix and all the $K$ dictionaries. An alternating minimization-based algorithm is derived for solving the corresponding optimal design problem. With a rigorous convergence analysis, we show that the proposed algorithm is convergent. Experiments are carried out to confirm the theoretical results and show that the proposed CCS system yields significant improvements over the existing CS systems in terms of the signal recovery accuracy.
  • This paper considers the minimization of a general objective function $f(X)$ over the set of rectangular $n\times m$ matrices that have rank at most $r$. To reduce the computational burden, we factorize the variable $X$ into a product of two smaller matrices and optimize over these two matrices instead of $X$. Despite the resulting nonconvexity, recent studies in matrix completion and sensing have shown that the factored problem has no spurious local minima and obeys the so-called strict saddle property (the function has a directional negative curvature at all critical points but local minima). We analyze the global geometry for a general and yet well-conditioned objective function $f(X)$ whose restricted strong convexity and restricted strong smoothness constants are comparable. In particular, we show that the reformulated objective function has no spurious local minima and obeys the strict saddle property. These geometric properties imply that a number of iterative optimization algorithms (such as gradient descent) can provably solve the factored problem with global convergence.
  • We consider the convergence properties for alternating projection algorithm (a.k.a alternating projections) which has been widely utilized to solve many practical problems in machine learning, signal and image processing, communication and statistics. In this paper, we formalize two properties of proper, lower semi-continuous and semi-algebraic sets: the three point property for all possible iterates and the local contraction prop- erty that serves as the non-expensiveness property of the projector, but only for the iterates that are closed enough to each other. Then by exploiting the geometric properties of the objective function around its critical point,i.e. the Kurdyka-Lojasiewicz(KL)property, we establish a new convergence analysis framework to show that if one set satisfies the three point property and the other one obeys the local contraction property, the iterates generated by alternating projections is a convergent sequence and converges to a critical point. We complete this study by providing convergence rate which depends on the explicit expression of the KL exponent. As a byproduct, we use our new analysis framework to recover the linear convergence rate of alternating projections onto closed convex sets. To illustrate the power of our new framework, we provide new convergence result for a class of concrete applications: alternating projections for designing structured tight frames that are widely used in sparse representation, compressed sensing and communication. We believe that our new analysis framework can be applied to guarantee the convergence of alternating projections when utilized for many other nonconvex and nonsmooth sets.
  • This paper considers general rank-constrained optimization problems that minimize a general objective function $f(X)$ over the set of rectangular $n\times m$ matrices that have rank at most $r$. To tackle the rank constraint and also to reduce the computational burden, we factorize $X$ into $UV^T$ where $U$ and $V$ are $n\times r$ and $m\times r$ matrices, respectively, and then optimize over the small matrices $U$ and $V$. We characterize the global optimization geometry of the nonconvex factored problem and show that the corresponding objective function satisfies the robust strict saddle property as long as the original objective function $f$ satisfies restricted strong convexity and smoothness properties, ensuring global convergence of many local search algorithms (such as noisy gradient descent) in polynomial time for solving the factored problem. We also provide a comprehensive analysis for the optimization geometry of a matrix factorization problem where we aim to find $n\times r$ and $m\times r$ matrices $U$ and $V$ such that $UV^T$ approximates a given matrix $X^\star$. Aside from the robust strict saddle property, we show that the objective function of the matrix factorization problem has no spurious local minima and obeys the strict saddle property not only for the exact-parameterization case where $rank(X^\star) = r$, but also for the over-parameterization case where $rank(X^\star) < r$ and the under-parameterization case where $rank(X^\star) > r$. These geometric properties imply that a number of iterative optimization algorithms (such as gradient descent) converge to a global solution with random initialization.
  • Our objective is to efficiently design a robust projection matrix $\Phi$ for the Compressive Sensing (CS) systems when applied to the signals that are not exactly sparse. The optimal projection matrix is obtained by mainly minimizing the average coherence of the equivalent dictionary. In order to drop the requirement of the sparse representation error (SRE) for a set of training data as in [15] [16], we introduce a novel penalty function independent of a particular SRE matrix. Without requiring of training data, we can efficiently design the robust projection matrix and apply it for most of CS systems, like a CS system for image processing with a conventional wavelet dictionary in which the SRE matrix is generally not available. Simulation results demonstrate the efficiency and effectiveness of the proposed approach compared with the state-of-the-art methods. In addition, we experimentally demonstrate with natural images that under similar compression rate, a CS system with a learned dictionary in high dimensions outperforms the one in low dimensions in terms of reconstruction accuracy. This together with the fact that our proposed method can efficiently work in high dimension suggests that a CS system can be potentially implemented beyond the small patches in sparsity-based image processing.
  • The discrete prolate spheroidal sequences (DPSS's) provide an efficient representation for discrete signals that are perfectly timelimited and nearly bandlimited. Due to the high computational complexity of projecting onto the DPSS basis - also known as the Slepian basis - this representation is often overlooked in favor of the fast Fourier transform (FFT). We show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements. The complexity of the resulting algorithms is comparable to the FFT, and scales favorably as the quality of the desired approximation is increased. In the process of bounding the complexity of these algorithms, we also establish new nonasymptotic results on the eigenvalue distribution of discrete time-frequency localization operators. We then demonstrate how these algorithms allow us to efficiently compute the solution to certain least-squares problems that arise in signal processing. We also provide simulations comparing these fast, approximate Slepian methods to exact Slepian methods as well as the traditional FFT based methods.
  • Bandlimiting and timelimiting operators play a fundamental role in analyzing bandlimited signals that are approximately timelimited (or vice versa). In this paper, we consider a time-frequency (in the discrete Fourier transform (DFT) domain) limiting operator whose eigenvectors are known as the periodic discrete prolate spheroidal sequences (PDPSSs). We establish new nonasymptotic results on the eigenvalue distribution of this operator. As a byproduct, we also characterize the eigenvalue distribution of a set of submatrices of the DFT matrix, which is of independent interest.
  • This work investigates the geometry of a nonconvex reformulation of minimizing a general convex loss function $f(X)$ regularized by the matrix nuclear norm $\|X\|_*$. Nuclear-norm regularized matrix inverse problems are at the heart of many applications in machine learning, signal processing, and control. The statistical performance of nuclear norm regularization has been studied extensively in literature using convex analysis techniques. Despite its optimal performance, the resulting optimization has high computational complexity when solved using standard or even tailored fast convex solvers. To develop faster and more scalable algorithms, we follow the proposal of Burer-Monteiro to factor the matrix variable $X$ into the product of two smaller rectangular matrices $X=UV^T$ and also replace the nuclear norm $\|X\|_*$ with $(\|U\|_F^2+\|V\|_F^2)/2$. In spite of the nonconvexity of the factored formulation, we prove that when the convex loss function $f(X)$ is $(2r,4r)$-restricted well-conditioned, each critical point of the factored problem either corresponds to the optimal solution $X^\star$ of the original convex optimization or is a strict saddle point where the Hessian matrix has a strictly negative eigenvalue. Such a geometric structure of the factored formulation allows many local search algorithms to converge to the global optimum with random initializations.
  • Any sequence of uniformly bounded $N\times N$ Hermitian Toeplitz matrices $\{\boldsymbol{H}_N\}$ is asymptotically equivalent to a certain sequence of $N\times N$ circulant matrices $\{\boldsymbol{C}_N\}$ derived from the Toeplitz matrices in the sense that $\left\| \boldsymbol{H}_N - \boldsymbol{C}_N \right\|_F = o(\sqrt{N})$ as $N\rightarrow \infty$. This implies that certain collective behaviors of the eigenvalues of each Toeplitz matrix are reflected in those of the corresponding circulant matrix and supports the utilization of the computationally efficient fast Fourier transform (instead of the Karhunen-Lo\`{e}ve transform) in applications like coding and filtering. In this paper, we study the asymptotic performance of the individual eigenvalue estimates. We show that the asymptotic equivalence of the circulant and Toeplitz matrices implies the individual asymptotic convergence of the eigenvalues for certain types of Toeplitz matrices. We also show that these estimates asymptotically approximate the largest and smallest eigenvalues for more general classes of Toeplitz matrices.
  • Many signal processing problems--such as analysis, compression, denoising, and reconstruction--can be facilitated by expressing the signal as a linear combination of atoms from a well-chosen dictionary. In this paper, we study possible dictionaries for representing the discrete vector one obtains when collecting a finite set of uniform samples from a multiband analog signal. By analyzing the spectrum of combined time- and multiband-limiting operations in the discrete-time domain, we conclude that the information level of the sampled multiband vectors is essentially equal to the time-frequency area. For representing these vectors, we consider a dictionary formed by concatenating a collection of modulated Discrete Prolate Spheroidal Sequences (DPSS's). We study the angle between the subspaces spanned by this dictionary and an optimal dictionary, and we conclude that the multiband modulated DPSS dictionary--which is simple to construct and more flexible than the optimal dictionary in practical applications--is nearly optimal for representing multiband sample vectors. We also show that the multiband modulated DPSS dictionary not only provides a very high degree of approximation accuracy in an MSE sense for multiband sample vectors (using a number of atoms comparable to the information level), but also that it can provide high-quality approximations of all sampled sinusoids within the bands of interest.