• ### Improved Support Recovery Guarantees for the Group Lasso With Applications to Structural Health Monitoring(1708.08826)

May 19, 2018 cs.IT, math.IT, stat.ML
This paper considers the problem of estimating an unknown high dimensional signal from noisy linear measurements, {when} the signal is assumed to possess a \emph{group-sparse} structure in a {known,} fixed dictionary. We consider signals generated according to a natural probabilistic model, and establish new conditions under which the set of indices of the non-zero groups of the signal (called the group-level support) may be accurately estimated via the group Lasso. Our results strengthen existing coherence-based analyses that exhibit the well-known "square root" bottleneck, allowing for the number of recoverable nonzero groups to be nearly as large as the total number of groups. We also establish a sufficient recovery condition relating the number of nonzero groups and the signal to noise ratio (quantified in terms of the ratio of the squared Euclidean norms of nonzero groups and the variance of the random additive {measurement} noise), and validate this trend empirically. Finally, we examine the implications of our results in the context of a structural health monitoring application, where the group Lasso approach facilitates demixing of a propagating acoustic wavefield, acquired on the material surface by a scanning laser Doppler vibrometer, into antithetical components, one of which indicates the locations of internal material defects.
• ### On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions(1706.06066)

Feb. 15, 2018 math.OC, cs.LG, stat.ML
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on the difference of convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures/assumptions (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory.
• ### On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don't Worry About Its Nonsmooth Loss Function(1605.07950)

April 14, 2019 math.OC, cs.LG, stat.ML
Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility. However, by exploring the modeling structures, we find these "sacrifices" do not always require more computational efforts. To shed light on such a "free-lunch" phenomenon, we study the square-root-Lasso (SQRT-Lasso) type regression problem. Specifically, we show that the nonsmooth loss functions of SQRT-Lasso type regression ease tuning effort and gain adaptivity to inhomogeneous noise, but is not necessarily more challenging than Lasso in computation. We can directly apply proximal algorithms (e.g. proximal gradient descent, proximal Newton, and proximal Quasi-Newton algorithms) without worrying the nonsmoothness of the loss function. Theoretically, we prove that the proximal algorithms combined with the pathwise optimization scheme enjoy fast convergence guarantees with high probability. Numerical results are provided to support our theory.
• ### Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization(1612.09296)

Jan. 20, 2018 math.OC, cs.LG, stat.ML
We propose a general theory for studying the \xl{landscape} of nonconvex \xl{optimization} with underlying symmetric structures \tz{for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks)}. In specific, we characterize the locations of stationary points and the null space of Hessian matrices \xl{of the objective function} via the lens of invariant groups\removed{for associated optimization problems, including low-rank matrix factorization, phase retrieval, and deep linear neural networks}. As a major motivating example, we apply the proposed general theory to characterize the global \xl{landscape} of the \xl{nonconvex optimization in} low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: ($\cR_1$) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvatures; ($\cR_2$) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and ($\cR_3$) the complement of the above regions, where the gradient has sufficiently large magnitudes. We further extend our result to the matrix sensing problem. Such global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.
• ### Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction(1605.02711)

Dec. 23, 2017 math.OC, cs.LG, stat.ML
We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints. Sufficient conditions are provided, under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the proposed algorithm to an asynchronous parallel variant with a near linear speedup. Numerical experiments demonstrate the efficiency of our algorithm in terms of both parameter estimation and computational performance.
• ### Near Optimal Sketching of Low-Rank Tensor Regression(1709.07093)

Sept. 20, 2017 cs.DS, cs.LG, stat.ML
We study the least squares regression problem \begin{align*} \min_{\Theta \in \mathcal{S}_{\odot D,R}} \|A\Theta-b\|_2, \end{align*} where $\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors. That is, $\Theta$ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^D p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on {\it sparse} random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi A \Theta - \Phi b\|_2$, for which if $\Theta'$ is a near-optimum to the smaller problem, then it is also a near optimum to the original problem. We obtain significantly smaller dimension and sparsity in $\Phi$ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.
• ### Communication-efficient Algorithm for Distributed Sparse Learning via Two-way Truncation(1709.00537)

Sept. 9, 2017 math.OC, cs.LG, stat.ML
We propose a communicationally and computationally efficient algorithm for high-dimensional distributed sparse learning. At each iteration, local machines compute the gradient on local data and the master machine solves one shifted $l_1$ regularized minimization problem. The communication cost is reduced from constant times of the dimension number for the state-of-the-art algorithm to constant times of the sparsity number via Two-way Truncation procedure. Theoretically, we prove that the estimation error of the proposed algorithm decreases exponentially and matches that of the centralized method under mild assumptions. Extensive experiments on both simulated data and real data verify that the proposed algorithm is efficient and has performance comparable with the centralized method on solving high-dimensional sparse learning problems.
• ### Noisy Tensor Completion for Tensors with a Sparse Canonical Polyadic Factor(1704.02534)

April 8, 2017 stat.ML
In this paper we study the problem of noisy tensor completion for tensors that admit a canonical polyadic or CANDECOMP/PARAFAC (CP) decomposition with one of the factors being sparse. We present general theoretical error bounds for an estimate obtained by using a complexity-regularized maximum likelihood principle and then instantiate these bounds for the case of additive white Gaussian noise. We also provide an ADMM-type algorithm for solving the complexity-regularized maximum likelihood problem and validate the theoretical finding via experiments on synthetic data set.
• ### A Compressed Sensing Based Decomposition of Electrodermal Activity Signals(1602.07754)

Jan. 26, 2017 stat.AP, cs.LG, stat.ML
The measurement and analysis of Electrodermal Activity (EDA) offers applications in diverse areas ranging from market research, to seizure detection, to human stress analysis. Unfortunately, the analysis of EDA signals is made difficult by the superposition of numerous components which can obscure the signal information related to a user's response to a stimulus. We show how simple pre-processing followed by a novel compressed sensing based decomposition can mitigate the effects of the undesired noise components and help reveal the underlying physiological signal. The proposed framework allows for decomposition of EDA signals with provable bounds on the recovery of user responses. We test our procedure on both synthetic and real-world EDA signals from wearable sensors and demonstrate that our approach allows for more accurate recovery of user responses as compared to the existing techniques.
• ### Robust Low-Complexity Randomized Methods for Locating Outliers in Large Matrices(1612.02334)

Dec. 7, 2016 cs.IT, math.IT, cs.LG, stat.ML
This paper examines the problem of locating outlier columns in a large, otherwise low-rank matrix, in settings where {}{the data} are noisy, or where the overall matrix has missing elements. We propose a randomized two-step inference framework, and establish sufficient conditions on the required sample complexities under which these methods succeed (with high probability) in accurately locating the outliers for each task. Comprehensive numerical experimental results are provided to verify the theoretical bounds and demonstrate the computational efficiency of the proposed algorithm.
• ### Testing a Novel Self-Assembling Data Paradigm in the Context of IACT Data(1509.02202)

The process of gathering and associating data from multiple sensors or sub-detectors due to a common physical event (the process of event-building) is used in many fields, including high-energy physics and $\gamma$-ray astronomy. Fault tolerance in event-building is a challenging problem that increases in difficulty with higher data throughput rates and increasing numbers of sub-detectors. We draw on biological self-assembly models in the development of a novel event-building paradigm that treats each packet of data from an individual sensor or sub-detector as if it were a molecule in solution. Just as molecules are capable of forming chemical bonds, "bonds" can be defined between data packets using metadata-based discriminants. A database -- which plays the role of a beaker of solution -- continually selects pairs of assemblies at random to test for bonds, which allows single packets and small assemblies to aggregate into larger assemblies. During this process higher-quality associations supersede spurious ones. The database thereby becomes fluid, dynamic, and self-correcting rather than static. We will describe tests of the self-assembly paradigm using our first fluid database prototype and data from the VERITAS $\gamma$-ray telescope.
• ### On Convolutional Approximations to Linear Dimensionality Reduction Operators for Large Scale Data Processing(1502.07017)

Feb. 25, 2015 stat.ML
In this paper, we examine the problem of approximating a general linear dimensionality reduction (LDR) operator, represented as a matrix $A \in \mathbb{R}^{m \times n}$ with $m < n$, by a partial circulant matrix with rows related by circular shifts. Partial circulant matrices admit fast implementations via Fourier transform methods and subsampling operations; our investigation here is motivated by a desire to leverage these potential computational improvements in large-scale data processing tasks. We establish a fundamental result, that most large LDR matrices (whose row spaces are uniformly distributed) in fact cannot be well approximated by partial circulant matrices. Then, we propose a natural generalization of the partial circulant approximation framework that entails approximating the range space of a given LDR operator $A$ over a restricted domain of inputs, using a matrix formed as a product of a partial circulant matrix having $m '> m$ rows and a $m \times k$ 'post processing' matrix. We introduce a novel algorithmic technique, based on sparse matrix factorization, for identifying the factors comprising such approximations, and provide preliminary evidence to demonstrate the potential of this approach.
• ### Identifying Outliers in Large Matrices via Randomized Adaptive Compressive Sampling(1407.0312)

Nov. 19, 2014 cs.IT, math.IT, cs.LG, stat.ML
This paper examines the problem of locating outlier columns in a large, otherwise low-rank, matrix. We propose a simple two-step adaptive sensing and inference approach and establish theoretical guarantees for its performance; our results show that accurate outlier identification is achievable using very few linear summaries of the original data matrix -- as few as the squared rank of the low-rank component plus the number of outliers, times constant and logarithmic factors. We demonstrate the performance of our approach experimentally in two stylized applications, one motivated by robust collaborative filtering tasks, and the other by saliency map estimation tasks arising in computer vision and automated surveillance, and also investigate extensions to settings where the data are noisy, or possibly incomplete.
• ### Noisy Matrix Completion under Sparse Factor Models(1411.0282)

Nov. 2, 2014 cs.IT, math.IT, stat.AP, stat.ML
This paper examines a general class of noisy matrix completion tasks where the goal is to estimate a matrix from observations obtained at a subset of its entries, each of which is subject to random noise or corruption. Our specific focus is on settings where the matrix to be estimated is well-approximated by a product of two (a priori unknown) matrices, one of which is sparse. Such structural models - referred to here as "sparse factor models" - have been widely used, for example, in subspace clustering applications, as well as in contemporary sparse modeling and dictionary learning tasks. Our main theoretical contributions are estimation error bounds for sparsity-regularized maximum likelihood estimators for problems of this form, which are applicable to a number of different observation noise or corruption models. Several specific implications are examined, including scenarios where observations are corrupted by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations. We also propose a simple algorithmic approach based on the alternating direction method of multipliers for these tasks, and provide experimental evidence to support our error analyses.
• ### Compressive Measurement Designs for Estimating Structured Signals in Structured Clutter: A Bayesian Experimental Design Approach(1311.5599)

Nov. 21, 2013 cs.LG, stat.ML
This work considers an estimation task in compressive sensing, where the goal is to estimate an unknown signal from compressive measurements that are corrupted by additive pre-measurement noise (interference, or clutter) as well as post-measurement noise, in the specific setting where some (perhaps limited) prior knowledge on the signal, interference, and noise is available. The specific aim here is to devise a strategy for incorporating this prior information into the design of an appropriate compressive measurement strategy. Here, the prior information is interpreted as statistics of a prior distribution on the relevant quantities, and an approach based on Bayesian Experimental Design is proposed. Experimental results on synthetic data demonstrate that the proposed approach outperforms traditional random compressive measurement designs, which are agnostic to the prior information, as well as several other knowledge-enhanced sensing matrix designs based on more heuristic notions.
• ### On the Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements(1306.4391)

Oct. 16, 2013 cs.IT, math.IT, math.ST, stat.TH, stat.ML