
This paper considers the problem of estimating an unknown high dimensional
signal from noisy linear measurements, {when} the signal is assumed to possess
a \emph{groupsparse} structure in a {known,} fixed dictionary. We consider
signals generated according to a natural probabilistic model, and establish new
conditions under which the set of indices of the nonzero groups of the signal
(called the grouplevel support) may be accurately estimated via the group
Lasso. Our results strengthen existing coherencebased analyses that exhibit
the wellknown "square root" bottleneck, allowing for the number of recoverable
nonzero groups to be nearly as large as the total number of groups. We also
establish a sufficient recovery condition relating the number of nonzero groups
and the signal to noise ratio (quantified in terms of the ratio of the squared
Euclidean norms of nonzero groups and the variance of the random additive
{measurement} noise), and validate this trend empirically. Finally, we examine
the implications of our results in the context of a structural health
monitoring application, where the group Lasso approach facilitates demixing of
a propagating acoustic wavefield, acquired on the material surface by a
scanning laser Doppler vibrometer, into antithetical components, one of which
indicates the locations of internal material defects.

We propose a DC proximal Newton algorithm for solving nonconvex regularized
sparse learning problems in high dimensions. Our proposed algorithm integrates
the proximal Newton algorithm with multistage convex relaxation based on the
difference of convex (DC) programming, and enjoys both strong computational and
statistical guarantees. Specifically, by leveraging a sophisticated
characterization of sparse modeling structures/assumptions (i.e., local
restricted strong convexity and Hessian smoothness), we prove that within each
stage of convex relaxation, our proposed algorithm achieves (local) quadratic
convergence, and eventually obtains a sparse approximate local optimum with
optimal statistical properties after only a few convex relaxations. Numerical
experiments are provided to support our theory.

Many machine learning techniques sacrifice convenient computational
structures to gain estimation robustness and modeling flexibility. However, by
exploring the modeling structures, we find these "sacrifices" do not always
require more computational efforts. To shed light on such a "freelunch"
phenomenon, we study the squarerootLasso (SQRTLasso) type regression
problem. Specifically, we show that the nonsmooth loss functions of SQRTLasso
type regression ease tuning effort and gain adaptivity to inhomogeneous noise,
but is not necessarily more challenging than Lasso in computation. We can
directly apply proximal algorithms (e.g. proximal gradient descent, proximal
Newton, and proximal QuasiNewton algorithms) without worrying the
nonsmoothness of the loss function. Theoretically, we prove that the proximal
algorithms combined with the pathwise optimization scheme enjoy fast
convergence guarantees with high probability. Numerical results are provided to
support our theory.

We propose a general theory for studying the \xl{landscape} of nonconvex
\xl{optimization} with underlying symmetric structures \tz{for a class of
machine learning problems (e.g., lowrank matrix factorization, phase
retrieval, and deep linear neural networks)}. In specific, we characterize the
locations of stationary points and the null space of Hessian matrices \xl{of
the objective function} via the lens of invariant groups\removed{for associated
optimization problems, including lowrank matrix factorization, phase
retrieval, and deep linear neural networks}. As a major motivating example, we
apply the proposed general theory to characterize the global \xl{landscape} of
the \xl{nonconvex optimization in} lowrank matrix factorization problem. In
particular, we illustrate how the rotational symmetry group gives rise to
infinitely many nonisolated strict saddle points and equivalent global minima
of the objective function. By explicitly identifying all stationary points, we
divide the entire parameter space into three regions: ($\cR_1$) the region
containing the neighborhoods of all strict saddle points, where the objective
has negative curvatures; ($\cR_2$) the region containing neighborhoods of all
global minima, where the objective enjoys strong convexity along certain
directions; and ($\cR_3$) the complement of the above regions, where the
gradient has sufficiently large magnitudes. We further extend our result to the
matrix sensing problem. Such global landscape implies strong global convergence
guarantees for popular iterative algorithms with arbitrary initial solutions.

We propose a stochastic variance reduced optimization algorithm for solving
sparse learning problems with cardinality constraints. Sufficient conditions
are provided, under which the proposed algorithm enjoys strong linear
convergence guarantees and optimal estimation accuracy in high dimensions. We
further extend the proposed algorithm to an asynchronous parallel variant with
a near linear speedup. Numerical experiments demonstrate the efficiency of our
algorithm in terms of both parameter estimation and computational performance.

We study the least squares regression problem \begin{align*} \min_{\Theta \in
\mathcal{S}_{\odot D,R}} \A\Thetab\_2, \end{align*} where
$\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta =
\sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors
$\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and
$\circ$ denotes the outer product of vectors. That is, $\Theta$ is a
lowdimensional, lowrank tensor. This is motivated by the fact that the number
of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^D p_d$, which is
significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in
ordinary least squares regression. We consider the above CP decomposition model
of tensors $\Theta$, as well as the Tucker decomposition. For both models we
show how to apply data dimensionality reduction techniques based on {\it
sparse} random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$,
to reduce the problem to a much smaller problem $\min_{\Theta} \\Phi A \Theta
 \Phi b\_2$, for which if $\Theta'$ is a nearoptimum to the smaller problem,
then it is also a near optimum to the original problem. We obtain significantly
smaller dimension and sparsity in $\Phi$ than is possible for ordinary least
squares regression, and we also provide a number of numerical simulations
supporting our theory.

We propose a communicationally and computationally efficient algorithm for
highdimensional distributed sparse learning. At each iteration, local machines
compute the gradient on local data and the master machine solves one shifted
$l_1$ regularized minimization problem. The communication cost is reduced from
constant times of the dimension number for the stateoftheart algorithm to
constant times of the sparsity number via Twoway Truncation procedure.
Theoretically, we prove that the estimation error of the proposed algorithm
decreases exponentially and matches that of the centralized method under mild
assumptions. Extensive experiments on both simulated data and real data verify
that the proposed algorithm is efficient and has performance comparable with
the centralized method on solving highdimensional sparse learning problems.

In this paper we study the problem of noisy tensor completion for tensors
that admit a canonical polyadic or CANDECOMP/PARAFAC (CP) decomposition with
one of the factors being sparse. We present general theoretical error bounds
for an estimate obtained by using a complexityregularized maximum likelihood
principle and then instantiate these bounds for the case of additive white
Gaussian noise. We also provide an ADMMtype algorithm for solving the
complexityregularized maximum likelihood problem and validate the theoretical
finding via experiments on synthetic data set.

The measurement and analysis of Electrodermal Activity (EDA) offers
applications in diverse areas ranging from market research, to seizure
detection, to human stress analysis. Unfortunately, the analysis of EDA signals
is made difficult by the superposition of numerous components which can obscure
the signal information related to a user's response to a stimulus. We show how
simple preprocessing followed by a novel compressed sensing based
decomposition can mitigate the effects of the undesired noise components and
help reveal the underlying physiological signal. The proposed framework allows
for decomposition of EDA signals with provable bounds on the recovery of user
responses. We test our procedure on both synthetic and realworld EDA signals
from wearable sensors and demonstrate that our approach allows for more
accurate recovery of user responses as compared to the existing techniques.

This paper examines the problem of locating outlier columns in a large,
otherwise lowrank matrix, in settings where {}{the data} are noisy, or where
the overall matrix has missing elements. We propose a randomized twostep
inference framework, and establish sufficient conditions on the required sample
complexities under which these methods succeed (with high probability) in
accurately locating the outliers for each task. Comprehensive numerical
experimental results are provided to verify the theoretical bounds and
demonstrate the computational efficiency of the proposed algorithm.

The process of gathering and associating data from multiple sensors or
subdetectors due to a common physical event (the process of eventbuilding) is
used in many fields, including highenergy physics and $\gamma$ray astronomy.
Fault tolerance in eventbuilding is a challenging problem that increases in
difficulty with higher data throughput rates and increasing numbers of
subdetectors. We draw on biological selfassembly models in the development of
a novel eventbuilding paradigm that treats each packet of data from an
individual sensor or subdetector as if it were a molecule in solution. Just as
molecules are capable of forming chemical bonds, "bonds" can be defined between
data packets using metadatabased discriminants. A database  which plays the
role of a beaker of solution  continually selects pairs of assemblies at
random to test for bonds, which allows single packets and small assemblies to
aggregate into larger assemblies. During this process higherquality
associations supersede spurious ones. The database thereby becomes fluid,
dynamic, and selfcorrecting rather than static. We will describe tests of the
selfassembly paradigm using our first fluid database prototype and data from
the VERITAS $\gamma$ray telescope.

In this paper, we examine the problem of approximating a general linear
dimensionality reduction (LDR) operator, represented as a matrix $A \in
\mathbb{R}^{m \times n}$ with $m < n$, by a partial circulant matrix with rows
related by circular shifts. Partial circulant matrices admit fast
implementations via Fourier transform methods and subsampling operations; our
investigation here is motivated by a desire to leverage these potential
computational improvements in largescale data processing tasks. We establish a
fundamental result, that most large LDR matrices (whose row spaces are
uniformly distributed) in fact cannot be well approximated by partial circulant
matrices. Then, we propose a natural generalization of the partial circulant
approximation framework that entails approximating the range space of a given
LDR operator $A$ over a restricted domain of inputs, using a matrix formed as a
product of a partial circulant matrix having $m '> m$ rows and a $m \times k$
'post processing' matrix. We introduce a novel algorithmic technique, based on
sparse matrix factorization, for identifying the factors comprising such
approximations, and provide preliminary evidence to demonstrate the potential
of this approach.

This paper examines the problem of locating outlier columns in a large,
otherwise lowrank, matrix. We propose a simple twostep adaptive sensing and
inference approach and establish theoretical guarantees for its performance;
our results show that accurate outlier identification is achievable using very
few linear summaries of the original data matrix  as few as the squared rank
of the lowrank component plus the number of outliers, times constant and
logarithmic factors. We demonstrate the performance of our approach
experimentally in two stylized applications, one motivated by robust
collaborative filtering tasks, and the other by saliency map estimation tasks
arising in computer vision and automated surveillance, and also investigate
extensions to settings where the data are noisy, or possibly incomplete.

This paper examines a general class of noisy matrix completion tasks where
the goal is to estimate a matrix from observations obtained at a subset of its
entries, each of which is subject to random noise or corruption. Our specific
focus is on settings where the matrix to be estimated is wellapproximated by a
product of two (a priori unknown) matrices, one of which is sparse. Such
structural models  referred to here as "sparse factor models"  have been
widely used, for example, in subspace clustering applications, as well as in
contemporary sparse modeling and dictionary learning tasks. Our main
theoretical contributions are estimation error bounds for sparsityregularized
maximum likelihood estimators for problems of this form, which are applicable
to a number of different observation noise or corruption models. Several
specific implications are examined, including scenarios where observations are
corrupted by additive Gaussian noise or additive heaviertailed (Laplace)
noise, Poissondistributed observations, and highlyquantized (e.g., onebit)
observations. We also propose a simple algorithmic approach based on the
alternating direction method of multipliers for these tasks, and provide
experimental evidence to support our error analyses.

This work considers an estimation task in compressive sensing, where the goal
is to estimate an unknown signal from compressive measurements that are
corrupted by additive premeasurement noise (interference, or clutter) as well
as postmeasurement noise, in the specific setting where some (perhaps limited)
prior knowledge on the signal, interference, and noise is available. The
specific aim here is to devise a strategy for incorporating this prior
information into the design of an appropriate compressive measurement strategy.
Here, the prior information is interpreted as statistics of a prior
distribution on the relevant quantities, and an approach based on Bayesian
Experimental Design is proposed. Experimental results on synthetic data
demonstrate that the proposed approach outperforms traditional random
compressive measurement designs, which are agnostic to the prior information,
as well as several other knowledgeenhanced sensing matrix designs based on
more heuristic notions.

Recent breakthrough results in compressive sensing (CS) have established that
many high dimensional signals can be accurately recovered from a relatively
small number of nonadaptive linear observations, provided that the signals
possess a sparse representation in some basis. Subsequent efforts have shown
that the performance of CS can be improved by exploiting additional structure
in the locations of the nonzero signal coefficients during inference, or by
utilizing some form of datadependent adaptive measurement focusing during the
sensing process. To our knowledge, our own previous work was the first to
establish the potential benefits that can be achieved when fusing the notions
of adaptive sensing and structured sparsity  that work examined the task of
support recovery from noisy linear measurements, and established that an
adaptive sensing strategy specifically tailored to signals that are treesparse
can significantly outperform adaptive and nonadaptive sensing strategies that
are agnostic to the underlying structure. In this work we establish fundamental
performance limits for the task of support recovery of treesparse signals from
noisy measurements, in settings where measurements may be obtained either
nonadaptively (using a randomized Gaussian measurement strategy motivated by
initial CS investigations) or by any adaptive sensing strategy. Our main
results here imply that the adaptive tree sensing procedure analyzed in our
previous work is nearly optimal, in the sense that no other sensing and
estimation strategy can perform fundamentally better for identifying the
support of treesparse signals.

Estimating the level set of a signal from measurements is a task that arises
in a variety of fields, including medical imaging, astronomy, and digital
elevation mapping. Motivated by scenarios where accurate and complete
measurements of the signal may not available, we examine here a simple
procedure for estimating the level set of a signal from highly incomplete
measurements, which may additionally be corrupted by additive noise. The
proposed procedure is based on boxconstrained Total Variation (TV)
regularization. We demonstrate the performance of our approach, relative to
existing stateoftheart techniques for level set estimation from compressive
measurements, via several simulation examples.

Recent breakthrough results in compressed sensing (CS) have established that
many high dimensional objects can be accurately recovered from a relatively
small number of non adaptive linear projection observations, provided that the
objects possess a sparse representation in some basis. Subsequent efforts have
shown that the performance of CS can be improved by exploiting the structure in
the location of the nonzero signal coefficients (structured sparsity) or using
some form of online measurement focusing (adaptivity) in the sensing process.
In this paper we examine a powerful hybrid of these two techniques. First, we
describe a simple adaptive sensing procedure and show that it is a provably
effective method for acquiring sparse signals that exhibit structured sparsity
characterized by treebased coefficient dependencies. Next, employing
techniques from sparse hierarchical dictionary learning, we show that
representations exhibiting the appropriate form of structured sparsity can be
learned from collections of training data. The combination of these techniques
results in an effective and efficient adaptive compressive acquisition
procedure.

Adaptive sampling results in dramatic improvements in the recovery of sparse
signals in white Gaussian noise. A sequential adaptive samplingandrefinement
procedure called Distilled Sensing (DS) is proposed and analyzed. DS is a form
of multistage experimental design and testing. Because of the adaptive nature
of the data collection, DS can detect and localize far weaker signals than
possible from nonadaptive measurements. In particular, reliable detection and
localization (support estimation) using nonadaptive samples is possible only
if the signal amplitudes grow logarithmically with the problem dimension. Here
it is shown that using adaptive sampling, reliable detection is possible
provided the amplitude exceeds a constant, and localization is possible when
the amplitude exceeds any arbitrarily slowly growing function of the dimension.