
In the low rank matrix completion (LRMC) problem, the low rank assumption
means that the columns (or rows) of the matrix to be completed are points on a
lowdimensional linear algebraic variety. This paper extends this thinking to
cases where the columns are points on a lowdimensional nonlinear algebraic
variety, a problem we call Low Algebraic Dimension Matrix Completion (LADMC).
Matrices whose columns belong to a union of subspaces (UoS) are an important
special case. We propose a LADMC algorithm that leverages existing LRMC methods
on a tensorized representation of the data. For example, a secondorder
tensorization representation is formed by taking the outer product of each
column with itself, and we consider higher order tensorizations as well. This
approach will succeed in many cases where traditional LRMC is guaranteed to
fail because the data are lowrank in the tensorized representation but not in
the original representation. We also provide a formal mathematical
justification for the success of our method. In particular, we show bounds of
the rank of these data in the tensorized representation, and we prove sampling
requirements to guarantee uniqueness of the solution. Interestingly, the
sampling requirements of our LADMC algorithm nearly match the information
theoretic lower bounds for matrix completion under a UoS model. We also provide
experimental results showing that the new approach significantly outperforms
existing stateoftheart methods for matrix completion in many situations.

We consider a generalization of lowrank matrix completion to the case where
the data belongs to an algebraic variety, i.e. each data point is a solution to
a system of polynomial equations. In this case the original matrix is possibly
highrank, but it becomes lowrank after mapping each column to a higher
dimensional space of monomial features. Many wellstudied extensions of linear
models, including affine subspaces and their union, can be described by a
variety model. In addition, varieties can be used to model a richer class of
nonlinear quadratic and higher degree curves and surfaces. We study the
sampling requirements for matrix completion under a variety model with a focus
on a union of affine subspaces. We also propose an efficient matrix completion
algorithm that minimizes a convex or nonconvex surrogate of the rank of the
matrix of monomial features. Our algorithm uses the wellknown "kernel trick"
to avoid working directly with the highdimensional monomial matrix. We show
the proposed algorithm is able to recover synthetically generated data up to
the predicted sampling complexity bounds. The proposed algorithm also
outperforms standard low rank matrix completion and subspace clustering
techniques in experiments with real data.

Lowrank matrix completion (LRMC) problems arise in a wide variety of
applications. Previous theory mainly provides conditions for completion under
missingatrandom samplings. This paper studies deterministic conditions for
completion. An incomplete $d \times N$ matrix is finitely rank$r$ completable
if there are at most finitely many rank$r$ matrices that agree with all its
observed entries. Finite completability is the tipping point in LRMC, as a few
additional samples of a finitely completable matrix guarantee its unique
completability. The main contribution of this paper is a deterministic sampling
condition for finite completability. We use this to also derive deterministic
sampling conditions for unique completability that can be efficiently verified.
We also show that under uniform random sampling schemes, these conditions are
satisfied with high probability if $O(\max\{r,\log d\})$ entries per column are
observed. These findings have several implications on LRMC regarding lower
bounds, sample and computational complexity, the role of coherence, adaptive
settings and the validation of any completion algorithm. We complement our
theoretical results with experiments that support our findings and motivate
future analysis of uncharted sampling regimes.

Consider a generic $r$dimensional subspace of $\mathbb{R}^d$, $r<d$, and
suppose that we are only given projections of this subspace onto small subsets
of the canonical coordinates. The paper establishes necessary and sufficient
deterministic conditions on the subsets for subspace identifiability.

This paper studies ordered weighted L1 (OWL) norm regularization for sparse
estimation problems with strongly correlated variables. We prove sufficient
conditions for clustering based on the correlation/colinearity of variables
using the OWL norm, of which the socalled OSCAR is a particular case. Our
results extend previous ones for OSCAR in several ways: for the squared error
loss, our conditions hold for the more general OWL norm and under weaker
assumptions; we also establish clustering conditions for the absolute error
loss, which is, as far as we know, a novel result. Furthermore, we characterize
the statistical performance of OWL norm regularization for generative models in
which certain clusters of regression variables are strongly (even perfectly)
correlated, but variables in different clusters are uncorrelated. We show that
if the true pdimensional signal generating the data involves only s of the
clusters, then O(s log p) samples suffice to accurately estimate the signal,
regardless of the number of coefficients within the clusters. The estimation of
ssparse signals with completely independent variables requires just as many
measurements. In other words, using the OWL we pay no price (in terms of the
number of measurements) for the presence of strongly correlated variables.

This paper proposes a simple adaptive sensing and group testing algorithm for
sparse signal recovery. The algorithm, termed Compressive Adaptive Sense and
Search (CASS), is shown to be nearoptimal in that it succeeds at the lowest
possible signaltonoiseratio (SNR) levels, improving on previous work in
adaptive compressed sensing. Like traditional compressed sensing based on
random nonadaptive design matrices, the CASS algorithm requires only k log n
measurements to recover a ksparse signal of dimension n. However, CASS
succeeds at SNR levels that are a factor log n less than required by standard
compressed sensing. From the point of view of constructing and implementing the
sensing operation as well as computing the reconstruction, the proposed
algorithm is substantially less computationally intensive than standard
compressed sensing. CASS is also demonstrated to perform considerably better in
practice through simulation. To the best of our knowledge, this is the first
demonstration of an adaptive compressed sensing algorithm with nearoptimal
theoretical guarantees and excellent practical performance. This paper also
shows that methods like compressed sensing, group testing, and pooling have an
advantage beyond simply reducing the number of measurements or tests 
adaptive versions of such methods can also improve detection and estimation
performance when compared to nonadaptive direct (uncompressed) sensing.

This paper studies graphical model selection, i.e., the problem of estimating
a graph of statistical relationships among a collection of random variables.
Conventional graphical model selection algorithms are passive, i.e., they
require all the measurements to have been collected before processing begins.
We propose an active learning algorithm that uses junction tree representations
to adapt future measurements based on the information gathered from prior
measurements. We prove that, under certain conditions, our active learning
algorithm requires fewer scalar measurements than any passive algorithm to
reliably estimate a graph. A range of numerical results validate our theory and
demonstrates the benefits of active learning.

This paper investigates the problem of determining a binaryvalued function
through a sequence of strategically selected queries. The focus is an algorithm
called Generalized Binary Search (GBS). GBS is a wellknown greedy algorithm
for determining a binaryvalued function through a sequence of strategically
selected queries. At each step, a query is selected that most evenly splits the
hypotheses under consideration into two disjoint subsets, a natural
generalization of the idea underlying classic binary search. This paper
develops novel incoherence and geometric conditions under which GBS achieves
the informationtheoretically optimal query complexity; i.e., given a
collection of N hypotheses, GBS terminates with the correct function after no
more than a constant times log N queries. Furthermore, a noisetolerant version
of GBS is developed that also achieves the optimal query complexity. These
results are applied to learning halfspaces, a problem arising routinely in
image processing and machine learning.

This paper studies the sample complexity of searching over multiple
populations. We consider a large number of populations, each corresponding to
either distribution P0 or P1. The goal of the search problem studied here is to
find one population corresponding to distribution P1 with as few samples as
possible. The main contribution is to quantify the number of samples needed to
correctly find one such population. We consider two general approaches:
nonadaptive sampling methods, which sample each population a predetermined
number of times until a population following P1 is found, and adaptive sampling
methods, which employ sequential sampling schemes for each population. We first
derive a lower bound on the number of samples required by any sampling scheme.
We then consider an adaptive procedure consisting of a series of sequential
probability ratio tests, and show it comes within a constant factor of the
lower bound. We give explicit expressions for this constant when samples of the
populations follow Gaussian and Bernoulli distributions. An alternative
adaptive scheme is discussed which does not require full knowledge of P1, and
comes within a constant factor of the optimal scheme. For comparison, a lower
bound on the sampling requirements of any nonadaptive scheme is presented.

This paper provides lower bounds on the convergence rate of Derivative Free
Optimization (DFO) with noisy function evaluations, exposing a fundamental and
unavoidable gap between the performance of algorithms with access to gradients
and those with access to only function evaluations. However, there are
situations in which DFO is unavoidable, and for such situations we propose a
new DFO algorithm that is proved to be near optimal for the class of strongly
convex objective functions. A distinctive feature of the algorithm is that it
uses only Booleanvalued function comparisons, rather than function
evaluations. This makes the algorithm useful in an even wider range of
applications, such as optimization based on paired comparisons from human
subjects, for example. We also show that regardless of whether DFO is based on
noisy function evaluations or Booleanvalued function comparisons, the
convergence rate is the same.

We propose a simple modification to the recently proposed compressive binary
search. The modification removes an unnecessary and suboptimal factor of log
log n from the SNR requirement, making the procedure optimal (up to a small
constant). Simulations show that the new procedure performs significantly
better in practice as well. We also contrast this problem with the more well
known problem of noisy binary search.

This paper examines the problem of ranking a collection of objects using
pairwise comparisons (rankings of two objects). In general, the ranking of $n$
objects can be identified by standard sorting methods using $n log_2 n$
pairwise comparisons. We are interested in natural situations in which
relationships among the objects may allow for ranking using far fewer pairwise
comparisons. Specifically, we assume that the objects can be embedded into a
$d$dimensional Euclidean space and that the rankings reflect their relative
distances from a common reference point in $R^d$. We show that under this
assumption the number of possible rankings grows like $n^{2d}$ and demonstrate
an algorithm that can identify a randomly selected ranking using just slightly
more than $d log n$ adaptively selected pairwise comparisons, on average. If
instead the comparisons are chosen at random, then almost all pairwise
comparisons must be made in order to identify any ranking. In addition, we
propose a robust, errortolerant algorithm that only requires that the pairwise
comparisons are probably correct. Experimental studies with synthetic and real
datasets support the conclusions of our theoretical analysis.

Statistical dependencies among wavelet coefficients are commonly represented
by graphical models such as hidden Markov trees(HMTs). However, in linear
inverse problems such as deconvolution, tomography, and compressed sensing, the
presence of a sensing or observation matrix produces a linear mixing of the
simple Markovian dependency structure. This leads to reconstruction problems
that are nonconvex optimizations. Past work has dealt with this issue by
resorting to greedy or suboptimal iterative reconstruction methods. In this
paper, we propose new modeling approaches based on groupsparsity penalties
that leads to convex optimizations that can be solved exactly and efficiently.
We show that the methods we develop perform significantly better in
deconvolution and compressed sensing applications, while being as
computationally efficient as standard coefficientwise approaches such as
lasso.

The ability to detect weak distributed activation patterns in networks is
critical to several applications, such as identifying the onset of anomalous
activity or incipient congestion in the Internet, or faint traces of a
biochemical spread by a sensor network. This is a challenging problem since
weak distributed patterns can be invisible in per node statistics as well as a
global networkwide aggregate. Most prior work considers situations in which
the activation/nonactivation of each node is statistically independent, but
this is unrealistic in many problems. In this paper, we consider structured
patterns arising from statistical dependencies in the activation process. Our
contributions are threefold. First, we propose a sparsifying transform that
succinctly represents structured activation patterns that conform to a
hierarchical dependency graph. Second, we establish that the proposed transform
facilitates detection of very weak activation patterns that cannot be detected
with existing methods. Third, we show that the structure of the hierarchical
dependency graph governing the activation process, and hence the network
transform, can be learnt from very few (logarithmic in network size)
independent snapshots of network activity.

We describe here a framework for a certain class of multiscale likelihood
factorizations wherein, in analogy to a wavelet decomposition of an L^2
function, a given likelihood function has an alternative representation as a
product of conditional densities reflecting information in both the data and
the parameter vector localized in position and scale. The framework is
developed as a set of sufficient conditions for the existence of such
factorizations, formulated in analogy to those underlying a standard
multiresolution analysis for wavelets, and hence can be viewed as a
multiresolution analysis for likelihoods. We then consider the use of these
factorizations in the task of nonparametric, complexity penalized likelihood
estimation. We study the risk properties of certain thresholding and
partitioning estimators, and demonstrate their adaptivity and nearoptimality,
in a minimax sense over a broad range of function spaces, based on squared
Hellinger distance as a loss function. In particular, our results provide an
illustration of how properties of classical waveletbased estimators can be
obtained in a single, unified framework that includes models for continuous,
count and categorical data types.