
Binary embedding is a nonlinear dimension reduction methodology where high
dimensional data are embedded into the Hamming cube while preserving the
structure of the original space. Specifically, for an arbitrary $N$ distinct
points in $\mathbb{S}^{p1}$, our goal is to encode each point using
$m$dimensional binary strings such that we can reconstruct their geodesic
distance up to $\delta$ uniform distortion. Existing binary embedding
algorithms either lack theoretical guarantees or suffer from running time
$O\big(mp\big)$. We make three contributions: (1) we establish a lower bound
that shows any binary embedding oblivious to the set of points requires $m =
\Omega(\frac{1}{\delta^2}\log{N})$ bits and a similar lower bound for
nonoblivious embeddings into Hamming distance; (2) [DELETED, see comment]; (3)
we also provide an analytic result about embedding a general set of points $K
\subseteq \mathbb{S}^{p1}$ with even infinite size. Our theoretical findings
are supported through experiments on both synthetic and real data sets.

We consider the problem of Robust PCA in the fully and partially observed
settings. Without corruptions, this is the wellknown matrix completion
problem. From a statistical standpoint this problem has been recently
wellstudied, and conditions on when recovery is possible (how many
observations do we need, how many corruptions can we tolerate) via
polynomialtime algorithms is by now understood. This paper presents and
analyzes a nonconvex optimization approach that greatly reduces the
computational complexity of the above problems, compared to the best available
algorithms. In particular, in the fully observed case, with $r$ denoting rank
and $d$ dimension, we reduce the complexity from
$\mathcal{O}(r^2d^2\log(1/\varepsilon))$ to
$\mathcal{O}(rd^2\log(1/\varepsilon))$  a big savings when the rank is big.
For the partially observed case, we show the complexity of our algorithm is no
more than $\mathcal{O}(r^4d \log d \log(1/\varepsilon))$. Not only is this the
bestknown runtime for a provable algorithm under partial observation, but in
the setting where $r$ is small compared to $d$, it also allows for
nearlinearin$d$ runtime that can be exploited in the fullyobserved case as
well, by simply running our algorithm on a subset of the observations.

We consider the problem of solving mixed random linear equations with $k$
components. This is the noiseless setting of mixed linear regression. The goal
is to estimate multiple linear models from mixed samples in the case where the
labels (which sample corresponds to which model) are not observed. We give a
tractable algorithm for the mixed linear equation problem, and show that under
some technical conditions, our algorithm is guaranteed to solve the problem
exactly with sample complexity linear in the dimension, and polynomial in $k$,
the number of components. Previous approaches have required either exponential
dependence on $k$, or superlinear dependence on the dimension. The proposed
algorithm is a combination of tensor decomposition and alternating
minimization. Our analysis involves proving that the initialization provided by
the tensor method allows alternating minimization, which is equivalent to EM in
our setting, to converge to the global optimum at a linear rate.

Latent variable models are a fundamental modeling tool in machine learning
applications, but they present significant computational and analytical
challenges. The popular EM algorithm and its variants, is a much used
algorithmic tool; yet our rigorous understanding of its performance is highly
incomplete. Recently, work in Balakrishnan et al. (2014) has demonstrated that
for an important class of problems, EM exhibits linear local convergence. In
the highdimensional setting, however, the Mstep may not be well defined. We
address precisely this setting through a unified treatment using
regularization. While regularization for highdimensional problems is by now
well understood, the iterative EM algorithm requires a careful balancing of
making progress towards the solution while identifying the right structure
(e.g., sparsity or lowrank). In particular, regularizing the Mstep using the
stateoftheart highdimensional prescriptions (e.g., Wainwright (2014)) is
not guaranteed to provide this balance. Our algorithm and analysis are linked
in a way that reveals the balance between optimization and statistical errors.
We specialize our general framework to sparse gaussian mixture models,
highdimensional mixed regression, and regression with missing variables,
obtaining statistical guarantees for each of these examples.

Linear regression studies the problem of estimating a model parameter
$\beta^* \in \mathbb{R}^p$, from $n$ observations
$\{(y_i,\mathbf{x}_i)\}_{i=1}^n$ from linear model $y_i = \langle
\mathbf{x}_i,\beta^* \rangle + \epsilon_i$. We consider a significant
generalization in which the relationship between $\langle \mathbf{x}_i,\beta^*
\rangle$ and $y_i$ is noisy, quantized to a single bit, potentially nonlinear,
noninvertible, as well as unknown. This model is known as the singleindex
model in statistics, and, among other things, it represents a significant
generalization of onebit compressed sensing. We propose a novel spectralbased
estimation procedure and show that we can recover $\beta^*$ in settings (i.e.,
classes of link function $f$) where previous algorithms fail. In general, our
algorithm requires only very mild restrictions on the (unknown) functional
relationship between $y_i$ and $\langle \mathbf{x}_i,\beta^* \rangle$. We also
consider the high dimensional setting where $\beta^*$ is sparse ,and introduce
a twostage nonconvex framework that addresses estimation challenges in high
dimensional regimes where $p \gg n$. For a broad class of link functions
between $\langle \mathbf{x}_i,\beta^* \rangle$ and $y_i$, we establish minimax
lower bounds that demonstrate the optimality of our estimators in both the
classical and high dimensional regimes.

We consider the mixed regression problem with two components, under
adversarial and stochastic noise. We give a convex optimization formulation
that provably recovers the true solution, and provide upper bounds on the
recovery errors for both arbitrary noise and stochastic noise settings. We also
give matching minimax lower bounds (up to log factors), showing that under
certain assumptions, our algorithm is informationtheoretically optimal. Our
results represent the first tractable algorithm guaranteeing successful
recovery with tight bounds on recovery errors and sample complexity.

Mixed linear regression involves the recovery of two (or more) unknown
vectors from unlabeled linear measurements; that is, where each sample comes
from exactly one of the vectors, but we do not know which one. It is a classic
problem, and the natural and empirically most popular approach to its solution
has been the EM algorithm. As in other settings, this is prone to bad local
minima; however, each iteration is very fast (alternating between guessing
labels, and solving with those labels).
In this paper we provide a new initialization procedure for EM, based on
finding the leading two eigenvectors of an appropriate matrix. We then show
that with this, a resampled version of the EM algorithm provably converges to
the correct vectors, under natural assumptions on the sampling distribution,
and with nearly optimal (unimprovable) sample complexity. This provides not
only the first characterization of EM's performance, but also much lower sample
complexity as compared to both standard (randomly initialized) EM, and other
methods for this problem.