
In this paper, we consider the mixture of sparse linear regressions model.
Let ${\beta}^{(1)},\ldots,{\beta}^{(L)}\in\mathbb{C}^n$ be $ L $ unknown sparse
parameter vectors with a total of $ K $ nonzero coefficients. Noisy linear
measurements are obtained in the form $y_i={x}_i^H {\beta}^{(\ell_i)} + w_i$,
each of which is generated randomly from one of the sparse vectors with the
label $ \ell_i $ unknown. The goal is to estimate the parameter vectors
efficiently with low sample and computational costs. This problem presents
significant challenges as one needs to simultaneously solve the demixing
problem of recovering the labels $ \ell_i $ as well as the estimation problem
of recovering the sparse vectors $ {\beta}^{(\ell)} $.
Our solution to the problem leverages the connection between modern coding
theory and statistical inference. We introduce a new algorithm, MixedColoring,
which samples the mixture strategically using query vectors $ {x}_i $
constructed based on ideas from sparse graph codes. Our novel code design
allows for both efficient demixing and parameter estimation. In the noiseless
setting, for a constant number of sparse parameter vectors, our algorithm
achieves the orderoptimal sample and time complexities of $\Theta(K)$. In the
presence of Gaussian noise, for the problem with two parameter vectors (i.e.,
$L=2$), we show that the Robust MixedColoring algorithm achieves nearoptimal
$\Theta(K polylog(n))$ sample and time complexities. When $K=O(n^{\alpha})$ for
some constant $\alpha\in(0,1)$ (i.e., $K$ is sublinear in $n$), we can achieve
sample and time complexities both sublinear in the ambient dimension. In one of
our experiments, to recover a mixture of two regressions with dimension $n=500$
and sparsity $K=50$, our algorithm is more than $300$ times faster than EM
algorithm, with about one third of its sample cost.

This paper studies the Tensor Robust Principal Component (TRPCA) problem
which extends the known Robust PCA (Candes et al. 2011) to the tensor case. Our
model is based on a new tensor Singular Value Decomposition (tSVD) (Kilmer and
Martin 2011) and its induced tensor tubal rank and tensor nuclear norm.
Consider that we have a 3way tensor ${\mathcal{X}}\in\mathbb{R}^{n_1\times
n_2\times n_3}$ such that ${\mathcal{X}}={\mathcal{L}}_0+{\mathcal{E}}_0$,
where ${\mathcal{L}}_0$ has low tubal rank and ${\mathcal{E}}_0$ is sparse. Is
that possible to recover both components? In this work, we prove that under
certain suitable assumptions, we can recover both the lowrank and the sparse
components exactly by simply solving a convex program whose objective is a
weighted combination of the tensor nuclear norm and the $\ell_1$norm, i.e.,
$\min_{{\mathcal{L}},\ {\mathcal{E}}} \
\{{\mathcal{L}}}\_*+\lambda\{{\mathcal{E}}}\_1, \ \text{s.t.} \
{\mathcal{X}}={\mathcal{L}}+{\mathcal{E}}$, where $\lambda=
{1}/{\sqrt{\max(n_1,n_2)n_3}}$. Interestingly, TRPCA involves RPCA as a special
case when $n_3=1$ and thus it is a simple and elegant tensor extension of RPCA.
Also numerical experiments verify our theory and the application for the image
denoising demonstrates the effectiveness of our method.

Lowrank modeling plays a pivotal role in signal processing and machine
learning, with applications ranging from collaborative filtering, video
surveillance, medical imaging, to dimensionality reduction and adaptive
filtering. Many modern highdimensional data and interactions thereof can be
modeled as lying approximately in a lowdimensional subspace or manifold,
possibly with additional structures, and its proper exploitations lead to
significant reduction of costs in sensing, computation and storage. In recent
years, there is a plethora of progress in understanding how to exploit lowrank
structures using computationally efficient procedures in a provable manner,
including both convex and nonconvex approaches. On one side, convex relaxations
such as nuclear norm minimization often lead to statistically optimal
procedures for estimating lowrank matrices, where firstorder methods are
developed to address the computational challenges; on the other side, there is
emerging evidence that properly designed nonconvex procedures, such as
projected gradient descent, often provide globally optimal solutions with a
much lower computational cost in many problems. This survey article will
provide a unified overview of these recent advances on lowrank matrix
estimation from incomplete measurements. Attention is paid to rigorous
characterization of the performance of these algorithms, and to problems where
the lowrank matrix have additional structural properties that require new
algorithmic designs and theoretical analysis.

In this paper, we consider the Tensor Robust Principal Component Analysis
(TRPCA) problem, which aims to exactly recover the lowrank and sparse
components from their sum. Our model is based on the recently proposed
tensortensor product (or tproduct) [13]. Induced by the tproduct, we first
rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor
average rank, and show that the tensor nuclear norm is the convex envelope of
the tensor average rank within the unit ball of the tensor spectral norm. These
definitions, their relationships and properties are consistent with matrix
cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA
problem by solving a convex program and provide the theoretical guarantee for
the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA
as a special case. Numerical experiments verify our results, and the
applications to image recovery and background modeling problems demonstrate the
effectiveness of our method.

In this paper, we introduce a powerful technique, LeaveOneOut, to the
analysis of lowrank matrix completion problems. Using this technique, we
develop a general approach for obtaining finegrained, entrywise bounds on
iterative stochastic procedures. We demonstrate the power of this approach in
analyzing two of the most important algorithms for matrix completion: the
nonconvex approach based on Singular Value Projection (SVP), and the convex
relaxation approach based on nuclear norm minimization (NNM). In particular, we
prove for the first time that the original form of SVP, without resampling or
sample splitting, converges linearly in the infinity norm. We further apply our
leaveoneout approach to an iterative procedure that arises in the analysis of
the dual solutions of NNM. Our results show that NNM recovers the true $ d
$by$ d $ rank$ r $ matrix with $\mathcal{O}(\mu^2 r^3d \log d )$ observed
entries, which has optimal dependence on the dimension and is independent of
the condition number of the matrix. To the best of our knowledge, this is the
first sample complexity result for a tractable matrix completion algorithm that
satisfies these two properties simultaneously.

We consider the problem of estimating the discrete clustering structures
under SubGaussian Mixture Models. Our main results establish a hidden
integrality property of a semidefinite programming (SDP) relaxation for this
problem: while the optimal solutions to the SDP are not integervalued in
general, their estimation errors can be upper bounded in terms of the error of
an idealized integer program. The error of the integer program, and hence that
of the SDP, are further shown to decay exponentially in the signaltonoise
ratio. To the best of our knowledge, this is the first exponentially decaying
error bound for convex relaxations of mixture models, and our results reveal
the "globaltolocal" mechanism that drives the performance of the SDP
relaxation.
A corollary of our results shows that in certain regimes the SDP solutions
are in fact integral and exact, improving on existing exact recovery results
for convex relaxations. More generally, our results establish sufficient
conditions for the SDP to correctly recover the cluster memberships of
$(1\delta)$ fraction of the points for any $\delta\in(0,1)$. As a special
case, we show that under the $d$dimensional Stochastic Ball Model, SDP
achieves nontrivial (sometimes exact) recovery when the center separation is
as small as $\sqrt{1/d}$, which complements previous exact recovery results
that require constant separation.

In largescale distributed learning, security issues have become increasingly
important. Particularly in a decentralized environment, some computing units
may behave abnormally, or even exhibit Byzantine failuresarbitrary and
potentially adversarial behavior. In this paper, we develop distributed
learning algorithms that are provably robust against such failures, with a
focus on achieving optimal statistical performance. A main result of this work
is a sharp analysis of two robust distributed gradient descent algorithms based
on median and trimmed mean operations, respectively. We prove statistical error
rates for three kinds of population loss functions: strongly convex,
nonstrongly convex, and smooth nonconvex. In particular, these algorithms are
shown to achieve orderoptimal statistical error rates for strongly convex
losses. To achieve better communication efficiency, we further propose a
medianbased distributed algorithm that is provably robust, and uses only one
communication round. For strongly convex quadratic loss, we show that this
algorithm achieves the same optimal error rate as the robust distributed
gradient descent algorithms.

In this paper we consider the cluster estimation problem under the Stochastic
Block Model. We show that the semidefinite programming (SDP) formulation for
this problem achieves an error rate that decays exponentially in the
signaltonoise ratio. The error bound implies weak recovery in the sparse
graph regime with bounded expected degrees, as well as exact recovery in the
dense regime. An immediate corollary of our results yields error bounds under
the Censored Block Model. Moreover, these error bounds are robust, continuing
to hold under heterogeneous edge probabilities and a form of the socalled
monotone attack.
Significantly, this error rate is achieved by the SDP solution itself without
any further pre or postprocessing, and improves upon existing
polynomiallydecaying error bounds proved using the Grothendieck\textquoteright
s inequality. Our analysis has two key ingredients: (i) showing that the graph
has a wellbehaved spectrum, even in the sparse regime, after discounting an
exponentially small number of edges, and (ii) an orderstatistics argument that
governs the final error rate. Both arguments highlight the implicit
regularization effect of the SDP formulation.

We consider the problem of distributed statistical machine learning in
adversarial settings, where some unknown and timevarying subset of working
machines may be compromised and behave arbitrarily to prevent an accurate model
from being learned. This setting captures the potential adversarial attacks
faced by Federated Learning  a modern machine learning paradigm that is
proposed by Google researchers and has been intensively studied for ensuring
user privacy. Formally, we focus on a distributed system consisting of a
parameter server and $m$ working machines. Each working machine keeps $N/m$
data samples, where $N$ is the total number of samples. The goal is to
collectively learn the underlying true model parameter of dimension $d$.
In classical batch gradient descent methods, the gradients reported to the
server by the working machines are aggregated via simple averaging, which is
vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine
gradient descent method based on the geometric median of means of the
gradients. We show that our method can tolerate $q \le (m1)/2$ Byzantine
failures, and the parameter estimate converges in $O(\log N)$ rounds with an
estimation error of $\sqrt{d(2q+1)/N}$, hence approaching the optimal error
rate $\sqrt{d/N}$ in the centralized and failurefree setting. The total
computational complexity of our algorithm is of $O((Nd/m) \log N)$ at each
working machine and $O(md + kd \log^3 N)$ at the central server, and the total
communication cost is of $O(m d \log N)$. We further provide an application of
our general results to the linear regression problem.
A key challenge arises in the above problem is that Byzantine failures create
arbitrary and unspecified dependency among the iterations and the aggregated
gradients. We prove that the aggregated gradient converges uniformly to the
true gradient function.

We study two $2$dimensional Teichm\"uller spaces of surfaces with boundary
and marked points, namely, the pentagon and the punctured triangle. We show
that their geometry is quite different from Teichm\"uller spaces of closed
surfaces. Indeed, both spaces are exhausted by regular convex geodesic polygons
with a fixed number of sides, and their geodesics diverge at most linearly.

We consider the problem of Robust PCA in the fully and partially observed
settings. Without corruptions, this is the wellknown matrix completion
problem. From a statistical standpoint this problem has been recently
wellstudied, and conditions on when recovery is possible (how many
observations do we need, how many corruptions can we tolerate) via
polynomialtime algorithms is by now understood. This paper presents and
analyzes a nonconvex optimization approach that greatly reduces the
computational complexity of the above problems, compared to the best available
algorithms. In particular, in the fully observed case, with $r$ denoting rank
and $d$ dimension, we reduce the complexity from
$\mathcal{O}(r^2d^2\log(1/\varepsilon))$ to
$\mathcal{O}(rd^2\log(1/\varepsilon))$  a big savings when the rank is big.
For the partially observed case, we show the complexity of our algorithm is no
more than $\mathcal{O}(r^4d \log d \log(1/\varepsilon))$. Not only is this the
bestknown runtime for a provable algorithm under partial observation, but in
the setting where $r$ is small compared to $d$, it also allows for
nearlinearin$d$ runtime that can be exploited in the fullyobserved case as
well, by simply running our algorithm on a subset of the observations.

This paper considers the problem of matrix completion when some number of the
columns are completely and arbitrarily corrupted, potentially by a malicious
adversary. It is wellknown that standard algorithms for matrix completion can
return arbitrarily poor results, if even a single column is corrupted. One
direct application comes from robust collaborative filtering. Here, some number
of users are socalled manipulators who try to skew the predictions of the
algorithm by calibrating their inputs to the system. In this paper, we develop
an efficient algorithm for this problem based on a combination of a trimming
procedure and a convex program that minimizes the nuclear norm and the
$\ell_{1,2}$ norm. Our theoretical results show that given a vanishing fraction
of observed entries, it is nevertheless possible to complete the underlying
matrix even when the number of corrupted columns grows. Significantly, our
results hold without any assumptions on the locations or values of the observed
entries of the manipulated columns. Moreover, we show by an
informationtheoretic argument that our guarantees are nearly optimal in terms
of the fraction of sampled entries on the authentic columns, the fraction of
corrupted columns, and the rank of the underlying matrix. Our results therefore
sharply characterize the tradeoffs between sample, robustness and rank in
matrix completion.

The stochastic block model (SBM) is a popular framework for studying
community detection in networks. This model is limited by the assumption that
all nodes in the same community are statistically equivalent and have equal
expected degrees. The degreecorrected stochastic block model (DCSBM) is a
natural extension of SBM that allows for degree heterogeneity within
communities. This paper proposes a convexified modularity maximization approach
for estimating the hidden communities under DCSBM. Our approach is based on a
convex programming relaxation of the classical (generalized) modularity
maximization formulation, followed by a novel doublyweighted $ \ell_1 $norm $
k $median procedure. We establish nonasymptotic theoretical guarantees for
both approximate clustering and perfect clustering. Our approximate clustering
results are insensitive to the minimum degree, and hold even in sparse regime
with bounded average degrees. In the special case of SBM, these theoretical
results match the bestknown performance guarantees of computationally feasible
algorithms. Numerically, we provide an efficient implementation of our
algorithm, which is applied to both synthetic and realworld networks.
Experiment results show that our method enjoys competitive performance compared
to the state of the art in the literature.

Optimization problems with rank constraints arise in many applications,
including matrix regression, structured PCA, matrix completion and matrix
decomposition problems. An attractive heuristic for solving such problems is to
factorize the lowrank matrix, and to run projected gradient descent on the
nonconvex factorized optimization problem. The goal of this problem is to
provide a general theoretical framework for understanding when such methods
work well, and to characterize the nature of the resulting fixed point. We
provide a simple set of conditions under which projected gradient descent, when
given a suitable initialization, converges geometrically to a statistically
useful solution. Our results are applicable even when the initial solution is
outside any region of local convexity, and even when the problem is globally
concave. Working in a nonasymptotic framework, we show that our conditions are
satisfied for a wide range of concrete models, including matrix regression,
structured PCA, matrix completion with real and quantized observations, matrix
decomposition, and graph clustering problems. Simulation results show excellent
agreement with the theoretical predictions.

Many models for sparse regression typically assume that the covariates are
known completely, and without noise. Particularly in highdimensional
applications, this is often not the case. This paper develops efficient
OMPlike algorithms to deal with precisely this setting. Our algorithms are as
efficient as OMP, and improve on the bestknown results for missing and noisy
data in regression, both in the highdimensional setting where we seek to
recover a sparse vector from only a few measurements, and in the classical
lowdimensional setting where we recover an unstructured regressor. In the
highdimensional setting, our supportrecovery algorithm requires no knowledge
of even the statistics of the noise. Along the way, we also obtain improved
performance guarantees for OMP for the standard sparse regression problem with
Gaussian noise.

We consider two closely related problems: planted clustering and submatrix
localization. The planted clustering problem assumes that a random graph is
generated based on some underlying clusters of the nodes; the task is to
recover these clusters given the graph. The submatrix localization problem
concerns locating hidden submatrices with elevated means inside a large
realvalued random matrix. Of particular interest is the setting where the
number of clusters/submatrices is allowed to grow unbounded with the problem
size. These formulations cover several classical models such as planted clique,
planted densest subgraph, planted partition, planted coloring, and stochastic
block model, which are widely used for studying community detection and
clustering/biclustering.
For both problems, we show that the space of the model parameters
(cluster/submatrix size, cluster density, and submatrix mean) can be
partitioned into four disjoint regions corresponding to decreasing statistical
and computational complexities: (1) the \emph{impossible} regime, where all
algorithms fail; (2) the \emph{hard} regime, where the computationally
expensive Maximum Likelihood Estimator (MLE) succeeds; (3) the \emph{easy}
regime, where the polynomialtime convexified MLE succeeds; (4) the
\emph{simple} regime, where a simple counting/thresholding procedure succeeds.
Moreover, we show that each of these algorithms provably fails in the previous
harder regimes.
Our theorems establish the minimax recovery limit, which are tight up to
constants and hold with a growing number of clusters/submatrices, and provide a
stronger performance guarantee than previously known for polynomialtime
algorithms. Our study demonstrates the tradeoffs between statistical and
computational considerations, and suggests that the minimax recovery limit may
not be achievable by polynomialtime algorithms.

We consider the mixed regression problem with two components, under
adversarial and stochastic noise. We give a convex optimization formulation
that provably recovers the true solution, and provide upper bounds on the
recovery errors for both arbitrary noise and stochastic noise settings. We also
give matching minimax lower bounds (up to log factors), showing that under
certain assumptions, our algorithm is informationtheoretically optimal. Our
results represent the first tractable algorithm guaranteeing successful
recovery with tight bounds on recovery errors and sample complexity.

This paper considers the matrix completion problem. We show that it is not
necessary to assume joint incoherence, which is a standard but unintuitive and
restrictive condition that is imposed by previous studies. This leads to a
sample complexity bound that is orderwise optimal with respect to the
incoherence parameter (as well as to the rank $r$ and the matrix dimension $n$
up to a log factor). As a consequence, we improve the sample complexity of
recovering a semidefinite matrix from $O(nr^{2}\log^{2}n)$ to $O(nr\log^{2}n)$,
and the highest allowable rank from $\Theta(\sqrt{n}/\log n)$ to
$\Theta(n/\log^{2}n)$. The key step in proof is to obtain new bounds on the
$\ell_{\infty,2}$norm, defined as the maximum of the row and column norms of a
matrix. To illustrate the applicability of our techniques, we discuss
extensions to SVD projection, structured matrix completion and semisupervised
clustering, for which we provide orderwise improvements over existing results.
Finally, we turn to the closelyrelated problem of lowrankplussparse matrix
decomposition. We show that the joint incoherence condition is unavoidable here
for polynomialtime algorithms conditioned on the Planted Clique conjecture.
This means it is intractable in general to separate a rank$\omega(\sqrt{n})$
positive semidefinite matrix and a sparse matrix. Interestingly, our results
show that the standard and joint incoherence conditions are associated
respectively with the information (statistical) and computational aspects of
the matrix decomposition problem.

Graph clustering involves the task of dividing nodes into clusters, so that
the edge density is higher within clusters as opposed to across clusters. A
natural, classic and popular statistical setting for evaluating solutions to
this problem is the stochastic block model, also referred to as the planted
partition model.
In this paper we present a new algorithma convexified version of Maximum
Likelihoodfor graph clustering. We show that, in the classic stochastic block
model setting, it outperforms existing methods by polynomial factors when the
cluster size is allowed to have general scalings. In fact, it is within
logarithmic factors of known lower bounds for spectral methods, and there is
evidence suggesting that no polynomial time algorithm would do significantly
better.
We then show that this guarantee carries over to a more general extension of
the stochastic block model. Our method can handle the settings of semirandom
graphs, heterogeneous degree distributions, unequal cluster sizes, unaffiliated
nodes, partially observed graphs and planted clique/coloring etc. In
particular, our results provide the best exact recovery guarantees to date for
the planted partition, planted kdisjointcliques and planted noisy coloring
models with general cluster sizes; in other settings, we match the best
existing results up to logarithmic factors.

This paper considers the problem of clustering a partially observed
unweighted graphi.e., one where for some node pairs we know there is an edge
between them, for some others we know there is no edge, and for the remaining
we do not know whether or not there is an edge. We want to organize the nodes
into disjoint clusters so that there is relatively dense (observed)
connectivity within clusters, and sparse across clusters.
We take a novel yet natural approach to this problem, by focusing on finding
the clustering that minimizes the number of "disagreements"i.e., the sum of
the number of (observed) missing edges within clusters, and (observed) present
edges across clusters. Our algorithm uses convex optimization; its basis is a
reduction of disagreement minimization to the problem of recovering an
(unknown) lowrank matrix and an (unknown) sparse matrix from their partially
observed sum. We evaluate the performance of our algorithm on the classical
Planted Partition/Stochastic Block Model. Our main theorem provides sufficient
conditions for the success of our algorithm as a function of the minimum
cluster size, edge density and observation probability; in particular, the
results characterize the tradeoff between the observation probability and the
edge density gap. When there are a constant number of clusters of equal size,
our results are optimal up to logarithmic factors.

Matrix completion, i.e., the exact and provable recovery of a lowrank matrix
from a small subset of its elements, is currently only known to be possible if
the matrix satisfies a restrictive structural constraintknown as {\em
incoherence}on its row and column spaces. In these cases, the subset of
elements is sampled uniformly at random.
In this paper, we show that {\em any} rank$ r $ $ n$by$ n $ matrix can be
exactly recovered from as few as $O(nr \log^2 n)$ randomly chosen elements,
provided this random choice is made according to a {\em specific biased
distribution}: the probability of any element being sampled should be
proportional to the sum of the leverage scores of the corresponding row, and
column. Perhaps equally important, we show that this specific form of sampling
is nearly necessary, in a natural precise sense; this implies that other
perhaps more intuitive sampling schemes fail.
We further establish three ways to use the above result for the setting when
leverage scores are not known \textit{a priori}: (a) a sampling strategy for
the case when only one of the row or column spaces are incoherent, (b) a
twophase sampling procedure for general matrices that first samples to
estimate leverage scores followed by sampling for exact recovery, and (c) an
analysis showing the advantages of weighted nuclear/tracenorm minimization
over the vanilla unweighted formulation for the case of nonuniform sampling.

In this paper, we study the quantization errors of modulo sigmadelta
modulated finite, asymptoticallyinfinite, infinite causal stable ARMA
processes. We prove that the normalized quantization error can be taken as a
uniformly distributed white noise for all the cases. Moreover, we find that
this nice property is guaranteed by two different mechanisms: the highenough
quantization resolution \cite{Bennett1948}\cite{WidrowKollar2008} and the
asymptotic convergence of quantization errors for some quasistationary
processes \cite{ChouGray1991}\cite{LiChenLiZhang2009}, for different cases.
But the assumption of the smooth density of the sampled random processes is
needed in all the cases.

We present a principled approach for detecting overlapping temporal community
structure in dynamic networks. Our method is based on the following framework:
find the overlapping temporal community structure that maximizes a quality
function associated with each snapshot of the network subject to a temporal
smoothness constraint. A novel quality function and a smoothness constraint are
proposed to handle overlaps, and a new convex relaxation is used to solve the
resulting combinatorial optimization problem. We provide theoretical guarantees
as well as experimental results that reveal community structure in real and
synthetic networks. Our main insight is that certain structures can be
identified only when temporal correlation is considered and when communities
are allowed to overlap. In general, discovering such overlapping temporal
community structure can enhance our understanding of realworld complex
networks by revealing the underlying stability behind their seemingly chaotic
evolution.

This paper investigates graph clustering in the planted cluster model in the
presence of {\em small clusters}. Traditional results dictate that for an
algorithm to provably correctly recover the clusters, {\em all} clusters must
be sufficiently large (in particular, $\tilde{\Omega}(\sqrt{n})$ where $n$ is
the number of nodes of the graph). We show that this is not really a
restriction: by a more refined analysis of the tracenorm based recovery
approach proposed in Jalali et al. (2011) and Chen et al. (2012), we prove that
small clusters, under certain mild assumptions, do not hinder recovery of large
ones.
Based on this result, we further devise an iterative algorithm to recover
{\em almost all clusters} via a "peeling strategy", i.e., recover large
clusters first, leading to a reduced problem, and repeat this procedure. These
results are extended to the {\em partial observation} setting, in which only a
(chosen) part of the graph is observed.The peeling strategy gives rise to an
active learning algorithm, in which edges adjacent to smaller clusters are
queried more often as large clusters are learned (and removed).
From a high level, this paper sheds novel insights on highdimensional
statistics and learning structured data, by presenting a structured matrix
learning problem for which a one shot convex relaxation approach necessarily
fails, but a carefully constructed sequence of convex relaxationsdoes the job.

We consider high dimensional sparse regression, and develop strategies able
to deal with arbitrary  possibly, severe or coordinated  errors in the
covariance matrix $X$. These may come from corrupted data, persistent
experimental errors, or malicious respondents in surveys/recommender systems,
etc. Such nonstochastic errorinvariables problems are notoriously difficult
to treat, and as we demonstrate, the problem is particularly pronounced in
highdimensional settings where the primary goal is {\em support recovery} of
the sparse regressor. We develop algorithms for support recovery in sparse
regression, when some number $n_1$ out of $n+n_1$ total covariate/response
pairs are {\it arbitrarily (possibly maliciously) corrupted}. We are interested
in understanding how many outliers, $n_1$, we can tolerate, while identifying
the correct support. To the best of our knowledge, neither standard outlier
rejection techniques, nor recently developed robust regression algorithms (that
focus only on corrupted response variables), nor recent algorithms for dealing
with stochastic noise or erasures, can provide guarantees on support recovery.
Perhaps surprisingly, we also show that the natural brute force algorithm that
searches over all subsets of $n$ covariate/response pairs, and all subsets of
possible support coordinates in order to minimize regression error, is
remarkably poor, unable to correctly identify the support with even $n_1 =
O(n/k)$ corrupted points, where $k$ is the sparsity. This is true even in the
basic setting we consider, where all authentic measurements and noise are
independent and subGaussian. In this setting, we provide a simple algorithm 
no more computationally taxing than OMP  that gives stronger performance
guarantees, recovering the support with up to $n_1 = O(n/(\sqrt{k} \log p))$
corrupted points, where $p$ is the dimension of the signal to be recovered.