This paper presents a class of new algorithms for distributed statistical
estimation that exploit divide-and-conquer approach. We show that one of the
key benefits of the divide-and-conquer strategy is robustness, an important
characteristic for large distributed systems. We establish connections between
performance of these distributed algorithms and the rates of convergence in
normal approximation, and prove non-asymptotic deviations guarantees, as well
as limit theorems, for the resulting estimators. Our techniques are illustrated
through several examples: in particular, we obtain new results for the
median-of-means estimator, as well as provide performance guarantees for
distributed maximum likelihood estimation.
Estimation of the covariance matrix has attracted a lot of attention of the
statistical research community over the years, partially due to important
applications such as Principal Component Analysis. However, frequently used
empirical covariance estimator (and its modifications) is very sensitive to
outliers in the data. As P. J. Huber wrote in 1964, "...This raises a question
which could have been asked already by Gauss, but which was, as far as I know,
only raised a few years ago (notably by Tukey): what happens if the true
distribution deviates slightly from the assumed normal one? As is now well
known, the sample mean then may have a catastrophically bad performance..."
Motivated by this question, we develop a new estimator of the (element-wise)
mean of a random matrix, which includes covariance estimation problem as a
special case. Assuming that the entries of a matrix possess only finite second
moment, this new estimator admits sub-Gaussian or sub-exponential concentration
around the unknown mean in the operator norm. We will explain the key ideas
behind our construction, as well as applications to covariance estimation and
matrix completion problems.
We present Rosenthal-type moment inequalities for matrix-valued U-statistics
of order 2. As a corollary, we obtain new matrix concentration inequalities for
U-statistics. One of our main technical tools, a version of the non-commutative
Khintchine inequality for the spectral norm of the Rademacher chaos, could be
of independent interest.
Let $Y$ be a $d$-dimensional random vector with unknown mean $\mu$ and
covariance matrix $\Sigma$. This paper is motivated by the problem of designing
an estimator of $\Sigma$ that admits tight deviation bounds in the operator
norm under minimal assumptions on the underlying distribution, such as
existence of only 4th moments of the coordinates of $Y$. To address this
problem, we propose robust modifications of the operator-valued U-statistics,
obtain non-asymptotic guarantees for their performance, and demonstrate the
implications of these results to the covariance estimation problem under
various structural assumptions.
We propose and analyze a new estimator of the covariance matrix that admits
strong theoretical guarantees under weak assumptions on the underlying
distribution, such as existence of moments of only low order. While estimation
of covariance matrices corresponding to sub-Gaussian distributions is
well-understood, much less in known in the case of heavy-tailed data. As K.
Balasubramanian and M. Yuan write, "data from real-world experiments oftentimes
tend to be corrupted with outliers and/or exhibit heavy tails. In such cases,
it is not clear that those covariance matrix estimators .. remain optimal" and
"..what are the other possible strategies to deal with heavy tailed
distributions warrant further studies." We make a step towards answering this
question and prove tight deviation inequalities for the proposed estimator that
depend only on the parameters controlling the "intrinsic dimension" associated
to the covariance matrix (as opposed to the dimension of the ambient space); in
particular, our results are applicable in the case of high-dimensional
We present some extensions of Bernstein's concentration inequality for random
matrices. This inequality has become a useful and powerful tool for many
problems in statistics, signal processing and theoretical computer science. The
main feature of our bounds is that, unlike the majority of previous related
results, they do not depend on the dimension $d$ of the ambient space. Instead,
the dimension factor is replaced by the "effective rank" associated with the
underlying distribution that is bounded from above by $d$. In particular, this
makes an extension to the infinite-dimensional setting possible. Our
inequalities refine earlier results in this direction obtained by D. Hsu, S. M.
Kakade and T. Zhang.
We study high-dimensional signal recovery from non-linear measurements with
design vectors having elliptically symmetric distribution. Special attention is
devoted to the situation when the unknown signal belongs to a set of low
statistical complexity, while both the measurements and the design vectors are
heavy-tailed. We propose and analyze a new estimator that adapts to the
structure of the problem, while being robust both to the possible model
misspecification characterized by arbitrary non-linearity of the measurements
as well as to data corruption modeled by the heavy-tailed distributions.
Moreover, this estimator has low computational complexity. Our results are
expressed in the form of exponential concentration inequalities for the error
of the proposed estimator. On the technical side, our proofs rely on the
generic chaining methods, and illustrate the power of this approach for
statistical applications. Theory is supported by numerical experiments
demonstrating that our estimator outperforms existing alternatives when data is
We propose a novel approach to Bayesian analysis that is provably robust to
outliers in the data and often has computational advantages over standard
methods. Our technique is based on splitting the data into non-overlapping
subgroups, evaluating the posterior distribution given each independent
subgroup, and then combining the resulting measures. The main novelty of our
approach is the proposed aggregation step, which is based on the evaluation of
a median in the space of probability measures equipped with a suitable
collection of distances that can be quickly and efficiently evaluated in
practice. We present both theoretical and numerical evidence illustrating the
improvements achieved by our method.
High-dimensional datasets are well-approximated by low-dimensional
structures. Over the past decade, this empirical observation motivated the
investigation of detection, measurement, and modeling techniques to exploit
these low-dimensional intrinsic structures, yielding numerous implications for
high-dimensional statistics, machine learning, and signal processing. Manifold
learning (where the low-dimensional structure is a manifold) and dictionary
learning (where the low-dimensional structure is the set of sparse linear
combinations of vectors from a finite dictionary) are two prominent theoretical
and computational frameworks in this area. Despite their ostensible
distinction, the recently-introduced Geometric Multi-Resolution Analysis (GMRA)
provides a robust, computationally efficient, multiscale procedure for
simultaneously learning manifolds and dictionaries.
In this work, we prove non-asymptotic probabilistic bounds on the
approximation error of GMRA for a rich class of data-generating statistical
models that includes "noisy" manifolds, thereby establishing the theoretical
robustness of the procedure and confirming empirical observations. In
particular, if a dataset aggregates near a low-dimensional manifold, our
results show that the approximation error of the GMRA is completely independent
of the ambient dimension. Our work therefore establishes GMRA as a provably
fast algorithm for dictionary learning with approximation and sparsity
guarantees. We include several numerical experiments confirming these
theoretical results, and our theoretical framework provides new tools for
assessing the behavior of manifold learning and dictionary learning procedures
on a large class of interesting models.
In many real-world applications, collected data are contaminated by noise
with heavy-tailed distribution and might contain outliers of large magnitude.
In this situation, it is necessary to apply methods which produce reliable
outcomes even if the input contains corrupted measurements. We describe a
general method which allows one to obtain estimators with tight concentration
around the true parameter of interest taking values in a Banach space.
Suggested construction relies on the fact that the geometric median of a
collection of independent "weakly concentrated" estimators satisfies a much
stronger deviation bound than each individual element in the collection. Our
approach is illustrated through several examples, including sparse linear
regression and low-rank matrix recovery problems.
Individualized treatment rules (ITRs) tailor treatments according to
individual patient characteristics. They can significantly improve patient care
and are thus becoming increasingly popular. The data collected during
randomized clinical trials are often used to estimate the optimal ITRs.
However, these trials are generally expensive to run, and, moreover, they are
not designed to efficiently estimate ITRs. In this paper, we propose a
cost-effective estimation method from an active learning perspective. In
particular, our method recruits only the "most informative" patients (in terms
of learning the optimal ITRs) from an ongoing clinical trial. Simulation
studies and real-data examples show that our active clinical trial method
significantly improves on competing methods. We derive risk bounds and show
that they support these observed empirical advantages.
We study functional regression with random subgaussian design and real-valued
response. The focus is on the problems in which the regression function can be
well approximated by a functional linear model with the slope function being
"sparse" in the sense that it can be represented as a sum of a small number of
well separated "spikes". This can be viewed as an extension of now classical
sparse estimation problems to the case of infinite dictionaries. We study an
estimator of the regression function based on penalized empirical risk
minimization with quadratic loss and the complexity penalty defined in terms of
$L_1$-norm (a continuous version of LASSO). The main goal is to introduce
several important parameters characterizing sparsity in this class of problems
and to prove sharp oracle inequalities showing how the $L_2$-error of the
continuous LASSO estimator depends on the underlying sparsity of the problem.
We present a new active learning algorithm based on nonparametric estimators
of the regression function. Our investigation provides probabilistic bounds for
the rates of convergence of the generalization error achievable by proposed
method over a broad class of underlying distributions. We also prove minimax
lower bounds which show that the obtained rates are almost tight.