• ### Distributed Statistical Estimation and Rates of Convergence in Normal Approximation(1704.02658)

Aug. 27, 2018 cs.DC, math.ST, stat.TH, stat.ML
This paper presents a class of new algorithms for distributed statistical estimation that exploit divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems. We establish connections between performance of these distributed algorithms and the rates of convergence in normal approximation, and prove non-asymptotic deviations guarantees, as well as limit theorems, for the resulting estimators. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator, as well as provide performance guarantees for distributed maximum likelihood estimation.
• ### Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries(1605.07129)

June 17, 2018 math.ST, stat.TH, stat.ML
Estimation of the covariance matrix has attracted a lot of attention of the statistical research community over the years, partially due to important applications such as Principal Component Analysis. However, frequently used empirical covariance estimator (and its modifications) is very sensitive to outliers in the data. As P. J. Huber wrote in 1964, "...This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance..." Motivated by this question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes covariance estimation problem as a special case. Assuming that the entries of a matrix possess only finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We will explain the key ideas behind our construction, as well as applications to covariance estimation and matrix completion problems.
• ### Moment inequalities for matrix-valued U-statistics of order 2(1801.05921)

May 8, 2018 math.PR
We present Rosenthal-type moment inequalities for matrix-valued U-statistics of order 2. As a corollary, we obtain new matrix concentration inequalities for U-statistics. One of our main technical tools, a version of the non-commutative Khintchine inequality for the spectral norm of the Rademacher chaos, could be of independent interest.
• ### Robust Modifications of U-statistics and Applications to Covariance Estimation Problems(1801.05565)

March 8, 2018 math.ST, stat.TH
Let $Y$ be a $d$-dimensional random vector with unknown mean $\mu$ and covariance matrix $\Sigma$. This paper is motivated by the problem of designing an estimator of $\Sigma$ that admits tight deviation bounds in the operator norm under minimal assumptions on the underlying distribution, such as existence of only 4th moments of the coordinates of $Y$. To address this problem, we propose robust modifications of the operator-valued U-statistics, obtain non-asymptotic guarantees for their performance, and demonstrate the implications of these results to the covariance estimation problem under various structural assumptions.
• ### Estimation of the covariance structure of heavy-tailed distributions(1708.00502)

Jan. 16, 2018 math.ST, stat.TH
We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices corresponding to sub-Gaussian distributions is well-understood, much less in known in the case of heavy-tailed data. As K. Balasubramanian and M. Yuan write, "data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that those covariance matrix estimators .. remain optimal" and "..what are the other possible strategies to deal with heavy tailed distributions warrant further studies." We make a step towards answering this question and prove tight deviation inequalities for the proposed estimator that depend only on the parameters controlling the "intrinsic dimension" associated to the covariance matrix (as opposed to the dimension of the ambient space); in particular, our results are applicable in the case of high-dimensional observations.
• ### On Some Extensions of Bernstein's Inequality for Self-adjoint Operators(1112.5448)

April 14, 2017 math.PR, math.ST, stat.TH
We present some extensions of Bernstein's concentration inequality for random matrices. This inequality has become a useful and powerful tool for many problems in statistics, signal processing and theoretical computer science. The main feature of our bounds is that, unlike the majority of previous related results, they do not depend on the dimension $d$ of the ambient space. Instead, the dimension factor is replaced by the "effective rank" associated with the underlying distribution that is bounded from above by $d$. In particular, this makes an extension to the infinite-dimensional setting possible. Our inequalities refine earlier results in this direction obtained by D. Hsu, S. M. Kakade and T. Zhang.
• ### Structured signal recovery from non-linear and heavy-tailed measurements(1609.01025)

Nov. 11, 2016 math.ST, stat.TH
We study high-dimensional signal recovery from non-linear measurements with design vectors having elliptically symmetric distribution. Special attention is devoted to the situation when the unknown signal belongs to a set of low statistical complexity, while both the measurements and the design vectors are heavy-tailed. We propose and analyze a new estimator that adapts to the structure of the problem, while being robust both to the possible model misspecification characterized by arbitrary non-linearity of the measurements as well as to data corruption modeled by the heavy-tailed distributions. Moreover, this estimator has low computational complexity. Our results are expressed in the form of exponential concentration inequalities for the error of the proposed estimator. On the technical side, our proofs rely on the generic chaining methods, and illustrate the power of this approach for statistical applications. Theory is supported by numerical experiments demonstrating that our estimator outperforms existing alternatives when data is heavy-tailed.
• ### Robust and Scalable Bayes via a Median of Subset Posterior Measures(1403.2660)

June 2, 2016 cs.DC, math.ST, stat.TH, cs.LG
We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed aggregation step, which is based on the evaluation of a median in the space of probability measures equipped with a suitable collection of distances that can be quickly and efficiently evaluated in practice. We present both theoretical and numerical evidence illustrating the improvements achieved by our method.
• ### Multiscale Dictionary Learning: Non-Asymptotic Bounds and Robustness(1401.5833)

Dec. 13, 2015 math.ST, stat.TH
High-dimensional datasets are well-approximated by low-dimensional structures. Over the past decade, this empirical observation motivated the investigation of detection, measurement, and modeling techniques to exploit these low-dimensional intrinsic structures, yielding numerous implications for high-dimensional statistics, machine learning, and signal processing. Manifold learning (where the low-dimensional structure is a manifold) and dictionary learning (where the low-dimensional structure is the set of sparse linear combinations of vectors from a finite dictionary) are two prominent theoretical and computational frameworks in this area. Despite their ostensible distinction, the recently-introduced Geometric Multi-Resolution Analysis (GMRA) provides a robust, computationally efficient, multiscale procedure for simultaneously learning manifolds and dictionaries. In this work, we prove non-asymptotic probabilistic bounds on the approximation error of GMRA for a rich class of data-generating statistical models that includes "noisy" manifolds, thereby establishing the theoretical robustness of the procedure and confirming empirical observations. In particular, if a dataset aggregates near a low-dimensional manifold, our results show that the approximation error of the GMRA is completely independent of the ambient dimension. Our work therefore establishes GMRA as a provably fast algorithm for dictionary learning with approximation and sparsity guarantees. We include several numerical experiments confirming these theoretical results, and our theoretical framework provides new tools for assessing the behavior of manifold learning and dictionary learning procedures on a large class of interesting models.
• ### Geometric median and robust estimation in Banach spaces(1308.1334)

Sept. 29, 2015 math.ST, stat.TH
In many real-world applications, collected data are contaminated by noise with heavy-tailed distribution and might contain outliers of large magnitude. In this situation, it is necessary to apply methods which produce reliable outcomes even if the input contains corrupted measurements. We describe a general method which allows one to obtain estimators with tight concentration around the true parameter of interest taking values in a Banach space. Suggested construction relies on the fact that the geometric median of a collection of independent "weakly concentrated" estimators satisfies a much stronger deviation bound than each individual element in the collection. Our approach is illustrated through several examples, including sparse linear regression and low-rank matrix recovery problems.
• ### Active Clinical Trials for Personalized Medicine(1404.2971)

June 29, 2015 stat.ME
Individualized treatment rules (ITRs) tailor treatments according to individual patient characteristics. They can significantly improve patient care and are thus becoming increasingly popular. The data collected during randomized clinical trials are often used to estimate the optimal ITRs. However, these trials are generally expensive to run, and, moreover, they are not designed to efficiently estimate ITRs. In this paper, we propose a cost-effective estimation method from an active learning perspective. In particular, our method recruits only the "most informative" patients (in terms of learning the optimal ITRs) from an ongoing clinical trial. Simulation studies and real-data examples show that our active clinical trial method significantly improves on competing methods. We derive risk bounds and show that they support these observed empirical advantages.
• ### $L_1$-Penalization in Functional Linear Regression with Subgaussian Design(1307.8137)

Sept. 14, 2014 math.ST, stat.TH
We study functional regression with random subgaussian design and real-valued response. The focus is on the problems in which the regression function can be well approximated by a functional linear model with the slope function being "sparse" in the sense that it can be represented as a sum of a small number of well separated "spikes". This can be viewed as an extension of now classical sparse estimation problems to the case of infinite dictionaries. We study an estimator of the regression function based on penalized empirical risk minimization with quadratic loss and the complexity penalty defined in terms of $L_1$-norm (a continuous version of LASSO). The main goal is to introduce several important parameters characterizing sparsity in this class of problems and to prove sharp oracle inequalities showing how the $L_2$-error of the continuous LASSO estimator depends on the underlying sparsity of the problem.
• ### Plug-in Approach to Active Learning(1104.1450)

Nov. 2, 2011 math.ST, stat.TH, cs.LG
We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight.