• ### Robust Estimators in High Dimensions without the Computational Intractability(1604.06443)

March 15, 2019 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG, stat.ML
We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question:Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms with dimension-independent error guarantees for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of spherical Gaussians. Our algorithms achieve error that is independent of the dimension, and in many cases scales nearly-linearly with the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high-dimensions, that may be applicable to many other problems.
• ### Optimal Identity Testing with High Probability(1708.02728)

Jan. 15, 2019 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG
We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em with probability at least $1-\delta$}, whether the distributions are identical versus $\varepsilon$-far in total variation distance. Most prior work focused on the case that $\delta = \Omega(1)$, for which the sample complexity of identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $\delta$ via black-box amplification, which multiplies the required number of samples by $\Theta(\log(1/\delta))$. We show that black-box amplification is suboptimal for any $\delta = o(1)$, and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is $\Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta) \right)\right)$ for any $n, \varepsilon$, and $\delta$. For the special case of uniformity testing, where the given distribution is the uniform distribution $U_n$ over the domain, our new tester is surprisingly simple: to test whether $p = U_n$ versus $d_{\mathrm TV}(p, U_n) \geq \varepsilon$, we simply threshold $d_{\mathrm TV}(\widehat{p}, U_n)$, where $\widehat{p}$ is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant $\delta$ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of $\varepsilon$ and $\delta$.
• ### Robust Learning of Fixed-Structure Bayesian Networks(1606.07384)

Oct. 29, 2018 cs.AI, math.ST, stat.TH, cs.DS, cs.LG
We investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted. In this work, we study the fully observable discrete case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient robust learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has near-optimal sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples. Finally, we show on both synthetic and semi-synthetic data that our algorithm performs well in practice.
• ### Testing Identity of Multidimensional Histograms(1804.03636)

April 10, 2018 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG
We investigate the problem of identity testing for multidimensional histogram distributions. A distribution $p: D \rightarrow \mathbb{R}_+$, where $D \subseteq \mathbb{R}^d$, is called a {$k$-histogram} if there exists a partition of the domain into $k$ axis-aligned rectangles such that $p$ is constant within each such rectangle. Histograms are one of the most fundamental non-parametric families of distributions and have been extensively studied in computer science and statistics. We give the first identity tester for this problem with {\em sub-learning} sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. More specifically, let $q$ be an unknown $d$-dimensional $k$-histogram and $p$ be an explicitly given $k$-histogram. We want to correctly distinguish, with probability at least $2/3$, between the case that $p = q$ versus $\|p-q\|_1 \geq \epsilon$. We design a computationally efficient algorithm for this hypothesis testing problem with sample complexity $O((\sqrt{k}/\epsilon^2) \log^{O(d)}(k/\epsilon))$. Our algorithm is robust to model misspecification, i.e., succeeds even if $q$ is only promised to be {\em close} to a $k$-histogram. Moreover, for $k = 2^{\Omega(d)}$, we show a nearly-matching sample complexity lower bound of $\Omega((\sqrt{k}/\epsilon^2) (\log(k/\epsilon)/d)^{\Omega(d)})$ when $d\geq 2$. Prior to our work, the sample complexity of the $d=1$ case was well-understood, but no algorithm with sub-learning sample complexity was known, even for $d=2$. Our new upper and lower bounds have interesting conceptual implications regarding the relation between learning and testing in this setting.
• ### Being Robust (in High Dimensions) Can Be Practical(1703.00893)

March 13, 2018 cs.IT, math.IT, cs.DS, cs.LG, stat.ML
Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.
• ### Sever: A Robust Meta-Algorithm for Stochastic Optimization(1803.02815)

March 7, 2018 cs.AI, cs.DS, cs.LG, stat.ML
In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable -- beyond running the base learner itself, it only requires computing the top singular vector of a certain $n \times d$ matrix. We apply Sever on a drug design dataset and a spam classification dataset, and find that in both cases it has substantially greater robustness than several baselines. On the spam dataset, with $1\%$ corruptions, we achieved $7.4\%$ test error, compared to $13.4\%-20.5\%$ for the baselines, and $3\%$ error on the uncorrupted dataset. Similarly, on the drug design dataset, with $10\%$ corruptions, we achieved $1.42$ mean-squared error test error, compared to $1.51$-$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.
• ### Near-Optimal Sample Complexity Bounds for Maximum Likelihood Estimation of Multivariate Log-concave Densities(1802.10575)

Feb. 28, 2018 cs.IT, math.IT, math.ST, stat.TH, cs.LG
We study the problem of learning multivariate log-concave densities with respect to a global loss function. We obtain the first upper bound on the sample complexity of the maximum likelihood estimator (MLE) for a log-concave density on $\mathbb{R}^d$, for all $d \geq 4$. Prior to this work, no finite sample upper bound was known for this estimator in more than $3$ dimensions. In more detail, we prove that for any $d \geq 1$ and $\epsilon>0$, given $\tilde{O}_d((1/\epsilon)^{(d+3)/2})$ samples drawn from an unknown log-concave density $f_0$ on $\mathbb{R}^d$, the MLE outputs a hypothesis $h$ that with high probability is $\epsilon$-close to $f_0$, in squared Hellinger loss. A sample complexity lower bound of $\Omega_d((1/\epsilon)^{(d+1)/2})$ was previously known for any learning algorithm that achieves this guarantee. We thus establish that the sample complexity of the log-concave MLE is near-optimal, up to an $\tilde{O}(1/\epsilon)$ factor.
• ### Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms(1802.08513)

Feb. 23, 2018 math.ST, stat.TH, cs.DS, cs.LG
We study the problem of robustly learning multi-dimensional histograms. A $d$-dimensional function $h: D \rightarrow \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \rightarrow \mathbb{R}$ be a $d$-dimensional probability density function and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d=1$ case was well-understood, but significant gaps in our understanding remained even for $d=2$.
• ### Sharp Bounds for Generalized Uniformity Testing(1709.02087)

Sept. 7, 2017 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG
We study the problem of generalized uniformity testing \cite{BC17} of a discrete probability distribution: Given samples from a probability distribution $p$ over an {\em unknown} discrete domain $\mathbf{\Omega}$, we want to distinguish, with probability at least $2/3$, between the case that $p$ is uniform on some {\em subset} of $\mathbf{\Omega}$ versus $\epsilon$-far, in total variation distance, from any such uniform distribution. We establish tight bounds on the sample complexity of generalized uniformity testing. In more detail, we present a computationally efficient tester whose sample complexity is optimal, up to constant factors, and a matching information-theoretic lower bound. Specifically, we show that the sample complexity of generalized uniformity testing is $\Theta\left(1/(\epsilon^{4/3}\|p\|_3) + 1/(\epsilon^{2} \|p\|_2) \right)$.
• ### Fourier-Based Testing for Families of Distributions(1706.05738)

Aug. 8, 2017 math.PR, cs.DM, math.ST, stat.TH, cs.DS, cs.CC
We study the general problem of testing whether an unknown distribution belongs to a specified family of distributions. More specifically, given a distribution family $\mathcal{P}$ and sample access to an unknown discrete distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and the case that $\mathbf{P}$ is $\epsilon$-far, in total variation distance, from every distribution in $\mathcal{P}$. This is the prototypical hypothesis testing problem that has received significant attention in statistics and, more recently, in theoretical computer science. The sample complexity of this general inference task depends on the underlying family $\mathcal{P}$. The gold standard in distribution property testing is to design sample-optimal and computationally efficient algorithms for this task. The main contribution of this work is a simple and general testing technique that is applicable to all distribution families whose Fourier spectrum satisfies a certain approximate sparsity property. To the best of our knowledge, ours is the first use of the Fourier transform in the context of distribution testing. We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity.
• ### Differentially Private Identity and Closeness Testing of Discrete Distributions(1707.05497)

July 18, 2017 cs.IT, math.IT, cs.DS, cs.LG, stat.ML
We investigate the problems of identity and closeness testing over a discrete population from random samples. Our goal is to develop efficient testers while guaranteeing Differential Privacy to the individuals of the population. We describe an approach that yields sample-efficient differentially private testers for these problems. Our theoretical results show that there exist private identity and closeness testers that are nearly as sample-efficient as their non-private counterparts. We perform an experimental evaluation of our algorithms on synthetic data. Our experiments illustrate that our private testers achieve small type I and type II errors with sample size sublinear in the domain size of the underlying distributions.
• ### Learning Geometric Concepts with Nasty Noise(1707.01242)

July 5, 2017 cs.DS, cs.CC, cs.LG
We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error $O(\epsilon)$, where $\epsilon$ is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow-parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest.
• ### Learning Multivariate Log-concave Distributions(1605.08188)

June 5, 2017 cs.IT, math.IT, math.ST, stat.TH, cs.LG
We study the problem of estimating multivariate log-concave probability density functions. We prove the first sample complexity upper bound for learning log-concave densities on $\mathbb{R}^d$, for all $d \geq 1$. Prior to our work, no upper bound on the sample complexity of this learning problem was known for the case of $d>3$. In more detail, we give an estimator that, for any $d \ge 1$ and $\epsilon>0$, draws $\tilde{O}_d \left( (1/\epsilon)^{(d+5)/2} \right)$ samples from an unknown target log-concave density on $\mathbb{R}^d$, and outputs a hypothesis that (with high probability) is $\epsilon$-close to the target, in total variation distance. Our upper bound on the sample complexity comes close to the known lower bound of $\Omega_d \left( (1/\epsilon)^{(d+1)/2} \right)$ for this problem.
• ### Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures(1611.03473)

We describe a general technique that yields the first {\em Statistical Query lower bounds} for a range of fundamental high-dimensional learning problems involving Gaussian distributions. Our main results are for the problems of (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown Gaussian distribution. For each of these problems, we show a {\em super-polynomial gap} between the (information-theoretic) sample complexity and the computational complexity of {\em any} Statistical Query algorithm for the problem. Our SQ lower bound for Problem (1) is qualitatively matched by known learning algorithms for GMMs. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm in~\cite{DiakonikolasKKLMS16} is essentially best possible among all polynomial-time SQ algorithms. Our SQ lower bounds are attained via a unified moment-matching technique that is useful in other contexts and may be of broader interest. Our technique yields nearly-tight lower bounds for a number of related unsupervised estimation problems. Specifically, for the problems of (3) robust covariance estimation in spectral norm, and (4) robust sparse mean estimation, we establish a quadratic {\em statistical--computational tradeoff} for SQ algorithms, matching known upper bounds. Finally, our technique can be used to obtain tight sample complexity lower bounds for high-dimensional {\em testing} problems. Specifically, for the classical problem of robustly {\em testing} an unknown mean (known covariance) Gaussian, our technique implies an information-theoretic sample lower bound that scales {\em linearly} in the dimension. Our sample lower bound matches the sample complexity of the corresponding robust {\em learning} problem and separates the sample complexity of robust testing from standard (non-robust) testing.
• ### Robustly Learning a Gaussian: Getting Optimal Error, Efficiently(1704.03866)

April 12, 2017 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG, stat.ML
We study the fundamental problem of learning the parameters of a high-dimensional Gaussian in the presence of noise -- where an $\varepsilon$-fraction of our samples were chosen by an adversary. We give robust estimators that achieve estimation error $O(\varepsilon)$ in the total variation distance, which is optimal up to a universal constant that is independent of the dimension. In the case where just the mean is unknown, our robustness guarantee is optimal up to a factor of $\sqrt{2}$ and the running time is polynomial in $d$ and $1/\epsilon$. When both the mean and covariance are unknown, the running time is polynomial in $d$ and quasipolynomial in $1/\varepsilon$. Moreover all of our algorithms require only a polynomial number of samples. Our work shows that the same sorts of error guarantees that were established over fifty years ago in the one-dimensional setting can also be achieved by efficient algorithms in high-dimensional settings.
• ### Near-Optimal Closeness Testing of Discrete Histogram Distributions(1703.01913)

March 6, 2017 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG
We investigate the problem of testing the equivalence between two discrete histograms. A {\em $k$-histogram} over $[n]$ is a probability distribution that is piecewise constant over some set of $k$ intervals over $[n]$. Histograms have been extensively studied in computer science and statistics. Given a set of samples from two $k$-histogram distributions $p, q$ over $[n]$, we want to distinguish (with high probability) between the cases that $p = q$ and $\|p-q\|_1 \geq \epsilon$. The main contribution of this paper is a new algorithm for this testing problem and a nearly matching information-theoretic lower bound. Specifically, the sample complexity of our algorithm matches our lower bound up to a logarithmic factor, improving on previous work by polynomial factors in the relevant parameters. Our algorithmic approach applies in a more general setting and yields improved sample upper bounds for testing closeness of other structured distributions as well.
• ### Testing Bayesian Networks(1612.03156)

Dec. 9, 2016 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG
This work initiates a systematic investigation of testing {\em high-dimensional} structured distributions by focusing on testing {\em Bayesian networks} -- the prototypical family of directed graphical models. A Bayesian network is defined by a directed acyclic graph, where we associate a random variable with each node. The value at any particular node is conditionally independent of all the other non-descendant nodes once its parents are fixed. Specifically, we study the properties of identity testing and closeness testing of Bayesian networks. Our main contribution is the first non-trivial efficient testing algorithms for these problems and corresponding information-theoretic lower bounds. For a wide range of parameter settings, our testing algorithms have sample complexity {\em sublinear} in the dimension and are sample-optimal, up to constant factors.
• ### Collision-based Testers are Optimal for Uniformity and Closeness(1611.03579)

Nov. 11, 2016 cs.IT, math.IT, math.ST, stat.TH, cs.DS, cs.LG
We study the fundamental problems of (i) uniformity testing of a discrete distribution, and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm. These problems have been extensively studied in distribution testing and sample-optimal estimators are known for them~\cite{Paninski:08, CDVV14, VV14, DKN:15}. In this work, we show that the original collision-based testers proposed for these problems ~\cite{GRdist:00, BFR+:00} are sample-optimal, up to constant factors. Previous analyses showed sample complexity upper bounds for these testers that are optimal as a function of the domain size $n$, but suboptimal by polynomial factors in the error parameter $\epsilon$. Our main contribution is a new tight analysis establishing that these collision-based testers are information-theoretically optimal, up to constant factors, both in the dependence on $n$ and in the dependence on $\epsilon$.
• ### Near-Optimal Disjoint-Path Facility Location Through Set Cover by Pairs(1611.01210)

Nov. 3, 2016 cs.DS
In this paper we consider two special cases of the "cover-by-pairs" optimization problem that arise when we need to place facilities so that each customer is served by two facilities that reach it by disjoint shortest paths. These problems arise in a network traffic monitoring scheme proposed by Breslau et al. and have potential applications to content distribution. The "set-disjoint" variant applies to networks that use the OSPF routing protocol, and the "path-disjoint" variant applies when MPLS routing is enabled, making better solutions possible at the cost of greater operational expense. Although we can prove that no polynomial-time algorithm can guarantee good solutions for either version, we are able to provide heuristics that do very well in practice on instances with real-world network structure. Fast implementations of the heuristics, made possible by exploiting mathematical observations about the relationship between the network instances and the corresponding instances of the cover-by-pairs problem, allow us to perform an extensive experimental evaluation of the heuristics and what the solutions they produce tell us about the effectiveness of the proposed monitoring scheme. For the set-disjoint variant, we validate our claim of near-optimality via a new lower-bounding integer programming formulation. Although computing this lower bound requires solving the NP-hard Hitting Set problem and can underestimate the optimal value by a linear factor in the worst case, it can be computed quickly by CPLEX, and it equals the optimal solution value for all the instances in our extensive testbed.
• ### Playing Anonymous Games using Simple Strategies(1608.07336)

Aug. 25, 2016 math.PR, cs.GT, cs.DS
We investigate the complexity of computing approximate Nash equilibria in anonymous games. Our main algorithmic result is the following: For any $n$-player anonymous game with a bounded number of strategies and any constant $\delta>0$, an $O(1/n^{1-\delta})$-approximate Nash equilibrium can be computed in polynomial time. Complementing this positive result, we show that if there exists any constant $\delta>0$ such that an $O(1/n^{1+\delta})$-approximate equilibrium can be computed in polynomial time, then there is a fully polynomial-time approximation scheme for this problem. We also present a faster algorithm that, for any $n$-player $k$-strategy anonymous game, runs in time $\tilde O((n+k) k n^k)$ and computes an $\tilde O(n^{-1/3} k^{11/3})$-approximate equilibrium. This algorithm follows from the existence of simple approximate equilibria of anonymous games, where each player plays one strategy with probability $1-\delta$, for some small $\delta$, and plays uniformly at random with probability $\delta$. Our approach exploits the connection between Nash equilibria in anonymous games and Poisson multinomial distributions (PMDs). Specifically, we prove a new probabilistic lemma establishing the following: Two PMDs, with large variance in each direction, whose first few moments are approximately matching are close in total variation distance. Our structural result strengthens previous work by providing a smooth tradeoff between the variance bound and the number of matching moments.
• ### Fast Algorithms for Segmented Regression(1607.03990)

July 14, 2016 math.ST, stat.TH, cs.DS, cs.LG
We study the fixed design segmented regression problem: Given noisy samples from a piecewise linear function $f$, we want to recover $f$ up to a desired accuracy in mean-squared error. Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size. As our main contribution, we provide new sample near-linear time algorithms for the problem that -- while not being minimax optimal -- achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of $2$ to $4$, while achieving speedups of three orders of magnitude.
• ### The Fourier Transform of Poisson Multinomial Distributions and its Algorithmic Applications(1511.03592)

June 22, 2016 math.PR, math.ST, stat.TH, cs.GT, cs.DS, cs.LG
An $(n, k)$-Poisson Multinomial Distribution (PMD) is a random variable of the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random vectors supported on the set of standard basis vectors in $\mathbb{R}^k.$ In this paper, we obtain a refined structural understanding of PMDs by analyzing their Fourier transform. As our core structural result, we prove that the Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly speaking, its $L_1$-norm is small outside a small set. By building on this result, we obtain the following applications: {\bf Learning Theory.} We design the first computationally efficient learning algorithm for PMDs with respect to the total variation distance. Our algorithm learns an arbitrary $(n, k)$-PMD within variation distance $\epsilon$ using a near-optimal sample size of $\widetilde{O}_k(1/\epsilon^2),$ and runs in time $\widetilde{O}_k(1/\epsilon^2) \cdot \log n.$ Previously, no algorithm with a $\mathrm{poly}(1/\epsilon)$ runtime was known, even for $k=3.$ {\bf Game Theory.} We give the first efficient polynomial-time approximation scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized anonymous games with $n$ players and $k$ strategies, our algorithm computes a well-supported $\epsilon$-Nash equilibrium in time $n^{O(k^3)} \cdot (k/\epsilon)^{O(k^3\log(k/\epsilon)/\log\log(k/\epsilon))^{k-1}}.$ The best previous algorithm for this problem had running time $n^{(f(k)/\epsilon)^k},$ where $f(k) = \Omega(k^{k^2})$, for any $k>2.$ {\bf Statistics.} We prove a multivariate central limit theorem (CLT) that relates an arbitrary PMD to a discretized multivariate Gaussian with the same mean and covariance, in total variation distance. Our new CLT strengthens the CLT of Valiant and Valiant by completely removing the dependence on $n$ in the error bound.
• ### Efficient Robust Proper Learning of Log-concave Distributions(1606.03077)

June 9, 2016 math.ST, stat.TH, cs.DS, cs.LG
We study the {\em robust proper learning} of univariate log-concave distributions (over continuous and discrete domains). Given a set of samples drawn from an unknown target distribution, we want to compute a log-concave hypothesis distribution that is as close as possible to the target, in total variation distance. In this work, we give the first computationally efficient algorithm for this learning problem. Our algorithm achieves the information-theoretically optimal sample size (up to a constant factor), runs in polynomial time, and is robust to model misspecification with nearly-optimal error guarantees. Specifically, we give an algorithm that, on input $n=O(1/\eps^{5/2})$ samples from an unknown distribution $f$, runs in time $\widetilde{O}(n^{8/5})$, and outputs a log-concave hypothesis $h$ that (with high probability) satisfies $\dtv(h, f) = O(\opt)+\eps$, where $\opt$ is the minimum total variation distance between $f$ and the class of log-concave distributions. Our approach to the robust proper learning problem is quite flexible and may be applicable to many other univariate distribution families.
• ### A New Approach for Testing Properties of Discrete Distributions(1601.05557)

May 9, 2016 cs.IT, math.IT, math.ST, stat.TH, cs.DS
In this work, we give a novel general approach for distribution testing. We describe two techniques: our first technique gives sample-optimal testers, while our second technique gives matching sample lower bounds. As a consequence, we resolve the sample complexity of a wide variety of testing problems. Our upper bounds are obtained via a modular reduction-based approach. Our approach yields optimal testers for numerous problems by using a standard $\ell_2$-identity tester as a black-box. Using this recipe, we obtain simple estimators for a wide range of problems, encompassing most problems previously studied in the TCS literature, namely: (1) identity testing to a fixed distribution, (2) closeness testing between two unknown distributions (with equal/unequal sample sizes), (3) independence testing (in any number of dimensions), (4) closeness testing for collections of distributions, and (5) testing histograms. For all of these problems, our testers are sample-optimal, up to constant factors. With the exception of (1), ours are the {\em first sample-optimal testers for the corresponding problems.} Moreover, our estimators are significantly simpler to state and analyze compared to previous results. As an application of our reduction-based technique, we obtain the first {\em nearly instance-optimal} algorithm for testing equivalence between two {\em unknown} distributions. Moreover, our technique naturally generalizes to other metrics beyond the $\ell_1$-distance. Our lower bounds are obtained via a direct information-theoretic approach: Given a candidate hard instance, our proof proceeds by bounding the mutual information between appropriate random variables. While this is a classical method in information theory, prior to our work, it had not been used in distribution property testing.
• ### Testing Shape Restrictions of Discrete Distributions(1507.03558)

Jan. 21, 2016 math.PR, math.ST, stat.TH, cs.DS, cs.CC
We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution $D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish between $D\in\mathcal{P}$ and $\ell_1(D,\mathcal{P})>\varepsilon$. We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties, including monotone, log-concave, $t$-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.