
We study highdimensional distribution learning in an agnostic setting where
an adversary is allowed to arbitrarily corrupt an $\varepsilon$fraction of the
samples. Such questions have a rich history spanning statistics, machine
learning and theoretical computer science. Even in the most basic settings, the
only known approaches are either computationally inefficient or lose
dimensiondependent factors in their error guarantees. This raises the
following question:Is highdimensional agnostic distribution learning even
possible, algorithmically?
In this work, we obtain the first computationally efficient algorithms with
dimensionindependent error guarantees for agnostically learning several
fundamental classes of highdimensional distributions: (1) a single Gaussian,
(2) a product distribution on the hypercube, (3) mixtures of two product
distributions (under a natural balancedness condition), and (4) mixtures of
spherical Gaussians. Our algorithms achieve error that is independent of the
dimension, and in many cases scales nearlylinearly with the fraction of
adversarially corrupted samples. Moreover, we develop a general recipe for
detecting and correcting corruptions in highdimensions, that may be applicable
to many other problems.

We study the problem of testing identity against a given distribution with a
focus on the high confidence regime. More precisely, given samples from an
unknown distribution $p$ over $n$ elements, an explicitly given distribution
$q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em
with probability at least $1\delta$}, whether the distributions are identical
versus $\varepsilon$far in total variation distance. Most prior work focused
on the case that $\delta = \Omega(1)$, for which the sample complexity of
identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an
algorithm, one can achieve arbitrarily small values of $\delta$ via blackbox
amplification, which multiplies the required number of samples by
$\Theta(\log(1/\delta))$.
We show that blackbox amplification is suboptimal for any $\delta = o(1)$,
and give a new identity tester that achieves the optimal sample complexity. Our
new upper and lower bounds show that the optimal sample complexity of identity
testing is \[
\Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} +
\log(1/\delta) \right)\right) \] for any $n, \varepsilon$, and $\delta$. For
the special case of uniformity testing, where the given distribution is the
uniform distribution $U_n$ over the domain, our new tester is surprisingly
simple: to test whether $p = U_n$ versus $d_{\mathrm TV}(p, U_n) \geq
\varepsilon$, we simply threshold $d_{\mathrm TV}(\widehat{p}, U_n)$, where
$\widehat{p}$ is the empirical probability distribution. The fact that this
simple "plugin" estimator is sampleoptimal is surprising, even in the
constant $\delta$ case. Indeed, it was believed that such a tester would not
attain sublinear sample complexity even for constant values of $\varepsilon$
and $\delta$.

We investigate the problem of learning Bayesian networks in a robust model
where an $\epsilon$fraction of the samples are adversarially corrupted. In
this work, we study the fully observable discrete case where the structure of
the network is given. Even in this basic setting, previous learning algorithms
either run in exponential time or lose dimensiondependent factors in their
error guarantees. We provide the first computationally efficient robust
learning algorithm for this problem with dimensionindependent error
guarantees. Our algorithm has nearoptimal sample complexity, runs in
polynomial time, and achieves error that scales nearlylinearly with the
fraction of adversarially corrupted samples. Finally, we show on both synthetic
and semisynthetic data that our algorithm performs well in practice.

We investigate the problem of identity testing for multidimensional histogram
distributions. A distribution $p: D \rightarrow \mathbb{R}_+$, where $D
\subseteq \mathbb{R}^d$, is called a {$k$histogram} if there exists a
partition of the domain into $k$ axisaligned rectangles such that $p$ is
constant within each such rectangle. Histograms are one of the most fundamental
nonparametric families of distributions and have been extensively studied in
computer science and statistics. We give the first identity tester for this
problem with {\em sublearning} sample complexity in any fixed dimension and a
nearlymatching sample complexity lower bound.
More specifically, let $q$ be an unknown $d$dimensional $k$histogram and
$p$ be an explicitly given $k$histogram. We want to correctly distinguish,
with probability at least $2/3$, between the case that $p = q$ versus
$\pq\_1 \geq \epsilon$. We design a computationally efficient algorithm for
this hypothesis testing problem with sample complexity $O((\sqrt{k}/\epsilon^2)
\log^{O(d)}(k/\epsilon))$. Our algorithm is robust to model misspecification,
i.e., succeeds even if $q$ is only promised to be {\em close} to a
$k$histogram. Moreover, for $k = 2^{\Omega(d)}$, we show a nearlymatching
sample complexity lower bound of $\Omega((\sqrt{k}/\epsilon^2)
(\log(k/\epsilon)/d)^{\Omega(d)})$ when $d\geq 2$.
Prior to our work, the sample complexity of the $d=1$ case was
wellunderstood, but no algorithm with sublearning sample complexity was
known, even for $d=2$. Our new upper and lower bounds have interesting
conceptual implications regarding the relation between learning and testing in
this setting.

Robust estimation is much more challenging in high dimensions than it is in
one dimension: Most techniques either lead to intractable optimization problems
or estimators that can tolerate only a tiny fraction of errors. Recent work in
theoretical computer science has shown that, in appropriate distributional
models, it is possible to robustly estimate the mean and covariance with
polynomial time algorithms that can tolerate a constant fraction of
corruptions, independent of the dimension. However, the sample and time
complexity of these algorithms is prohibitively large for highdimensional
applications. In this work, we address both of these issues by establishing
sample complexity bounds that are optimal, up to logarithmic factors, as well
as giving various refinements that allow the algorithms to tolerate a much
larger fraction of corruptions. Finally, we show on both synthetic and real
data that our algorithms have stateoftheart performance and suddenly make
highdimensional robust estimation a realistic possibility.

In high dimensions, most machine learning methods are brittle to even a small
fraction of structured outliers. To address this, we introduce a new
metaalgorithm that can take in a base learner such as least squares or
stochastic gradient descent, and harden the learner to be resistant to
outliers. Our method, Sever, possesses strong theoretical guarantees yet is
also highly scalable  beyond running the base learner itself, it only
requires computing the top singular vector of a certain $n \times d$ matrix. We
apply Sever on a drug design dataset and a spam classification dataset, and
find that in both cases it has substantially greater robustness than several
baselines. On the spam dataset, with $1\%$ corruptions, we achieved $7.4\%$
test error, compared to $13.4\%20.5\%$ for the baselines, and $3\%$ error on
the uncorrupted dataset. Similarly, on the drug design dataset, with $10\%$
corruptions, we achieved $1.42$ meansquared error test error, compared to
$1.51$$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.

We study the problem of learning multivariate logconcave densities with
respect to a global loss function. We obtain the first upper bound on the
sample complexity of the maximum likelihood estimator (MLE) for a logconcave
density on $\mathbb{R}^d$, for all $d \geq 4$. Prior to this work, no finite
sample upper bound was known for this estimator in more than $3$ dimensions.
In more detail, we prove that for any $d \geq 1$ and $\epsilon>0$, given
$\tilde{O}_d((1/\epsilon)^{(d+3)/2})$ samples drawn from an unknown logconcave
density $f_0$ on $\mathbb{R}^d$, the MLE outputs a hypothesis $h$ that with
high probability is $\epsilon$close to $f_0$, in squared Hellinger loss. A
sample complexity lower bound of $\Omega_d((1/\epsilon)^{(d+1)/2})$ was
previously known for any learning algorithm that achieves this guarantee. We
thus establish that the sample complexity of the logconcave MLE is
nearoptimal, up to an $\tilde{O}(1/\epsilon)$ factor.

We study the problem of robustly learning multidimensional histograms. A
$d$dimensional function $h: D \rightarrow \mathbb{R}$ is called a
$k$histogram if there exists a partition of the domain $D \subseteq
\mathbb{R}^d$ into $k$ axisaligned rectangles such that $h$ is constant within
each such rectangle. Let $f: D \rightarrow \mathbb{R}$ be a $d$dimensional
probability density function and suppose that $f$ is $\mathrm{OPT}$close, in
$L_1$distance, to an unknown $k$histogram (with unknown partition). Our goal
is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in
$L_1$distance. We give an algorithm for this learning problem that uses $n =
\tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any
fixed dimension, our algorithm has optimal sample complexity, up to logarithmic
factors, and runs in nearlinear time. Prior to our work, the time complexity
of the $d=1$ case was wellunderstood, but significant gaps in our
understanding remained even for $d=2$.

We study the problem of generalized uniformity testing \cite{BC17} of a
discrete probability distribution: Given samples from a probability
distribution $p$ over an {\em unknown} discrete domain $\mathbf{\Omega}$, we
want to distinguish, with probability at least $2/3$, between the case that $p$
is uniform on some {\em subset} of $\mathbf{\Omega}$ versus $\epsilon$far, in
total variation distance, from any such uniform distribution.
We establish tight bounds on the sample complexity of generalized uniformity
testing. In more detail, we present a computationally efficient tester whose
sample complexity is optimal, up to constant factors, and a matching
informationtheoretic lower bound. Specifically, we show that the sample
complexity of generalized uniformity testing is
$\Theta\left(1/(\epsilon^{4/3}\p\_3) + 1/(\epsilon^{2} \p\_2) \right)$.

We study the general problem of testing whether an unknown distribution
belongs to a specified family of distributions. More specifically, given a
distribution family $\mathcal{P}$ and sample access to an unknown discrete
distribution $\mathbf{P}$, we want to distinguish (with high probability)
between the case that $\mathbf{P} \in \mathcal{P}$ and the case that
$\mathbf{P}$ is $\epsilon$far, in total variation distance, from every
distribution in $\mathcal{P}$. This is the prototypical hypothesis testing
problem that has received significant attention in statistics and, more
recently, in theoretical computer science.
The sample complexity of this general inference task depends on the
underlying family $\mathcal{P}$. The gold standard in distribution property
testing is to design sampleoptimal and computationally efficient algorithms
for this task. The main contribution of this work is a simple and general
testing technique that is applicable to all distribution families whose Fourier
spectrum satisfies a certain approximate sparsity property. To the best of our
knowledge, ours is the first use of the Fourier transform in the context of
distribution testing.
We apply our Fourierbased framework to obtain near sampleoptimal and
computationally efficient testers for the following fundamental distribution
families: Sums of Independent Integer Random Variables (SIIRVs), Poisson
Multinomial Distributions (PMDs), and Discrete LogConcave Distributions. For
the first two, ours are the first nontrivial testers in the literature, vastly
generalizing previous work on testing Poisson Binomial Distributions. For the
third, our tester improves on prior work in both sample and time complexity.

We investigate the problems of identity and closeness testing over a discrete
population from random samples. Our goal is to develop efficient testers while
guaranteeing Differential Privacy to the individuals of the population. We
describe an approach that yields sampleefficient differentially private
testers for these problems. Our theoretical results show that there exist
private identity and closeness testers that are nearly as sampleefficient as
their nonprivate counterparts. We perform an experimental evaluation of our
algorithms on synthetic data. Our experiments illustrate that our private
testers achieve small type I and type II errors with sample size sublinear in
the domain size of the underlying distributions.

We study the efficient learnability of geometric concept classes 
specifically, lowdegree polynomial threshold functions (PTFs) and
intersections of halfspaces  when a fraction of the data is adversarially
corrupted. We give the first polynomialtime PAC learning algorithms for these
concept classes with dimensionindependent error guarantees in the presence of
nasty noise under the Gaussian distribution. In the nasty noise model, an
omniscient adversary can arbitrarily corrupt a small fraction of both the
unlabeled data points and their labels. This model generalizes wellstudied
noise models, including the malicious noise model and the agnostic (adversarial
label noise) model. Prior to our work, the only concept class for which
efficient malicious learning algorithms were known was the class of
origincentered halfspaces.
Specifically, our robust learning algorithm for lowdegree PTFs succeeds
under a number of tame distributions  including the Gaussian distribution
and, more generally, any logconcave distribution with (approximately) known
lowdegree moments. For LTFs under the Gaussian distribution, we give a
polynomialtime algorithm that achieves error $O(\epsilon)$, where $\epsilon$
is the noise rate. At the core of our PAC learning results is an efficient
algorithm to approximate the lowdegree Chowparameters of any bounded function
in the presence of nasty noise. To achieve this, we employ an iterative
spectral method for outlier detection and removal, inspired by recent work in
robust unsupervised learning. Our aforementioned algorithm succeeds for a range
of distributions satisfying mild concentration bounds and moment assumptions.
The correctness of our robust learning algorithm for intersections of
halfspaces makes essential use of a novel robust inverse independence lemma
that may be of broader interest.

We study the problem of estimating multivariate logconcave probability
density functions. We prove the first sample complexity upper bound for
learning logconcave densities on $\mathbb{R}^d$, for all $d \geq 1$. Prior to
our work, no upper bound on the sample complexity of this learning problem was
known for the case of $d>3$. In more detail, we give an estimator that, for any
$d \ge 1$ and $\epsilon>0$, draws $\tilde{O}_d \left( (1/\epsilon)^{(d+5)/2}
\right)$ samples from an unknown target logconcave density on $\mathbb{R}^d$,
and outputs a hypothesis that (with high probability) is $\epsilon$close to
the target, in total variation distance. Our upper bound on the sample
complexity comes close to the known lower bound of $\Omega_d \left(
(1/\epsilon)^{(d+1)/2} \right)$ for this problem.

We describe a general technique that yields the first {\em Statistical Query
lower bounds} for a range of fundamental highdimensional learning problems
involving Gaussian distributions. Our main results are for the problems of (1)
learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of
a single unknown Gaussian distribution. For each of these problems, we show a
{\em superpolynomial gap} between the (informationtheoretic) sample
complexity and the computational complexity of {\em any} Statistical Query
algorithm for the problem. Our SQ lower bound for Problem (1) is qualitatively
matched by known learning algorithms for GMMs. Our lower bound for Problem (2)
implies that the accuracy of the robust learning algorithm
in~\cite{DiakonikolasKKLMS16} is essentially best possible among all
polynomialtime SQ algorithms.
Our SQ lower bounds are attained via a unified momentmatching technique that
is useful in other contexts and may be of broader interest. Our technique
yields nearlytight lower bounds for a number of related unsupervised
estimation problems. Specifically, for the problems of (3) robust covariance
estimation in spectral norm, and (4) robust sparse mean estimation, we
establish a quadratic {\em statisticalcomputational tradeoff} for SQ
algorithms, matching known upper bounds. Finally, our technique can be used to
obtain tight sample complexity lower bounds for highdimensional {\em testing}
problems. Specifically, for the classical problem of robustly {\em testing} an
unknown mean (known covariance) Gaussian, our technique implies an
informationtheoretic sample lower bound that scales {\em linearly} in the
dimension. Our sample lower bound matches the sample complexity of the
corresponding robust {\em learning} problem and separates the sample complexity
of robust testing from standard (nonrobust) testing.

We study the fundamental problem of learning the parameters of a
highdimensional Gaussian in the presence of noise  where an
$\varepsilon$fraction of our samples were chosen by an adversary. We give
robust estimators that achieve estimation error $O(\varepsilon)$ in the total
variation distance, which is optimal up to a universal constant that is
independent of the dimension.
In the case where just the mean is unknown, our robustness guarantee is
optimal up to a factor of $\sqrt{2}$ and the running time is polynomial in $d$
and $1/\epsilon$. When both the mean and covariance are unknown, the running
time is polynomial in $d$ and quasipolynomial in $1/\varepsilon$. Moreover all
of our algorithms require only a polynomial number of samples. Our work shows
that the same sorts of error guarantees that were established over fifty years
ago in the onedimensional setting can also be achieved by efficient algorithms
in highdimensional settings.

We investigate the problem of testing the equivalence between two discrete
histograms. A {\em $k$histogram} over $[n]$ is a probability distribution that
is piecewise constant over some set of $k$ intervals over $[n]$. Histograms
have been extensively studied in computer science and statistics. Given a set
of samples from two $k$histogram distributions $p, q$ over $[n]$, we want to
distinguish (with high probability) between the cases that $p = q$ and
$\pq\_1 \geq \epsilon$. The main contribution of this paper is a new
algorithm for this testing problem and a nearly matching informationtheoretic
lower bound. Specifically, the sample complexity of our algorithm matches our
lower bound up to a logarithmic factor, improving on previous work by
polynomial factors in the relevant parameters. Our algorithmic approach applies
in a more general setting and yields improved sample upper bounds for testing
closeness of other structured distributions as well.

This work initiates a systematic investigation of testing {\em
highdimensional} structured distributions by focusing on testing {\em Bayesian
networks}  the prototypical family of directed graphical models. A Bayesian
network is defined by a directed acyclic graph, where we associate a random
variable with each node. The value at any particular node is conditionally
independent of all the other nondescendant nodes once its parents are fixed.
Specifically, we study the properties of identity testing and closeness testing
of Bayesian networks. Our main contribution is the first nontrivial efficient
testing algorithms for these problems and corresponding informationtheoretic
lower bounds. For a wide range of parameter settings, our testing algorithms
have sample complexity {\em sublinear} in the dimension and are sampleoptimal,
up to constant factors.

We study the fundamental problems of (i) uniformity testing of a discrete
distribution, and (ii) closeness testing between two discrete distributions
with bounded $\ell_2$norm. These problems have been extensively studied in
distribution testing and sampleoptimal estimators are known for
them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.
In this work, we show that the original collisionbased testers proposed for
these problems ~\cite{GRdist:00, BFR+:00} are sampleoptimal, up to constant
factors. Previous analyses showed sample complexity upper bounds for these
testers that are optimal as a function of the domain size $n$, but suboptimal
by polynomial factors in the error parameter $\epsilon$. Our main contribution
is a new tight analysis establishing that these collisionbased testers are
informationtheoretically optimal, up to constant factors, both in the
dependence on $n$ and in the dependence on $\epsilon$.

In this paper we consider two special cases of the "coverbypairs"
optimization problem that arise when we need to place facilities so that each
customer is served by two facilities that reach it by disjoint shortest paths.
These problems arise in a network traffic monitoring scheme proposed by Breslau
et al. and have potential applications to content distribution. The
"setdisjoint" variant applies to networks that use the OSPF routing protocol,
and the "pathdisjoint" variant applies when MPLS routing is enabled, making
better solutions possible at the cost of greater operational expense. Although
we can prove that no polynomialtime algorithm can guarantee good solutions for
either version, we are able to provide heuristics that do very well in practice
on instances with realworld network structure. Fast implementations of the
heuristics, made possible by exploiting mathematical observations about the
relationship between the network instances and the corresponding instances of
the coverbypairs problem, allow us to perform an extensive experimental
evaluation of the heuristics and what the solutions they produce tell us about
the effectiveness of the proposed monitoring scheme. For the setdisjoint
variant, we validate our claim of nearoptimality via a new lowerbounding
integer programming formulation. Although computing this lower bound requires
solving the NPhard Hitting Set problem and can underestimate the optimal value
by a linear factor in the worst case, it can be computed quickly by CPLEX, and
it equals the optimal solution value for all the instances in our extensive
testbed.

We investigate the complexity of computing approximate Nash equilibria in
anonymous games. Our main algorithmic result is the following: For any
$n$player anonymous game with a bounded number of strategies and any constant
$\delta>0$, an $O(1/n^{1\delta})$approximate Nash equilibrium can be computed
in polynomial time. Complementing this positive result, we show that if there
exists any constant $\delta>0$ such that an $O(1/n^{1+\delta})$approximate
equilibrium can be computed in polynomial time, then there is a fully
polynomialtime approximation scheme for this problem.
We also present a faster algorithm that, for any $n$player $k$strategy
anonymous game, runs in time $\tilde O((n+k) k n^k)$ and computes an $\tilde
O(n^{1/3} k^{11/3})$approximate equilibrium. This algorithm follows from the
existence of simple approximate equilibria of anonymous games, where each
player plays one strategy with probability $1\delta$, for some small $\delta$,
and plays uniformly at random with probability $\delta$.
Our approach exploits the connection between Nash equilibria in anonymous
games and Poisson multinomial distributions (PMDs). Specifically, we prove a
new probabilistic lemma establishing the following: Two PMDs, with large
variance in each direction, whose first few moments are approximately matching
are close in total variation distance. Our structural result strengthens
previous work by providing a smooth tradeoff between the variance bound and the
number of matching moments.

We study the fixed design segmented regression problem: Given noisy samples
from a piecewise linear function $f$, we want to recover $f$ up to a desired
accuracy in meansquared error.
Previous rigorous approaches for this problem rely on dynamic programming
(DP) and, while sample efficient, have running time quadratic in the sample
size. As our main contribution, we provide new sample nearlinear time
algorithms for the problem that  while not being minimax optimal  achieve a
significantly better sampletime tradeoff on large datasets compared to the DP
approach. Our experimental evaluation shows that, compared with the DP
approach, our algorithms provide a convergence rate that is only off by a
factor of $2$ to $4$, while achieving speedups of three orders of magnitude.

An $(n, k)$Poisson Multinomial Distribution (PMD) is a random variable of
the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random
vectors supported on the set of standard basis vectors in $\mathbb{R}^k.$ In
this paper, we obtain a refined structural understanding of PMDs by analyzing
their Fourier transform. As our core structural result, we prove that the
Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly
speaking, its $L_1$norm is small outside a small set. By building on this
result, we obtain the following applications:
{\bf Learning Theory.} We design the first computationally efficient learning
algorithm for PMDs with respect to the total variation distance. Our algorithm
learns an arbitrary $(n, k)$PMD within variation distance $\epsilon$ using a
nearoptimal sample size of $\widetilde{O}_k(1/\epsilon^2),$ and runs in time
$\widetilde{O}_k(1/\epsilon^2) \cdot \log n.$ Previously, no algorithm with a
$\mathrm{poly}(1/\epsilon)$ runtime was known, even for $k=3.$
{\bf Game Theory.} We give the first efficient polynomialtime approximation
scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized
anonymous games with $n$ players and $k$ strategies, our algorithm computes a
wellsupported $\epsilon$Nash equilibrium in time $n^{O(k^3)} \cdot
(k/\epsilon)^{O(k^3\log(k/\epsilon)/\log\log(k/\epsilon))^{k1}}.$ The best
previous algorithm for this problem had running time $n^{(f(k)/\epsilon)^k},$
where $f(k) = \Omega(k^{k^2})$, for any $k>2.$
{\bf Statistics.} We prove a multivariate central limit theorem (CLT) that
relates an arbitrary PMD to a discretized multivariate Gaussian with the same
mean and covariance, in total variation distance. Our new CLT strengthens the
CLT of Valiant and Valiant by completely removing the dependence on $n$ in the
error bound.

We study the {\em robust proper learning} of univariate logconcave
distributions (over continuous and discrete domains). Given a set of samples
drawn from an unknown target distribution, we want to compute a logconcave
hypothesis distribution that is as close as possible to the target, in total
variation distance. In this work, we give the first computationally efficient
algorithm for this learning problem. Our algorithm achieves the
informationtheoretically optimal sample size (up to a constant factor), runs
in polynomial time, and is robust to model misspecification with nearlyoptimal
error guarantees.
Specifically, we give an algorithm that, on input $n=O(1/\eps^{5/2})$ samples
from an unknown distribution $f$, runs in time $\widetilde{O}(n^{8/5})$, and
outputs a logconcave hypothesis $h$ that (with high probability) satisfies
$\dtv(h, f) = O(\opt)+\eps$, where $\opt$ is the minimum total variation
distance between $f$ and the class of logconcave distributions. Our approach
to the robust proper learning problem is quite flexible and may be applicable
to many other univariate distribution families.

In this work, we give a novel general approach for distribution testing. We
describe two techniques: our first technique gives sampleoptimal testers,
while our second technique gives matching sample lower bounds. As a
consequence, we resolve the sample complexity of a wide variety of testing
problems.
Our upper bounds are obtained via a modular reductionbased approach. Our
approach yields optimal testers for numerous problems by using a standard
$\ell_2$identity tester as a blackbox. Using this recipe, we obtain simple
estimators for a wide range of problems, encompassing most problems previously
studied in the TCS literature, namely: (1) identity testing to a fixed
distribution, (2) closeness testing between two unknown distributions (with
equal/unequal sample sizes), (3) independence testing (in any number of
dimensions), (4) closeness testing for collections of distributions, and (5)
testing histograms. For all of these problems, our testers are sampleoptimal,
up to constant factors. With the exception of (1), ours are the {\em first
sampleoptimal testers for the corresponding problems.} Moreover, our
estimators are significantly simpler to state and analyze compared to previous
results.
As an application of our reductionbased technique, we obtain the first {\em
nearly instanceoptimal} algorithm for testing equivalence between two {\em
unknown} distributions. Moreover, our technique naturally generalizes to other
metrics beyond the $\ell_1$distance.
Our lower bounds are obtained via a direct informationtheoretic approach:
Given a candidate hard instance, our proof proceeds by bounding the mutual
information between appropriate random variables. While this is a classical
method in information theory, prior to our work, it had not been used in
distribution property testing.

We study the question of testing structured properties (classes) of discrete
distributions. Specifically, given sample access to an arbitrary distribution
$D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish between
$D\in\mathcal{P}$ and $\ell_1(D,\mathcal{P})>\varepsilon$. We develop a general
algorithm for this question, which applies to a large range of
"shapeconstrained" properties, including monotone, logconcave, $t$modal,
piecewisepolynomial, and Poisson Binomial distributions. Moreover, for all
cases considered, our algorithm has nearoptimal sample complexity with regard
to the domain size and is computationally efficient. For most of these classes,
we provide the first nontrivial tester in the literature. In addition, we also
describe a generic method to prove lower bounds for this problem, and use it to
show our upper bounds are nearly tight. Finally, we extend some of our
techniques to tolerant testing, deriving nearlytight upper and lower bounds
for the corresponding questions.