
This work studies distributed learning in the spirit of Yao's model of
communication complexity: consider a twoparty setting, where each of the
players gets a list of labelled examples and they communicate in order to
jointly perform some learning task. To naturally fit into the framework of
learning theory, the players can send each other examples (as well as bits)
where each example/bit costs one unit of communication. This enables a uniform
treatment of infinite classes such as halfspaces in $\mathbb{R}^d$, which are
ubiquitous in machine learning.
We study several fundamental questions in this model. For example, we provide
combinatorial characterizations of the classes that can be learned with
efficient communication in the propercase as well as in the impropercase.
These findings imply unconditional separations between various learning
contexts, e.g.\ realizable versus agnostic learning, proper versus improper
learning, etc.
The derivation of these results hinges on a type of decision problems we term
"{\it realizability problems}" where the goal is deciding whether a distributed
input sample is consistent with an hypothesis from a prespecified class.
From a technical perspective, the protocols we use are based on ideas from
machine learning theory and the impossibility results are based on ideas from
communication complexity theory.

We investigate the problem of identity testing for multidimensional histogram
distributions. A distribution $p: D \rightarrow \mathbb{R}_+$, where $D
\subseteq \mathbb{R}^d$, is called a {$k$histogram} if there exists a
partition of the domain into $k$ axisaligned rectangles such that $p$ is
constant within each such rectangle. Histograms are one of the most fundamental
nonparametric families of distributions and have been extensively studied in
computer science and statistics. We give the first identity tester for this
problem with {\em sublearning} sample complexity in any fixed dimension and a
nearlymatching sample complexity lower bound.
More specifically, let $q$ be an unknown $d$dimensional $k$histogram and
$p$ be an explicitly given $k$histogram. We want to correctly distinguish,
with probability at least $2/3$, between the case that $p = q$ versus
$\pq\_1 \geq \epsilon$. We design a computationally efficient algorithm for
this hypothesis testing problem with sample complexity $O((\sqrt{k}/\epsilon^2)
\log^{O(d)}(k/\epsilon))$. Our algorithm is robust to model misspecification,
i.e., succeeds even if $q$ is only promised to be {\em close} to a
$k$histogram. Moreover, for $k = 2^{\Omega(d)}$, we show a nearlymatching
sample complexity lower bound of $\Omega((\sqrt{k}/\epsilon^2)
(\log(k/\epsilon)/d)^{\Omega(d)})$ when $d\geq 2$.
Prior to our work, the sample complexity of the $d=1$ case was
wellunderstood, but no algorithm with sublearning sample complexity was
known, even for $d=2$. Our new upper and lower bounds have interesting
conceptual implications regarding the relation between learning and testing in
this setting.

Robust estimation is much more challenging in high dimensions than it is in
one dimension: Most techniques either lead to intractable optimization problems
or estimators that can tolerate only a tiny fraction of errors. Recent work in
theoretical computer science has shown that, in appropriate distributional
models, it is possible to robustly estimate the mean and covariance with
polynomial time algorithms that can tolerate a constant fraction of
corruptions, independent of the dimension. However, the sample and time
complexity of these algorithms is prohibitively large for highdimensional
applications. In this work, we address both of these issues by establishing
sample complexity bounds that are optimal, up to logarithmic factors, as well
as giving various refinements that allow the algorithms to tolerate a much
larger fraction of corruptions. Finally, we show on both synthetic and real
data that our algorithms have stateoftheart performance and suddenly make
highdimensional robust estimation a realistic possibility.

In high dimensions, most machine learning methods are brittle to even a small
fraction of structured outliers. To address this, we introduce a new
metaalgorithm that can take in a base learner such as least squares or
stochastic gradient descent, and harden the learner to be resistant to
outliers. Our method, Sever, possesses strong theoretical guarantees yet is
also highly scalable  beyond running the base learner itself, it only
requires computing the top singular vector of a certain $n \times d$ matrix. We
apply Sever on a drug design dataset and a spam classification dataset, and
find that in both cases it has substantially greater robustness than several
baselines. On the spam dataset, with $1\%$ corruptions, we achieved $7.4\%$
test error, compared to $13.4\%20.5\%$ for the baselines, and $3\%$ error on
the uncorrupted dataset. Similarly, on the drug design dataset, with $10\%$
corruptions, we achieved $1.42$ meansquared error test error, compared to
$1.51$$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.

We present an algebraic method to study fourdimensional toric varieties by
lifting matrix equations from the special linear group ${\rm SL}_2({\mathbb
Z})$ to its preimage in the universal cover of ${\rm SL}_2({\mathbb R})$. With
this method we recover the classification of twodimensional toric fans, and
obtain a description of their semitoric analogue. As an application to
symplectic geometry of Hamiltonian systems, we give a concise proof of the
connectivity of the moduli space of toric integrable systems in dimension four,
recovering a known result, and extend it to the case of semitoric integrable
systems with a fixed number of focusfocus points and which are in the same
twisting index class. In particular, we show that any semitoric system with
precisely one focusfocus singular point can be continuously deformed into a
system in the same isomorphism class as the JaynesCummings model from optics.

A natural number is a binary $k$'th power if its binary representation
consists of $k$ consecutive identical blocks. We prove an analogue of Waring's
theorem for sums of binary $k$'th powers. More precisely, we show that for each
integer $k \geq 2$, there exists a positive integer $W(k)$ such that every
sufficiently large multiple of $E_k := \gcd(2^k  1, k)$ is the sum of at most
$W(k)$ binary $k$'th powers. (The hypothesis of being a multiple of $E_k$
cannot be omitted, since we show that the $\gcd$ of the binary $k$'th powers is
$E_k$.) Also, we explain how our results can be extended to arbitrary integer
bases $b > 2$.

We study the problem of generalized uniformity testing \cite{BC17} of a
discrete probability distribution: Given samples from a probability
distribution $p$ over an {\em unknown} discrete domain $\mathbf{\Omega}$, we
want to distinguish, with probability at least $2/3$, between the case that $p$
is uniform on some {\em subset} of $\mathbf{\Omega}$ versus $\epsilon$far, in
total variation distance, from any such uniform distribution.
We establish tight bounds on the sample complexity of generalized uniformity
testing. In more detail, we present a computationally efficient tester whose
sample complexity is optimal, up to constant factors, and a matching
informationtheoretic lower bound. Specifically, we show that the sample
complexity of generalized uniformity testing is
$\Theta\left(1/(\epsilon^{4/3}\p\_3) + 1/(\epsilon^{2} \p\_2) \right)$.

We present a simple Logspace algorithm that solves the Unary SubsetSum
problem.

We study the efficient learnability of geometric concept classes 
specifically, lowdegree polynomial threshold functions (PTFs) and
intersections of halfspaces  when a fraction of the data is adversarially
corrupted. We give the first polynomialtime PAC learning algorithms for these
concept classes with dimensionindependent error guarantees in the presence of
nasty noise under the Gaussian distribution. In the nasty noise model, an
omniscient adversary can arbitrarily corrupt a small fraction of both the
unlabeled data points and their labels. This model generalizes wellstudied
noise models, including the malicious noise model and the agnostic (adversarial
label noise) model. Prior to our work, the only concept class for which
efficient malicious learning algorithms were known was the class of
origincentered halfspaces.
Specifically, our robust learning algorithm for lowdegree PTFs succeeds
under a number of tame distributions  including the Gaussian distribution
and, more generally, any logconcave distribution with (approximately) known
lowdegree moments. For LTFs under the Gaussian distribution, we give a
polynomialtime algorithm that achieves error $O(\epsilon)$, where $\epsilon$
is the noise rate. At the core of our PAC learning results is an efficient
algorithm to approximate the lowdegree Chowparameters of any bounded function
in the presence of nasty noise. To achieve this, we employ an iterative
spectral method for outlier detection and removal, inspired by recent work in
robust unsupervised learning. Our aforementioned algorithm succeeds for a range
of distributions satisfying mild concentration bounds and moment assumptions.
The correctness of our robust learning algorithm for intersections of
halfspaces makes essential use of a novel robust inverse independence lemma
that may be of broader interest.

We study the problem of estimating multivariate logconcave probability
density functions. We prove the first sample complexity upper bound for
learning logconcave densities on $\mathbb{R}^d$, for all $d \geq 1$. Prior to
our work, no upper bound on the sample complexity of this learning problem was
known for the case of $d>3$. In more detail, we give an estimator that, for any
$d \ge 1$ and $\epsilon>0$, draws $\tilde{O}_d \left( (1/\epsilon)^{(d+5)/2}
\right)$ samples from an unknown target logconcave density on $\mathbb{R}^d$,
and outputs a hypothesis that (with high probability) is $\epsilon$close to
the target, in total variation distance. Our upper bound on the sample
complexity comes close to the known lower bound of $\Omega_d \left(
(1/\epsilon)^{(d+1)/2} \right)$ for this problem.

We study an extension of active learning in which the learning algorithm may
ask the annotator to compare the distances of two examples from the boundary of
their labelclass. For example, in a recommendation system application (say for
restaurants), the annotator may be asked whether she liked or disliked a
specific restaurant (a label query); or which one of two restaurants did she
like more (a comparison query).
We focus on the class of half spaces, and show that under natural
assumptions, such as large margin or bounded bitdescription of the input
examples, it is possible to reveal all the labels of a sample of size $n$ using
approximately $O(\log n)$ queries. This implies an exponential improvement over
classical active learning, where only label queries are allowed. We complement
these results by showing that if any of these assumptions is removed then, in
the worst case, $\Omega(n)$ queries are required.
Our results follow from a new general framework of active learning with
additional queries. We identify a combinatorial dimension, called the
\emph{inference dimension}, that captures the query complexity when each
additional query is determined by $O(1)$ examples (such as comparison queries,
each of which is determined by the two compared examples). Our results for half
spaces follow by bounding the inference dimension in the cases discussed above.

We describe a general technique that yields the first {\em Statistical Query
lower bounds} for a range of fundamental highdimensional learning problems
involving Gaussian distributions. Our main results are for the problems of (1)
learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of
a single unknown Gaussian distribution. For each of these problems, we show a
{\em superpolynomial gap} between the (informationtheoretic) sample
complexity and the computational complexity of {\em any} Statistical Query
algorithm for the problem. Our SQ lower bound for Problem (1) is qualitatively
matched by known learning algorithms for GMMs. Our lower bound for Problem (2)
implies that the accuracy of the robust learning algorithm
in~\cite{DiakonikolasKKLMS16} is essentially best possible among all
polynomialtime SQ algorithms.
Our SQ lower bounds are attained via a unified momentmatching technique that
is useful in other contexts and may be of broader interest. Our technique
yields nearlytight lower bounds for a number of related unsupervised
estimation problems. Specifically, for the problems of (3) robust covariance
estimation in spectral norm, and (4) robust sparse mean estimation, we
establish a quadratic {\em statisticalcomputational tradeoff} for SQ
algorithms, matching known upper bounds. Finally, our technique can be used to
obtain tight sample complexity lower bounds for highdimensional {\em testing}
problems. Specifically, for the classical problem of robustly {\em testing} an
unknown mean (known covariance) Gaussian, our technique implies an
informationtheoretic sample lower bound that scales {\em linearly} in the
dimension. Our sample lower bound matches the sample complexity of the
corresponding robust {\em learning} problem and separates the sample complexity
of robust testing from standard (nonrobust) testing.

We construct near optimal linear decision trees for a variety of decision
problems in combinatorics and discrete geometry. For example, for any constant
$k$, we construct linear decision trees that solve the $k$SUM problem on $n$
elements using $O(n \log^2 n)$ linear queries. Moreover, the queries we use are
comparison queries, which compare the sums of two $k$subsets; when viewed as
linear queries, comparison queries are $2k$sparse and have only $\{1,0,1\}$
coefficients. We give similar constructions for sorting sumsets $A+B$ and for
solving the SUBSETSUM problem, both with optimal number of queries, up to
polylogarithmic terms.
Our constructions are based on the notion of "inference dimension", recently
introduced by the authors in the context of active classification with
comparison queries. This can be viewed as another contribution to the fruitful
link between machine learning and discrete geometry, which goes back to the
discovery of the VC dimension.

We study the fundamental problem of learning the parameters of a
highdimensional Gaussian in the presence of noise  where an
$\varepsilon$fraction of our samples were chosen by an adversary. We give
robust estimators that achieve estimation error $O(\varepsilon)$ in the total
variation distance, which is optimal up to a universal constant that is
independent of the dimension.
In the case where just the mean is unknown, our robustness guarantee is
optimal up to a factor of $\sqrt{2}$ and the running time is polynomial in $d$
and $1/\epsilon$. When both the mean and covariance are unknown, the running
time is polynomial in $d$ and quasipolynomial in $1/\varepsilon$. Moreover all
of our algorithms require only a polynomial number of samples. Our work shows
that the same sorts of error guarantees that were established over fifty years
ago in the onedimensional setting can also be achieved by efficient algorithms
in highdimensional settings.

We investigate the problem of testing the equivalence between two discrete
histograms. A {\em $k$histogram} over $[n]$ is a probability distribution that
is piecewise constant over some set of $k$ intervals over $[n]$. Histograms
have been extensively studied in computer science and statistics. Given a set
of samples from two $k$histogram distributions $p, q$ over $[n]$, we want to
distinguish (with high probability) between the cases that $p = q$ and
$\pq\_1 \geq \epsilon$. The main contribution of this paper is a new
algorithm for this testing problem and a nearly matching informationtheoretic
lower bound. Specifically, the sample complexity of our algorithm matches our
lower bound up to a logarithmic factor, improving on previous work by
polynomial factors in the relevant parameters. Our algorithmic approach applies
in a more general setting and yields improved sample upper bounds for testing
closeness of other structured distributions as well.

A symplectic semitoric manifold is a symplectic $4$manifold endowed with a
Hamiltonian $(S^1 \times \mathbb{R})$action satisfying certain conditions. The
goal of this paper is to construct a new symplectic invariant of symplectic
semitoric manifolds, the helix, and give applications. The helix is a
symplectic analogue of the fan of a nonsingular complete toric variety in
algebraic geometry, that takes into account the effects of the monodromy near
focusfocus singularities. We give two applications of the helix: first, we use
it to give a classification of the minimal models of symplectic semitoric
manifolds, where "minimal" is in the sense of not admitting any blowdowns. The
second application is an extension to the compact case of a well known result
of V\~{u} Ngoc about the constraints posed on a symplectic semitoric manifold
by the existence of focusfocus singularities. The helix permits to translate a
symplectic geometric problem into an algebraic problem, and the paper describes
a method to solve this type of algebraic problem.

We consider the problem of estimating a Fouriersparse signal from noisy
samples, where the sampling is done over some interval $[0, T]$ and the
frequencies can be "offgrid". Previous methods for this problem required the
gap between frequencies to be above 1/T, the threshold required to robustly
identify individual frequencies. We show the frequency gap is not necessary to
estimate the signal as a whole: for arbitrary $k$Fouriersparse signals under
$\ell_2$ bounded noise, we show how to estimate the signal with a constant
factor growth of the noise and sample complexity polynomial in $k$ and
logarithmic in the bandwidth and signaltonoise ratio.
As a special case, we get an algorithm to interpolate degree $d$ polynomials
from noisy measurements, using $O(d)$ samples and increasing the noise by a
constant factor in $\ell_2$.

An $(n, k)$Poisson Multinomial Distribution (PMD) is a random variable of
the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random
vectors supported on the set of standard basis vectors in $\mathbb{R}^k.$ In
this paper, we obtain a refined structural understanding of PMDs by analyzing
their Fourier transform. As our core structural result, we prove that the
Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly
speaking, its $L_1$norm is small outside a small set. By building on this
result, we obtain the following applications:
{\bf Learning Theory.} We design the first computationally efficient learning
algorithm for PMDs with respect to the total variation distance. Our algorithm
learns an arbitrary $(n, k)$PMD within variation distance $\epsilon$ using a
nearoptimal sample size of $\widetilde{O}_k(1/\epsilon^2),$ and runs in time
$\widetilde{O}_k(1/\epsilon^2) \cdot \log n.$ Previously, no algorithm with a
$\mathrm{poly}(1/\epsilon)$ runtime was known, even for $k=3.$
{\bf Game Theory.} We give the first efficient polynomialtime approximation
scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized
anonymous games with $n$ players and $k$ strategies, our algorithm computes a
wellsupported $\epsilon$Nash equilibrium in time $n^{O(k^3)} \cdot
(k/\epsilon)^{O(k^3\log(k/\epsilon)/\log\log(k/\epsilon))^{k1}}.$ The best
previous algorithm for this problem had running time $n^{(f(k)/\epsilon)^k},$
where $f(k) = \Omega(k^{k^2})$, for any $k>2.$
{\bf Statistics.} We prove a multivariate central limit theorem (CLT) that
relates an arbitrary PMD to a discretized multivariate Gaussian with the same
mean and covariance, in total variation distance. Our new CLT strengthens the
CLT of Valiant and Valiant by completely removing the dependence on $n$ in the
error bound.

We study the {\em robust proper learning} of univariate logconcave
distributions (over continuous and discrete domains). Given a set of samples
drawn from an unknown target distribution, we want to compute a logconcave
hypothesis distribution that is as close as possible to the target, in total
variation distance. In this work, we give the first computationally efficient
algorithm for this learning problem. Our algorithm achieves the
informationtheoretically optimal sample size (up to a constant factor), runs
in polynomial time, and is robust to model misspecification with nearlyoptimal
error guarantees.
Specifically, we give an algorithm that, on input $n=O(1/\eps^{5/2})$ samples
from an unknown distribution $f$, runs in time $\widetilde{O}(n^{8/5})$, and
outputs a logconcave hypothesis $h$ that (with high probability) satisfies
$\dtv(h, f) = O(\opt)+\eps$, where $\opt$ is the minimum total variation
distance between $f$ and the class of logconcave distributions. Our approach
to the robust proper learning problem is quite flexible and may be applicable
to many other univariate distribution families.

In this work, we give a novel general approach for distribution testing. We
describe two techniques: our first technique gives sampleoptimal testers,
while our second technique gives matching sample lower bounds. As a
consequence, we resolve the sample complexity of a wide variety of testing
problems.
Our upper bounds are obtained via a modular reductionbased approach. Our
approach yields optimal testers for numerous problems by using a standard
$\ell_2$identity tester as a blackbox. Using this recipe, we obtain simple
estimators for a wide range of problems, encompassing most problems previously
studied in the TCS literature, namely: (1) identity testing to a fixed
distribution, (2) closeness testing between two unknown distributions (with
equal/unequal sample sizes), (3) independence testing (in any number of
dimensions), (4) closeness testing for collections of distributions, and (5)
testing histograms. For all of these problems, our testers are sampleoptimal,
up to constant factors. With the exception of (1), ours are the {\em first
sampleoptimal testers for the corresponding problems.} Moreover, our
estimators are significantly simpler to state and analyze compared to previous
results.
As an application of our reductionbased technique, we obtain the first {\em
nearly instanceoptimal} algorithm for testing equivalence between two {\em
unknown} distributions. Moreover, our technique naturally generalizes to other
metrics beyond the $\ell_1$distance.
Our lower bounds are obtained via a direct informationtheoretic approach:
Given a candidate hard instance, our proof proceeds by bounding the mutual
information between appropriate random variables. While this is a classical
method in information theory, prior to our work, it had not been used in
distribution property testing.

In order to formally understand the power of neural computing, we first need
to crack the frontier of threshold circuits with two and three layers, a regime
that has been surprisingly intractable to analyze. We prove the first
superlinear gate lower bounds and the first superquadratic wire lower bounds
for depthtwo linear threshold circuits with arbitrary weights, and depththree
majority circuits computing an explicit function.
$\bullet$ We prove that for all $\epsilon\gg \sqrt{\log(n)/n}$, the
lineartime computable Andreev's function cannot be computed on a
$(1/2+\epsilon)$fraction of $n$bit inputs by depthtwo linear threshold
circuits of $o(\epsilon^3 n^{3/2}/\log^3 n)$ gates, nor can it be computed with
$o(\epsilon^{3} n^{5/2}/\log^{7/2} n)$ wires. This establishes an averagecase
``size hierarchy'' for threshold circuits, as Andreev's function is computable
by uniform depthtwo circuits of $o(n^3)$ linear threshold gates, and by
uniform depththree circuits of $O(n)$ majority gates.
$\bullet$ We present a new function in $P$ based on smallbiased sets, which
we prove cannot be computed by a majority vote of depthtwo linear threshold
circuits with $o(n^{3/2}/\log^3 n)$ gates, nor with $o(n^{5/2}/\log^{7/2}n)$
wires.
$\bullet$ We give tight averagecase (gate and wire) complexity results for
computing PARITY with depthtwo threshold circuits; the answer turns out to be
the same as for depthtwo majority circuits.
The key is a new random restriction lemma for linear threshold functions. Our
main analytical tool is the LittlewoodOfford Lemma from additive
combinatorics.

We study the structure and learnability of sums of independent integer random
variables (SIIRVs). For $k \in \mathbb{Z}_{+}$, a $k$SIIRV of order $n \in
\mathbb{Z}_{+}$ is the probability distribution of the sum of $n$ independent
random variables each supported on $\{0, 1, \dots, k1\}$. We denote by ${\cal
S}_{n,k}$ the set of all $k$SIIRVs of order $n$.
In this paper, we tightly characterize the sample and computational
complexity of learning $k$SIIRVs. More precisely, we design a computationally
efficient algorithm that uses $\widetilde{O}(k/\epsilon^2)$ samples, and learns
an arbitrary $k$SIIRV within error $\epsilon,$ in total variation distance.
Moreover, we show that the {\em optimal} sample complexity of this learning
problem is $\Theta((k/\epsilon^2)\sqrt{\log(1/\epsilon)}).$ Our algorithm
proceeds by learning the Fourier transform of the target $k$SIIRV in its
effective support. Its correctness relies on the {\em approximate sparsity} of
the Fourier transform of $k$SIIRVs  a structural property that we establish,
roughly stating that the Fourier transform of $k$SIIRVs has small magnitude
outside a small set.
Along the way we prove several new structural results about $k$SIIRVs. As
one of our main structural contributions, we give an efficient algorithm to
construct a sparse {\em proper} $\epsilon$cover for ${\cal S}_{n,k},$ in total
variation distance. We also obtain a novel geometric characterization of the
space of $k$SIIRVs. Our characterization allows us to prove a tight lower
bound on the size of $\epsilon$covers for ${\cal S}_{n,k}$, and is the key
ingredient in our tight sample complexity lower bound.
Our approach of exploiting the sparsity of the Fourier transform in
distribution learning is general, and has recently found additional
applications.

We give an algorithm for properly learning Poisson binomial distributions. A
Poisson binomial distribution (PBD) of order $n$ is the discrete probability
distribution of the sum of $n$ mutually independent Bernoulli random variables.
Given $\widetilde{O}(1/\epsilon^2)$ samples from an unknown PBD $\mathbf{p}$,
our algorithm runs in time $(1/\epsilon)^{O(\log \log (1/\epsilon))}$, and
outputs a hypothesis PBD that is $\epsilon$close to $\mathbf{p}$ in total
variation distance. The previously best known running time for properly
learning PBDs was $(1/\epsilon)^{O(\log(1/\epsilon))}$.
As one of our main contributions, we provide a novel structural
characterization of PBDs. We prove that, for all $\epsilon >0,$ there exists an
explicit collection $\cal{M}$ of $(1/\epsilon)^{O(\log \log (1/\epsilon))}$
vectors of multiplicities, such that for any PBD $\mathbf{p}$ there exists a
PBD $\mathbf{q}$ with $O(\log(1/\epsilon))$ distinct parameters whose
multiplicities are given by some element of ${\cal M}$, such that $\mathbf{q}$
is $\epsilon$close to $\mathbf{p}$. Our proof combines tools from Fourier
analysis and algebraic geometry.
Our approach to the proper learning problem is as follows: Starting with an
accurate nonproper hypothesis, we fit a PBD to this hypothesis. More
specifically, we essentially start with the hypothesis computed by the
computationally efficient nonproper learning algorithm in our recent
work~\cite{DKS15}. Our aforementioned structural characterization allows us to
reduce the corresponding fitting problem to a collection of
$(1/\epsilon)^{O(\log \log(1/\epsilon))}$ systems of lowdegree polynomial
inequalities. We show that each such system can be solved in time
$(1/\epsilon)^{O(\log \log(1/\epsilon))}$, which yields the overall running
time of our algorithm.

We determine for what proportion of integers $h$ one now knows that there are
infinitely many prime pairs $p,\ p+h$ as a consequence of the ZhangMaynardTao
theorem. We consider the natural generalization of this to $k$tuples of
integers, and we determine the limit of what can be deduced assuming only the
ZhangMaynardTao theorem.

We give a general unified method that can be used for $L_1$ {\em closeness
testing} of a wide range of univariate structured distribution families. More
specifically, we design a sample optimal and computationally efficient
algorithm for testing the equivalence of two unknown (potentially arbitrary)
univariate distributions under the $\mathcal{A}_k$distance metric: Given
sample access to distributions with density functions $p, q: I \to \mathbb{R}$,
we want to distinguish between the cases that $p=q$ and
$\pq\_{\mathcal{A}_k} \ge \epsilon$ with probability at least $2/3$. We show
that for any $k \ge 2, \epsilon>0$, the {\em optimal} sample complexity of the
$\mathcal{A}_k$closeness testing problem is $\Theta(\max\{
k^{4/5}/\epsilon^{6/5}, k^{1/2}/\epsilon^2 \})$. This is the first $o(k)$
sample algorithm for this problem, and yields new, simple $L_1$ closeness
testers, in most cases with optimal sample complexity, for broad classes of
structured distributions.