
We introduce a new technique for reducing the dimension of the ambient space
of lowdegree polynomials in the Gaussian space while preserving their relative
correlation structure, analogous to the JohnsonLindenstrauss lemma. As
applications, we address the following problems:
1. Computability of Approximately Optimal Noise Stable function over Gaussian
space: The goal is to find a partition of $\mathbb{R}^n$ into $k$ parts, that
maximizes the noise stability. An $\delta$optimal partition is one which is
within additive $\delta$ of the optimal noise stability.
De, Mossel & Neeman (CCC 2017) raised the question of proving a computable
bound on the dimension $n_0(\delta)$ in which we can find an $\delta$optimal
partition. While De et al. provide such a bound, using our new technique, we
obtain improved explicit bounds on the dimension $n_0(\delta)$.
2. Decidability of NonInteractive Simulation of Joint Distributions: A
"noninteractive simulation" problem is specified by two distributions $P(x,y)$
and $Q(u,v)$: The goal is to determine if two players that observe sequences
$X^n$ and $Y^n$ respectively where $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d.
from $P(x,y)$ can generate pairs $U$ and $V$ respectively (without
communicating with each other) with a joint distribution that is arbitrarily
close in total variation to $Q(u,v)$. Even when $P$ and $Q$ are extremely
simple, it is open in several cases if $P$ can simulate $Q$.
In the special where $Q$ is a joint distribution over $\{0,1\} \times
\{0,1\}$, Ghazi, Kamath and Sudan (FOCS 2016) proved a computable bound on the
number of samples $n_0(\delta)$ that can be drawn from $P(x,y)$ to get
$\delta$close to $Q$ (if it is possible at all). Recently De, Mossel & Neeman
obtained such bounds when $Q$ is a distribution over $[k] \times [k]$ for any
$k \ge 2$. We recover this result with improved explicit bounds on
$n_0(\delta)$.

We study common randomness where two parties have access to i.i.d. samples
from a known random source, and wish to generate a shared random key using
limited (or no) communication with the largest possible probability of
agreement. This problem is at the core of secret key generation in
cryptography, with connections to communication under uncertainty and locality
sensitive hashing. We take the approach of treating correlated sources as a
critical resource, and ask whether common randomness can be generated
resourceefficiently.
We consider two notable sources in this setup arising from correlated bits
and correlated Gaussians. We design the first explicit schemes that use only a
polynomial number of samples (in the key length) so that the players can
generate shared keys that agree with constant probability using optimal
communication. The best previously known schemes were both nonconstructive and
used an exponential number of samples. In the amortized setting, we
characterize the largest achievable ratio of key length to communication in
terms of the external and internal information costs, two wellstudied
quantities in theoretical computer science. In the relaxed setting where the
two parties merely wish to improve the correlation between the generated keys
of length $k$, we show that there are no interactive protocols using $o(k)$
bits of communication having agreement probability even as small as
$2^{o(k)}$. For the related communication problem where the players wish to
compute a joint function $f$ of their inputs using i.i.d. samples from a known
source, we give a zerocommunication protocol using $2^{O(c)}$ bits where $c$
is the interactive randomized publiccoin communication complexity of $f$. This
matches the lower bound shown previously while the best previously known upper
bound was doubly exponential in $c$.

In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and
Kothari initiated the study of communication with contextual uncertainty, a
setup aiming to understand how efficient communication is possible when the
communicating parties imperfectly share a huge context. In this setting, Alice
is given a function $f$ and an input string $x$, and Bob is given a function
$g$ and an input string $y$. The pair $(x,y)$ comes from a known distribution
$\mu$ and $f$ and $g$ are guaranteed to be close under this distribution. Alice
and Bob wish to compute $g(x,y)$ with high probability. The previous work
showed that any problem with oneway communication complexity $k$ in the
standard model has publiccoin communication $O(k(1+I))$ bits in the uncertain
case, where $I$ is the mutual information between $x$ and $y$. A lower bound of
$\Omega(\sqrt{I})$ bits on the publiccoin uncertain communication was also
shown. An important question that was left open is the power that public
randomness brings to uncertain communication. Can Alice and Bob achieve
efficient communication amid uncertainty without using public randomness? How
powerful are publiccoin protocols in overcoming uncertainty? Motivated by
these two questions:
 We prove the first separation between privatecoin uncertain communication
and publiccoin uncertain communication. We exhibit a function class for which
the communication in the standard model and the publiccoin uncertain
communication are $O(1)$ while the privatecoin uncertain communication is a
growing function of the length $n$ of the inputs.
 We improve the lowerbound of the previous work on publiccoin uncertain
communication. We exhibit a function class and a distribution (with $I \approx
n$) for which the oneway certain communication is $k$ bits but the oneway
publiccoin uncertain communication is at least $\Omega(\sqrt{k} \cdot
\sqrt{I})$ bits.

Several wellstudied models of access to data samples, including statistical
queries, local differential privacy and lowcommunication algorithms rely on
queries that provide information about a function of a single sample. (For
example, a statistical query (SQ) gives an estimate of $Ex_{x \sim D}[q(x)]$
for any choice of the query function $q$ mapping $X$ to the reals, where $D$ is
an unknown data distribution over $X$.) Yet some data analysis algorithms rely
on properties of functions that depend on multiple samples. Such algorithms
would be naturally implemented using $k$wise queries each of which is
specified by a function $q$ mapping $X^k$ to the reals. Hence it is natural to
ask whether algorithms using $k$wise queries can solve learning problems more
efficiently and by how much.
Blum, Kalai and Wasserman (2003) showed that for any weak PAC learning
problem over a fixed distribution, the complexity of learning with $k$wise SQs
is smaller than the (unary) SQ complexity by a factor of at most $2^k$. We show
that for more general problems over distributions the picture is substantially
richer. For every $k$, the complexity of distributionindependent PAC learning
with $k$wise queries can be exponentially larger than learning with
$(k+1)$wise queries. We then give two approaches for simulating a $k$wise
query using unary queries. The first approach exploits the structure of the
problem that needs to be solved. It generalizes and strengthens (exponentially)
the results of Blum et al.. It allows us to derive strong lower bounds for
learning DNF formulas and stochastic constraint satisfaction problems that hold
against algorithms using $k$wise queries. The second approach exploits the
$k$party communication complexity of the $k$wise query function.

In the "correlated sampling" problem, two players, say Alice and Bob, are
given two distributions, say $P$ and $Q$ respectively, over the same universe
and access to shared randomness. The two players are required to output two
elements, without any interaction, sampled according to their respective
distributions, while trying to minimize the probability that their outputs
disagree. A wellknown protocol due to Holenstein, with close variants (for
similar problems) due to Broder, and to Kleinberg and Tardos, solves this task
with disagreement probability at most $2 \delta/(1+\delta)$, where $\delta$ is
the total variation distance between $P$ and $Q$. This protocol has been used
in several different contexts including sketching algorithms, approximation
algorithms based on rounding linear programming relaxations, the study of
parallel repetition and cryptography.
In this note, we give a surprisingly simple proof that this protocol is in
fact tight. Specifically, for every $\delta \in (0,1)$, we show that any
correlated sampling scheme should have disagreement probability at least
$2\delta/(1+\delta)$. This partially answers a recent question of Rivest.
Our proof is based on studying a new problem we call "constrained agreement".
Here, Alice is given a subset $A \subseteq [n]$ and is required to output an
element $i \in A$, Bob is given a subset $B \subseteq [n]$ and is required to
output an element $j \in B$, and the goal is to minimize the probability that
$i \neq j$. We prove tight bounds on this question, which turn out to imply
tight bounds for correlated sampling. Though we settle basic questions about
the two problems, our formulation also leads to several questions that remain
open.

Establishing the complexity of {\em Bounded Distance Decoding} for
ReedSolomon codes is a fundamental open problem in coding theory, explicitly
asked by Guruswami and Vardy (IEEE Trans. Inf. Theory, 2005). The problem is
motivated by the large current gap between the regime when it is NPhard, and
the regime when it is efficiently solvable (i.e., the Johnson radius).
We show the first NPhardness results for asymptotically smaller decoding
radii than the maximum likelihood decoding radius of Guruswami and Vardy.
Specifically, for ReedSolomon codes of length $N$ and dimension $K=O(N)$, we
show that it is NPhard to decode more than $ NK c\frac{\log N}{\log\log N}$
errors (with $c>0$ an absolute constant). Moreover, we show that the problem is
NPhard under quasipolynomialtime reductions for an error amount $> NK
c\log{N}$ (with $c>0$ an absolute constant).
These results follow from the NPhardness of a generalization of the
classical Subset Sum problem to higher moments, called {\em Moments Subset
Sum}, which has been a known open problem, and which may be of independent
interest.
We further reveal a strong connection with the wellstudied
ProuhetTarryEscott problem in Number Theory, which turns out to capture a
main barrier in extending our techniques. We believe the ProuhetTarryEscott
problem deserves further study in the theoretical computer science community.

We present decidability results for a subclass of "noninteractive"
simulation problems, a wellstudied class of problems in information theory. A
noninteractive simulation problem is specified by two distributions $P(x,y)$
and $Q(u,v)$: The goal is to determine if two players, Alice and Bob, that
observe sequences $X^n$ and $Y^n$ respectively where $\{(X_i, Y_i)\}_{i=1}^n$
are drawn i.i.d. from $P(x,y)$ can generate pairs $U$ and $V$ respectively
(without communicating with each other) with a joint distribution that is
arbitrarily close in total variation to $Q(u,v)$. Even when $P$ and $Q$ are
extremely simple: e.g., $P$ is uniform on the triples $\{(0,0), (0,1), (1,0)\}$
and $Q$ is a "doubly symmetric binary source", i.e., $U$ and $V$ are uniform
$\pm 1$ variables with correlation say $0.49$, it is open if $P$ can simulate
$Q$.
In this work, we show that whenever $P$ is a distribution on a finite domain
and $Q$ is a $2 \times 2$ distribution, then the noninteractive simulation
problem is decidable: specifically, given $\delta > 0$ the algorithm runs in
time bounded by some function of $P$ and $\delta$ and either gives a
noninteractive simulation protocol that is $\delta$close to $Q$ or asserts
that no protocol gets $O(\delta)$close to $Q$. The main challenge to such a
result is determining explicit (computable) convergence bounds on the number
$n$ of samples that need to be drawn from $P(x,y)$ to get $\delta$close to
$Q$. We invoke contemporary results from the analysis of Boolean functions such
as the invariance principle and a regularity lemma to obtain such explicit
bounds.

We introduce a simple model illustrating the role of context in communication
and the challenge posed by uncertainty of knowledge of context. We consider a
variant of distributional communication complexity where Alice gets some
information $x$ and Bob gets $y$, where $(x,y)$ is drawn from a known
distribution, and Bob wishes to compute some function $g(x,y)$ (with high
probability over $(x,y)$). In our variant, Alice does not know $g$, but only
knows some function $f$ which is an approximation of $g$. Thus, the function
being computed forms the context for the communication, and knowing it
imperfectly models (mild) uncertainty in this context.
A naive solution would be for Alice and Bob to first agree on some common
function $h$ that is close to both $f$ and $g$ and then use a protocol for $h$
to compute $h(x,y)$. We show that any such agreement leads to a large overhead
in communication ruling out such a universal solution.
In contrast, we show that if $g$ has a oneway communication protocol with
complexity $k$ in the standard setting, then it has a communication protocol
with complexity $O(k \cdot (1+I))$ in the uncertain setting, where $I$ denotes
the mutual information between $x$ and $y$. In the particular case where the
input distribution is a product distribution, the protocol in the uncertain
setting only incurs a constant factor blowup in communication and error.
Furthermore, we show that the dependence on the mutual information $I$ is
required. Namely, we construct a class of functions along with a nonproduct
distribution over $(x,y)$ for which the communication complexity is a single
bit in the standard setting but at least $\Omega(\sqrt{n})$ bits in the
uncertain setting.

Motivated by the quest for a broader understanding of communication
complexity of simple functions, we introduce the class of
"permutationinvariant" functions. A partial function $f:\{0,1\}^n \times
\{0,1\}^n\to \{0,1,?\}$ is permutationinvariant if for every bijection
$\pi:\{1,\ldots,n\} \to \{1,\ldots,n\}$ and every $\mathbf{x}, \mathbf{y} \in
\{0,1\}^n$, it is the case that $f(\mathbf{x}, \mathbf{y}) =
f(\mathbf{x}^{\pi}, \mathbf{y}^{\pi})$. Most of the commonly studied functions
in communication complexity are permutationinvariant. For such functions, we
present a simple complexity measure (computable in time polynomial in $n$ given
an implicit description of $f$) that describes their communication complexity
up to polynomial factors and up to an additive error that is logarithmic in the
input size. This gives a coarse taxonomy of the communication complexity of
simple functions. Our work highlights the role of the wellknown lower bounds
of functions such as 'SetDisjointness' and 'Indexing', while complementing
them with the relatively lesserknown upper bounds for 'GapInnerProduct'
(from the sketching literature) and 'SparseGapInnerProduct' (from the recent
work of Canonne et al. [ITCS 2015]). We also present consequences to the study
of communication complexity with imperfectly shared randomness where we show
that for total permutationinvariant functions, imperfectly shared randomness
results in only a polynomial blowup in communication complexity after an
additive $O(\log \log n)$ overhead.

Random (dv,dc)regular LDPC codes are wellknown to achieve the Shannon
capacity of the binary symmetric channel (for sufficiently large dv and dc)
under exponential time decoding. However, polynomial time algorithms are only
known to correct a much smaller fraction of errors. One of the most powerful
polynomialtime algorithms with a formal analysis is the LP decoding algorithm
of Feldman et al. which is known to correct an Omega(1/dc) fraction of errors.
In this work, we show that fairly powerful extensions of LP decoding, based on
the SheraliAdams and Lasserre hierarchies, fail to correct much more errors
than the basic LPdecoder. In particular, we show that:
1) For any values of dv and dc, a linear number of rounds of the
SheraliAdams LP hierarchy cannot correct more than an O(1/dc) fraction of
errors on a random (dv,dc)regular LDPC code.
2) For any value of dv and infinitely many values of dc, a linear number of
rounds of the Lasserre SDP hierarchy cannot correct more than an O(1/dc)
fraction of errors on a random (dv,dc)regular LDPC code.
Our proofs use a new stretching and collapsing technique that allows us to
leverage recent progress in the study of the limitations of LP/SDP hierarchies
for Maximum Constraint Satisfaction Problems (MaxCSPs). The problem then
reduces to the construction of special balanced pairwise independent
distributions for SheraliAdams and special cosets of balanced pairwise
independent subgroups for Lasserre.
Some of our techniques are more generally applicable to a large class of
Boolean CSPs called MinOnes. In particular, for kHypergraph Vertex Cover, we
obtain an improved integrality gap of $k1\epsilon$ that holds after a
\emph{linear} number of rounds of the Lasserre hierarchy, for any k = q+1 with
q an arbitrary prime power. The best previous gap for a linear number of rounds
was equal to $2\epsilon$ and due to Schoenebeck.

We present the first sampleoptimal sublinear time algorithms for the sparse
Discrete Fourier Transform over a twodimensional sqrt{n} x sqrt{n} grid. Our
algorithms are analyzed for /average case/ signals. For signals whose spectrum
is exactly sparse, our algorithms use O(k) samples and run in O(k log k) time,
where k is the expected sparsity of the signal. For signals whose spectrum is
approximately sparse, our algorithm uses O(k log n) samples and runs in O(k
log^2 n) time; the latter algorithm works for k=Theta(sqrt{n}). The number of
samples used by our algorithms matches the known lower bounds for the
respective signal models.
By a known reduction, our algorithms give similar results for the
onedimensional sparse Discrete Fourier Transform when n is a power of a small
composite number (e.g., n = 6^t).

For a given family of spatially coupled codes, we prove that the LP threshold
on the BSC of the graph cover ensemble is the same as the LP threshold on the
BSC of the derived spatially coupled ensemble. This result is in contrast with
the fact that the BP threshold of the derived spatially coupled ensemble is
believed to be larger than the BP threshold of the graph cover ensemble as
noted by the work of Kudekar et al. (2011, 2012). To prove this, we establish
some properties related to the dual witness for LP decoding which was
introduced by Feldman et al. (2007) and simplified by Daskalakis et al. (2008).
More precisely, we prove that the existence of a dual witness which was
previously known to be sufficient for LP decoding success is also necessary and
is equivalent to the existence of certain acyclic hyperflows. We also derive a
sublinear (in the block length) upper bound on the weight of any edge in such
hyperflows, both for regular LPDC codes and for spatially coupled codes and we
prove that the bound is asymptotically tight for regular LDPC codes. Moreover,
we show how to trade crossover probability for "LP excess" on all the variable
nodes, for any binary linear code.