
In this paper we present a deterministic polynomial time algorithm for
testing if a symbolic matrix in noncommuting variables over $\mathbb{Q}$ is
invertible or not. The analogous question for commuting variables is the
celebrated polynomial identity testing (PIT) for symbolic determinants. In
contrast to the commutative case, which has an efficient probabilistic
algorithm, the best previous algorithm for the noncommutative setting required
exponential time (whether or not randomization is allowed). The algorithm
efficiently solves the "word problem" for the free skew field, and the identity
testing problem for arithmetic formulae with division over noncommuting
variables, two problems which had only exponentialtime algorithms prior to
this work.
The main contribution of this paper is a complexity analysis of an existing
algorithm due to Gurvits, who proved it was polynomial time for certain classes
of inputs. We prove it always runs in polynomial time. The main component of
our analysis is a simple (given the necessary known tools) lower bound on
central notion of capacity of operators (introduced by Gurvits). We extend the
algorithm to actually approximate capacity to any accuracy in polynomial time,
and use this analysis to give quantitative bounds on the continuity of capacity
(the latter is used in a subsequent paper on BrascampLieb inequalities).
Symbolic matrices in noncommuting variables, and the related structural and
algorithmic questions, have a remarkable number of diverse origins and
motivations. They arise independently in (commutative) invariant theory and
representation theory, linear algebra, optimization, linear system theory,
quantum information theory, approximation of the permanent and naturally in
noncommutative algebra. We provide a detailed account of some of these sources
and their interconnections.

We present a polynomial time algorithm to approximately scale tensors of any
format to arbitrary prescribed marginals (whenever possible). This unifies and
generalizes a sequence of past works on matrix, operator and tensor scaling.
Our algorithm provides an efficient weak membership oracle for the associated
moment polytopes, an important family of implicitlydefined convex polytopes
with exponentially many facets and a wide range of applications. These include
the entanglement polytopes from quantum information theory (in particular, we
obtain an efficient solution to the notorious onebody quantum marginal
problem) and the Kronecker polytopes from representation theory (which capture
the asymptotic support of Kronecker coefficients). Our algorithm can be applied
to succinct descriptions of the input tensor whenever the marginals can be
efficiently computed, as in the important case of matrix product states or
tensortrain decompositions, widely used in computational physics and numerical
mathematics.
We strengthen and generalize the alternating minimization approach of
previous papers by introducing the theory of highest weight vectors from
representation theory into the numerical optimization framework. We show that
highest weight vectors are natural potential functions for scaling algorithms
and prove new bounds on their evaluations to obtain polynomialtime
convergence. Our techniques are general and we believe that they will be
instrumental to obtain efficient algorithms for moment polytopes beyond the
ones consider here, and more broadly, for other optimization problems
possessing natural symmetries.

The celebrated BrascampLieb (BL) inequalities (and their extensions) are an
important mathematical tool, unifying and generalizing numerous inequalities in
analysis, convex geometry and information theory. While their structural theory
is very well understood, far less is known about computing their main
parameters.
We give polynomial time algorithms to compute feasibility of BLdatum, the
optimal BLconstant and a weak separation oracle for the BLpolytope. The same
result holds for the socalled Reverse BL inequalities of Barthe. The best
known algorithms for any of these tasks required at least exponential time.
The algorithms are obtained by a simple efficient reduction of a given
BLdatum to an instance of the Operator Scaling problem defined by Gurvits, for
which the present authors have provided a polynomial time algorithm. This
reduction implies algorithmic versions of many of the known structural results,
and in some cases provide proofs that are different or simpler than existing
ones.
Of particular interest is the fact that the operator scaling algorithm is
continuous in its input. Thus as a simple corollary of our reduction we obtain
explicit bounds on the magnitude and continuity of the BLconstant in terms of
the BLdata. To the best of our knowledge no such bounds were known, as past
arguments relied on compactness. The continuity of BLconstants is important
for developing nonlinear BL inequalities that have recently found so many
applications.

We propose a new secondorder method for geodesically convex optimization on
the natural hyperbolic metric over positive definite matrices. We apply it to
solve the operator scaling problem in time polynomial in the input size and
logarithmic in the error. This is an exponential improvement over previous
algorithms which were analyzed in the usual Euclidean, "commutative" metric
(for which the above problem is not convex). Our method is general and
applicable to other settings.
As a consequence, we solve the equivalence problem for the leftright group
action underlying the operator scaling problem. This yields a deterministic
polynomialtime algorithm for a new class of Polynomial Identity Testing (PIT)
problems, which was the original motivation for studying operator scaling.

We develop several efficient algorithms for the classical \emph{Matrix
Scaling} problem, which is used in many diverse areas, from preconditioning
linear systems to approximation of the permanent. On an input $n\times n$
matrix $A$, this problem asks to find diagonal (scaling) matrices $X$ and $Y$
(if they exist), so that $X A Y$ $\varepsilon$approximates a doubly
stochastic, or more generally a matrix with prescribed row and column sums.
We address the general scaling problem as well as some important special
cases. In particular, if $A$ has $m$ nonzero entries, and if there exist $X$
and $Y$ with polynomially large entries such that $X A Y$ is doubly stochastic,
then we can solve the problem in total complexity $\tilde{O}(m + n^{4/3})$.
This greatly improves on the best known previous results, which were either
$\tilde{O}(n^4)$ or $O(m n^{1/2}/\varepsilon)$.
Our algorithms are based on tailormade first and second order techniques,
combined with other recent advances in continuous optimization, which may be of
independent interest for solving similar problems.

In this work we study the quantitative relation between VCdimension and two
other basic parameters related to learning and teaching. Namely, the quality of
sample compression schemes and of teaching sets for classes of low
VCdimension. Let $C$ be a binary concept class of size $m$ and VCdimension
$d$. Prior to this work, the best known upper bounds for both parameters were
$\log(m)$, while the best lower bounds are linear in $d$. We present
significantly better upper bounds on both as follows. Set $k = O(d 2^d \log
\log C)$.
We show that there always exists a concept $c$ in $C$ with a teaching set
(i.e. a list of $c$labeled examples uniquely identifying $c$ in $C$) of size
$k$. This problem was studied by Kuhlmann (1999). Our construction implies that
the recursive teaching (RT) dimension of $C$ is at most $k$ as well. The
RTdimension was suggested by Zilles et al. and Doliwa et al. (2010). The same
notion (under the name partialID width) was independently studied by Wigderson
and Yehudayoff (2013). An upper bound on this parameter that depends only on
$d$ is known just for the very simple case $d=1$, and is open even for $d=2$.
We also make small progress towards this seemingly modest goal.
We further construct sample compression schemes of size $k$ for $C$, with
additional information of $k \log(k)$ bits. Roughly speaking, given any list of
$C$labelled examples of arbitrary length, we can retain only $k$ labeled
examples in a way that allows to recover the labels of all others examples in
the list, using additional $k\log (k)$ information bits. This problem was first
suggested by Littlestone and Warmuth (1986).

We give upper and lower bounds on the power of subsystems of the Ideal Proof
System (IPS), the algebraic proof system recently proposed by Grochow and
Pitassi, where the circuits comprising the proof come from various restricted
algebraic circuit classes. This mimics an established research direction in the
boolean setting for subsystems of Extended Frege proofs, where prooflines are
circuits from restricted boolean circuit classes. Except one, all of the
subsystems considered in this paper can simulate the wellstudied
Nullstellensatz proof system, and prior to this work there were no known lower
bounds when measuring proof size by the algebraic complexity of the polynomials
(except with respect to degree, or to sparsity).
We give two general methods of converting certain algebraic lower bounds into
proof complexity ones. Our methods require stronger notions of lower bounds,
which lower bound a polynomial as well as an entire family of polynomials it
defines. Our techniques are reminiscent of existing methods for converting
boolean circuit lower bounds into related proof complexity results, such as
feasible interpolation. We obtain the relevant types of lower bounds for a
variety of classes (sparse polynomials, depth3 powering formulas, readonce
oblivious algebraic branching programs, and multilinear formulas), and infer
the relevant proof complexity results. We complement our lower bounds by giving
short refutations of the previouslystudied subsetsum axiom using IPS
subsystems, allowing us to conclude strict separations between some of these
subsystems.

The sensitivity of a Boolean function f is the maximum over all inputs x, of
the number of sensitive coordinates of x. The wellknown sensitivity conjecture
of Nisan (see also Nisan and Szegedy) states that every sensitivitys Boolean
function can be computed by a polynomial over the reals of degree poly(s). The
best known upper bounds on degree, however, are exponential rather than
polynomial in s.
Our main result is an approximate version of the conjecture: every Boolean
function with sensitivity s can be epsilonapproximated (in L_2) by a
polynomial whose degree is O(s log(1/epsilon)). This is the first improvement
on the folklore bound of s/epsilon. Further, we show that improving the bound
to O(s^c log(1/epsilon)^d)$ for any d < 1 and any c > 0 will imply the
sensitivity conjecture. Thus our result is essentially the best one can hope
for without proving the conjecture.
We postulate a robust analogue of the sensitivity conjecture: if most inputs
to a Boolean function f have low sensitivity, then most of the Fourier mass of
f is concentrated on small subsets, and present an approach towards proving
this conjecture.

This paper establishes a statistical versus computational tradeoff for
solving a basic highdimensional machine learning problem via a basic convex
relaxation method. Specifically, we consider the {\em Sparse Principal
Component Analysis} (Sparse PCA) problem, and the family of {\em
SumofSquares} (SoS, aka Lasserre/Parillo) convex relaxations. It was well
known that in large dimension $p$, a planted $k$sparse unit vector can be {\em
in principle} detected using only $n \approx k\log p$ (Gaussian or Bernoulli)
samples, but all {\em efficient} (polynomial time) algorithms known require $n
\approx k^2$ samples. It was also known that this quadratic gap cannot be
improved by the the most basic {\em semidefinite} (SDP, aka spectral)
relaxation, equivalent to a degree2 SoS algorithms. Here we prove that also
degree4 SoS algorithms cannot improve this quadratic gap. This averagecase
lower bound adds to the small collection of hardness results in machine
learning for this powerful family of convex relaxation algorithms. Moreover,
our design of moments (or "pseudoexpectations") for this lower bound is quite
different than previous lower bounds. Establishing lower bounds for higher
degree SoS algorithms for remains a challenging problem.

A natural measure of smoothness of a Boolean function is its sensitivity (the
largest number of Hamming neighbors of a point which differ from it in function
value). The structure of smooth or equivalently lowsensitivity functions is
still a mystery. A wellknown conjecture states that every such Boolean
function can be computed by a shallow decision tree. While this conjecture
implies that smooth functions are easy to compute in the simplest computational
model, to date no nontrivial upper bounds were known for such functions in any
computational model, including unrestricted Boolean circuits. Even a bound on
the description length of such functions better than the trivial $2^n$ does not
seem to have been known.
In this work, we establish the first computational upper bounds on smooth
Boolean functions:
1) We show that every sensitivity s function is uniquely specified by its
values on a Hamming ball of radius 2s. We use this to show that such functions
can be computed by circuits of size $n^{O(s)}$.
2) We show that sensitivity s functions satisfy a strong pointwise
noisestability guarantee for random noise of rate O(1/s). We use this to show
that these functions have formulas of depth O(s log n).
3) We show that sensitivity s functions can be (locally) selfcorrected from
worstcase noise of rate $\exp(O(s))$.
All our results are simple, and follow rather directly from (variants of) the
basic fact that that the function value at few points in small neighborhoods of
a given point determine its function value via a majority vote. Our results
confirm various consequences of the conjecture. They may be viewed as providing
a new form of evidence towards its validity, as well as new directions towards
attacking it.

Finding cliques in random graphs and the closely related "planted" clique
variant, where a clique of size k is planted in a random G(n, 1/2) graph, have
been the focus of substantial study in algorithm design. Despite much effort,
the best known polynomialtime algorithms only solve the problem for k ~
sqrt(n).
In this paper we study the complexity of the planted clique problem under
algorithms from the Sumofsquares hierarchy. We prove the first average case
lower bound for this model: for almost all graphs in G(n,1/2), r rounds of the
SOS hierarchy cannot find a planted kclique unless k > n^{1/2r} (up to
logarithmic factors). Thus, for any constant number of rounds planted cliques
of size n^{o(1)} cannot be found by this powerful class of algorithms. This is
shown via an integrability gap for the natural formulation of maximum clique
problem on random graphs for SOS and Lasserre hierarchies, which in turn follow
from degree lower bounds for the Positivestellensatz proof system.
We follow the usual recipe for such proofs. First, we introduce a natural
"dual certificate" (also known as a "vectorsolution" or "pseudoexpectation")
for the given system of polynomial equations representing the problem for every
fixed input graph. Then we show that the matrix associated with this dual
certificate is PSD (positive semidefinite) with high probability over the
choice of the input graph.This requires the use of certain tools. One is the
theory of association schemes, and in particular the eigenspaces and
eigenvalues of the Johnson scheme. Another is a combinatorial method we develop
to compute (via traces) norm bounds for certain random matrices whose entries
are highly dependent; we hope this method will be useful elsewhere.

This paper studies the parameters for which ReedMuller (RM) codes over
$GF(2)$ can correct random erasures and random errors with high probability,
and in particular when can they achieve capacity for these two classical
channels. Necessarily, the paper also studies properties of evaluations of
multivariate $GF(2)$ polynomials on random sets of inputs.
For erasures, we prove that RM codes achieve capacity both for very high rate
and very low rate regimes. For errors, we prove that RM codes achieve capacity
for very low rate regimes, and for very high rates, we show that they can
uniquely decode at about square root of the number of errors at capacity.
The proofs of these four results are based on different techniques, which we
find interesting in their own right. In particular, we study the following
questions about $E(m,r)$, the matrix whose rows are truth tables of all
monomials of degree $\leq r$ in $m$ variables. What is the most (resp. least)
number of random columns in $E(m,r)$ that define a submatrix having full column
rank (resp. full row rank) with high probability? We obtain tight bounds for
very small (resp. very large) degrees $r$, which we use to show that RM codes
achieve capacity for erasures in these regimes.
Our decoding from random errors follows from the following novel reduction.
For every linear code $C$ of sufficiently high rate we construct a new code
$C'$, also of very high rate, such that for every subset $S$ of coordinates, if
$C$ can recover from erasures in $S$, then $C'$ can recover from errors in $S$.
Specializing this to RM codes and using our results for erasures imply our
result on unique decoding of RM codes at high rate.
Finally, two of our capacity achieving results require tight bounds on the
weight distribution of RM codes. We obtain such bounds extending the recent
\cite{KLP} bounds from constant degree to linear degree polynomials.

We prove that 3query linear locally correctable codes over the Reals of
dimension $d$ require block length $n>d^{2+\lambda}$ for some fixed, positive
$\lambda >0$. Geometrically, this means that if $n$ vectors in $R^d$ are such
that each vector is spanned by a linear number of disjoint triples of others,
then it must be that $n > d^{2+\lambda}$. This improves the known quadratic
lower bounds (e.g. {KdW04, Wood07}). While a modest improvement, we expect that
the new techniques introduced in this work will be useful for further progress
on lower bounds of locally correctable and decodable codes with more than 2
queries, possibly over other fields as well.
Our proof introduces several new ideas to existing lower bound techniques,
several of which work over every field. At a high level, our proof has two
parts, {\it clustering} and {\it random restriction}.
The clustering step uses a powerful theorem of Barthe from convex geometry.
It can be used (after preprocessing our LCC to be {\it balanced}), to apply a
basis change (and rescaling) of the vectors, so that the resulting unit vectors
become {\it nearly isotropic}. This together with the fact that any LCC must
have many `correlated' pairs of points, lets us deduce that the vectors must
have a surprisingly strong geometric clustering, and hence also combinatorial
clustering with respect to the spanning triples.
In the restriction step, we devise a new variant of the dimension reduction
technique used in previous lower bounds, which is able to take advantage of the
combinatorial clustering structure above. The analysis of our random projection
method reduces to a simple (weakly) random graph process, and works over any
field.

Finding cliques in random graphs and the closely related "planted" clique
variant, where a clique of size t is planted in a random G(n,1/2) graph, have
been the focus of substantial study in algorithm design. Despite much effort,
the best known polynomialtime algorithms only solve the problem for t =
Theta(sqrt(n)). Here we show that beating sqrt(n) would require substantially
new algorithmic ideas, by proving a lower bound for the problem in the
sumofsquares (or Lasserre) hierarchy, the most powerful class of
semidefinite programming algorithms we know of: r rounds of the sumofsquares
hierarchy can only solve the planted clique for t > sqrt(n)/(C log n)^(r^2).
Previously, no nontrivial lower bounds were known. Our proof is formulated as a
degree lower bound in the Positivstellensatz algebraic proof system, which is
equivalent to the sumofsquares hierarchy. The heart of our (averagecase)
lower bound is a proof that a certain random matrix derived from the input
graph is (with high probability) positive semidefinite. Two ingredients play an
important role in this proof. The first is the classical theory of association
schemes, applied to the average and variance of that random matrix. The second
is a new large deviation inequality for matrixvalued polynomials. Our new tail
estimate seems to be of independent interest and may find other applications,
as it generalizes both the estimates on realvalued polynomials and on sums of
independent random matrices.

We study the rank of complex sparse matrices in which the supports of
different columns have small intersections. The rank of these matrices, called
design matrices, was the focus of a recent work by Barak et. al. (BDWY11) in
which they were used to answer questions regarding point configurations. In
this work we derive nearoptimal rank bounds for these matrices and use them to
obtain asymptotically tight bounds in many of the geometric applications. As a
consequence of our improved analysis, we also obtain a new, linear algebraic,
proof of Kelly's theorem, which is the complex analog of the SylvesterGallai
theorem.

We study questions in incidence geometry where the precise position of points
is `blurry' (e.g. due to noise, inaccuracy or error). Thus lines are replaced
by narrow tubes, and more generally affine subspaces are replaced by their
small neighborhood. We show that the presence of a sufficiently large number of
approximately collinear triples in a set of points in d dimensional complex
space implies that the points are close to a low dimensional affine subspace.
This can be viewed as a stable variant of the SylvesterGallai theorem and its
extensions.
Building on the recently found connection between SylvesterGallai type
theorems and complex Locally Correctable Codes (LCCs), we define the new notion
of stable LCCs, in which the (local) correction procedure can also handle small
perturbations in the euclidean metric. We prove that such stable codes with
constant query complexity do not exist. No impossibility results were known in
any such local setting for more than 2 queries.

A (q,k,t)design matrix is an m x n matrix whose pattern of zeros/nonzeros
satisfies the following designlike condition: each row has at most q
nonzeros, each column has at least k nonzeros and the supports of every two
columns intersect in at most t rows. We prove that the rank of any
(q,k,t)design matrix over a field of characteristic zero (or sufficiently
large finite characteristic) is at least n  (qtn/2k)^2 . Using this result we
derive the following applications:
(1) Impossibility results for 2query LCCs over the complex numbers: A
2query locally correctable code (LCC) is an error correcting code in which
every codeword coordinate can be recovered, probabilistically, by reading at
most two other code positions. Such codes have numerous applications and
constructions (with exponential encoding length) are known over finite fields
of small characteristic. We show that infinite families of such linear 2query
LCCs do not exist over the complex numbers.
(2) Generalization of results in combinatorial geometry: We prove a
quantitative analog of the SylvesterGallai theorem: Let $v_1,...,v_m$ be a set
of points in $\C^d$ such that for every $i \in [m]$ there exists at least
$\delta m$ values of $j \in [m]$ such that the line through $v_i,v_j$ contains
a third point in the set. We show that the dimension of $\{v_1,...,v_m \}$ is
at most $O(1/\delta^2)$. Our results generalize to the high dimensional case
(replacing lines with planes, etc.) and to the case where the points are
colored (as in the MotzkinRabin Theorem).

The main contribution of this work is a new type of graph product, which we
call the {\it zigzag product}. Taking a product of a large graph with a small
graph, the resulting graph inherits (roughly) its size from the large one, its
degree from the small one, and its expansion properties from both! Iteration
yields simple explicit constructions of constantdegree expanders of arbitrary
size, starting from one constantsize expander.
Crucial to our intuition (and simple analysis) of the properties of this
graph product is the view of expanders as functions which act as ``entropy
wave" propagators  they transform probability distributions in which entropy
is concentrated in one area to distributions where that concentration is
dissipated. In these terms, the graph products affords the constructive
interference of two such waves.
Subsequent work [ALW01], [MW01] relates the zigzag product of graphs to the
standard semidirect product of groups, leading to new results and constructions
on expanding Cayley graphs.

We present a simple and general simulation technique that transforms any
blackbox quantum algorithm (a la Grover's database search algorithm) to a
quantum communication protocol for a related problem, in a way that fully
exploits the quantum parallelism. This allows us to obtain new positive and
negative results. The positive results are novel quantum communication
protocols that are built from nontrivial quantum algorithms via this
simulation. These protocols, combined with (old and new) classical lower
bounds, are shown to provide the first asymptotic separation results between
the quantum and classical (probabilistic) twoparty communication complexity
models. In particular, we obtain a quadratic separation for the boundederror
model, and an exponential separation for the zeroerror model. The negative
results transform known quantum communication lower bounds to computational
lower bounds in the blackbox model. In particular, we show that the quadratic
speedup achieved by Grover for the OR function is impossible for the PARITY
function or the MAJORITY function in the boundederror model, nor is it
possible for the OR function itself in the exact case. This dichotomy naturally
suggests a study of boundeddepth predicates (i.e. those in the polynomial
hierarchy) between OR and MAJORITY. We present blackbox algorithms that
achieve near quadratic speed up for all such predicates.