
We eliminate a key roadblock to efficient verification of nonlinear integer
arithmetic using CDCL SAT solvers, by showing how to construct short resolution
proofs for many properties of the most widely used multiplier circuits. Such
short proofs were conjectured not to exist. More precisely, we give n^{O(1)}
size regular resolution proofs for arbitrary degree 2 identities on array,
diagonal, and Booth multipliers and quasipolynomial n^{O(\log n)} size proofs
for these identities on Wallace tree multipliers.

We introduce and develop a new semialgebraic proof system, called Stabbing
Planes that is in the style of DPLLbased modern SAT solvers. As with DPLL,
there is only one rule: the current polytope can be subdivided by branching on
an inequality and its "integer negation." That is, we can (nondeterministically
choose) a hyperplane a x \geq b with integer coefficients, which partitions the
polytope into three pieces: the points in the polytope satisfying a x \geq b,
the points satisfying a x \leq b1, and the middle slab b1 < a x < b. Since
the middle slab contains no integer points it can be safely discarded, and the
algorithm proceeds recursively on the other two branches. Each path terminates
when the current polytope is empty, which is polynomialtime checkable. Among
our results, we show somewhat surprisingly that Stabbing Planes can efficiently
simulate Cutting Planes, and moreover, is strictly stronger than Cutting Planes
under a reasonable conjecture. We prove linear lower bounds on the rank of
Stabbing Planes refutations, by adapting a lifting argument in communication
complexity.

We develop an extension of recently developed methods for obtaining
timespace tradeoff lower bounds for problems of learning from random test
samples to handle the situation where the space of tests is signficantly
smaller than the space of inputs, a class of learning problems that is not
handled by prior work. This extension is based on a measure of how matrices
amplify the 2norms of probability distributions that is more refined than the
2norms of these matrices.
As applications that follow from our new technique, we show that any
algorithm that learns $m$variate homogeneous polynomial functions of degree at
most $d$ over $\mathbb{F}_2$ from evaluations on randomly chosen inputs either
requires space $\Omega(mn)$ or $2^{\Omega(m)}$ time where $n=m^{\Theta(d)}$ is
the dimension of the space of such functions. These bounds are asymptotically
optimal since they match the tradeoffs achieved by natural learning algorithms
for the problems.

We study distributed protocols for finding all pairs of similar vectors in a
large dataset. Our results pertain to a variety of discrete metrics, and we
give concrete instantiations for Hamming distance. In particular, we give
improved upper bounds on the overhead required for similarity defined by
Hamming distance $r>1$ and prove a lower bound showing qualitative optimality
of the overhead required for similarity over any Hamming distance $r$. Our main
conceptual contribution is a connection between similarity search algorithms
and certain graphtheoretic quantities. For our upper bounds, we exhibit a
general method for designing oneround protocols using edgeisoperimetric
shapes in similarity graphs. For our lower bounds, we define a new
combinatorial optimization problem, which can be stated in purely
graphtheoretic terms yet also captures the core of the analysis in previous
theoretical work on distributed similarity joins. As one of our main technical
results, we prove new bounds on distance correlations in subsets of the Hamming
cube.

A formulation of "Ne\v{c}iporuk's lower bound method" slightly more inclusive
than the usual complexitymeasurespecific formulation is presented. Using this
general formulation, limitations to lower bounds achievable by the method are
obtained for several computation models, such as branching programs and Boolean
formulas having access to a sublinear number of nondeterministic bits. In
particular, it is shown that any lower bound achievable by the method of
Ne\v{c}iporuk for the size of nondeterministic and parity branching programs is
at most $O(n^{3/2}/\log n)$.

In this paper, we study the communication complexity for the problem of
computing a conjunctive query on a large database in a parallel setting with
$p$ servers. In contrast to previous work, where upper and lower bounds on the
communication were specified for particular structures of data (either data
without skew, or data with specific types of skew), in this work we focus on
worstcase analysis of the communication cost. The goal is to find worstcase
optimal parallel algorithms, similar to the work of [18] for sequential
algorithms.
We first show that for a single round we can obtain an optimal worstcase
algorithm. The optimal load for a conjunctive query $q$ when all relations have
size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new queryrelated
quantity called the edge quasipacking number, which is different from both the
edge packing number and edge cover number of the query hypergraph. For multiple
rounds, we present algorithms that are optimal for several classes of queries.
Finally, we show a surprising connection to the external memory model, which
allows us to translate parallel algorithms to external memory algorithms. This
technique allows us to recover (within a polylogarithmic factor) several recent
results on the I/O complexity for computing join queries, and also obtain
optimal algorithms for other classes of queries.

We study the problem of computing conjunctive queries over large databases on
parallel architectures without shared storage. Using the structure of such a
query $q$ and the skew in the data, we study tradeoffs between the number of
processors, the number of rounds of communication, and the perprocessor load
 the number of bits each processor can send or can receive in a single round
 that are required to compute $q$.
When the data is free of skew, we obtain essentially tight upper and lower
bounds for one round algorithms and we show how the bounds degrade when there
is skew in the data. In the case of skewed data, we show how to improve the
algorithms when approximate degrees of the heavyhitter elements are available,
obtaining essentially optimal algorithms for queries such as simple joins and
triangle join queries.
For queries that we identify as treelike, we also prove nearly matching
upper and lower bounds for multiround algorithms for a natural class of
skewfree databases. One consequence of these latter lower bounds is that for
any $\varepsilon>0$, using $p$ processors to compute the connected components
of a graph, or to output the path, if any, between a specified pair of vertices
of a graph with $m$ edges and perprocessor load that is
$O(m/p^{1\varepsilon})$ requires $\Omega(\log p)$ rounds of communication.
Our upper bounds are given by simple structured algorithms using MapReduce.
Our oneround lower bounds are proved in a very general model, which we call
the Massively Parallel Communication (MPC) model, that allows processors to
communicate arbitrary bits. Our multiround lower bounds apply in a restricted
version of the MPC model in which processors in subsequent rounds after the
first communication round are only allowed to send tuples.

We show new limits on the efficiency of using current techniques to make
exact probabilistic inference for large classes of natural problems. In
particular we show new lower bounds on knowledge compilation to SDD and DNNF
forms. We give strong lower bounds on the complexity of SDD representations by
relating SDD size to bestpartition communication complexity. We use this
relationship to prove exponential lower bounds on the SDD size for representing
a large class of problems that occur naturally as queries over probabilistic
databases. A consequence is that for representing unions of conjunctive
queries, SDDs are not qualitatively more concise than OBDDs. We also derive
simple examples for which SDDs must be exponentially less concise than FBDDs.
Finally, we derive exponential lower bounds on the sizes of DNNF
representations using a new quasipolynomial simulation of DNNFs by
nondeterministic FBDDs.

The FO Model Counting problem (FOMC) is the following: given a sentence
$\Phi$ in FO and a number $n$, compute the number of models of $\Phi$ over a
domain of size $n$; the Weighted variant (WFOMC) generalizes the problem by
associating a weight to each tuple and defining the weight of a model to be the
product of weights of its tuples. In this paper we study the complexity of the
symmetric WFOMC, where all tuples of a given relation have the same weight. Our
motivation comes from an important application, inference in Knowledge Bases
with soft constraints, like Markov Logic Networks, but the problem is also of
independent theoretical interest. We study both the data complexity, and the
combined complexity of FOMC and WFOMC. For the data complexity we prove the
existence of an FO$^{3}$ formula for which FOMC is #P$_1$complete, and the
existence of a Conjunctive Query for which WFOMC is #P$_1$complete. We also
prove that all $\gamma$acyclic queries have polynomial time data complexity.
For the combined complexity, we prove that, for every fragment FO$^{k}$, $k\geq
2$, the combined complexity of FOMC (or WFOMC) is #Pcomplete.

We prove that any oblivious algorithm using space $S$ to find the median of a
list of $n$ integers from $\{1,...,2n\}$ requires time $\Omega(n \log\log_S
n)$. This bound also applies to the problem of determining whether the median
is odd or even. It is nearly optimal since Chan, following Munro and Raman, has
shown that there is a (randomized) selection algorithm using only $s$
registers, each of which can store an input value or $O(\log n)$bit counter,
that makes only $O(\log\log_s n)$ passes over the input. The bound also implies
a size lower bound for readonce branching programs computing the low order bit
of the median and implies the analog of $P \ne NP \cap coNP$ for length $o(n
\log\log n)$ oblivious branching programs.

We study the problem of computing a conjunctive query q in parallel, using p
of servers, on a large database. We consider algorithms with one round of
communication, and study the complexity of the communication. We are especially
interested in the case where the data is skewed, which is a major challenge for
scalable parallel query processing. We establish a tight connection between the
fractional edge packings of the query and the amount of communication, in two
cases. First, in the case when the only statistics on the database are the
cardinalities of the input relations, and the data is skewfree, we provide
matching upper and lower bounds (up to a poly log p factor) expressed in terms
of fractional edge packings of the query q. Second, in the case when the
relations are skewed and the heavy hitters and their frequencies are known, we
provide upper and lower bounds (up to a poly log p factor) expressed in terms
of packings of residual queries obtained by specializing the query to a heavy
hitter. All our lower bounds are expressed in the strongest form, as number of
bits needed to be communicated between processors with unlimited computational
power. Our results generalizes some prior results on uniform databases (where
each relation is a matching) [4], and other lower bounds for the MapReduce
model [1].

Query evaluation in tupleindependent probabilistic databases is the problem
of computing the probability of an answer to a query given independent
probabilities of the individual tuples in a database instance. There are two
main approaches to this problem: (1) in `grounded inference' one first obtains
the lineage for the query and database instance as a Boolean formula, then
performs weighted model counting on the lineage (i.e., computes the probability
of the lineage given probabilities of its independent Boolean variables); (2)
in methods known as `lifted inference' or `extensional query evaluation', one
exploits the highlevel structure of the query as a firstorder formula.
Although it is widely believed that lifted inference is strictly more powerful
than grounded inference on the lineage alone, no formal separation has
previously been shown for query evaluation. In this paper we show such a formal
separation for the first time.
We exhibit a class of queries for which model counting can be done in
polynomial time using extensional query evaluation, whereas the algorithms used
in stateoftheart exact model counters on their lineages provably require
exponential time. Our lower bounds on the running times of these exact model
counters follow from new exponential size lower bounds on the kinds of dDNNF
representations of the lineages that these model counters (either explicitly or
implicitly) produce. Though some of these queries have been studied before, no
nontrivial lower bounds on the sizes of these representations for these
queries were previously known.

The best current methods for exactly computing the number of satisfying
assignments, or the satisfying probability, of Boolean formulas can be seen,
either directly or indirectly, as building 'decisionDNNF' (decision
decomposable negation normal form) representations of the input Boolean
formulas. DecisionDNNFs are a special case of 'dDNNF's where 'd' stands for
'deterministic'. We show that any decisionDNNF can be converted into an
equivalent 'FBDD' (free binary decision diagram)  also known as a 'readonce
branching program' (ROBP or 1BP)  with only a quasipolynomial increase in
representation size in general, and with only a polynomial increase in size in
the special case of monotone kDNF formulas. Leveraging known exponential lower
bounds for FBDDs, we then obtain similar exponential lower bounds for
decisionDNNFs which provide lower bounds for the recent algorithms. We also
separate the power of decisionDNNFs from dDNNFs and a generalization of
decisionDNNFs known as ANDFBDDs. Finally we show how these imply exponential
lower bounds for natural problems associated with probabilistic databases.

We consider timespace tradeoffs for exactly computing frequency moments and
order statistics over sliding windows. Given an input of length 2n1, the task
is to output the function of each window of length n, giving n outputs in
total. Computations over sliding windows are related to direct sum problems
except that inputs to instances almost completely overlap.
We show an average case and randomized timespace tradeoff lower bound of TS
in Omega(n^2) for multiway branching programs, and hence standard RAM and
wordRAM models, to compute the number of distinct elements, F_0, in sliding
windows over alphabet [n]. The same lower bound holds for computing the
loworder bit of F_0 and computing any frequency moment F_k for k not equal to
1. We complement this lower bound with a TS in \tilde O(n^2) deterministic RAM
algorithm for exactly computing F_k in sliding windows.
We show timespace separations between the complexity of slidingwindow
element distinctness and that of slidingwindow $F_0\bmod 2$ computation. In
particular for alphabet [n] there is a very simple errorless slidingwindow
algorithm for element distinctness that runs in O(n) time on average and uses
O(log{n}) space.
We show that any algorithm for a single element distinctness instance can be
extended to an algorithm for the slidingwindow version of element distinctness
with at most a polylogarithmic increase in the timespace product.
Finally, we show that the slidingwindow computation of order statistics such
as the maximum and minimum can be computed with only a logarithmic increase in
time, but that a TS in Omega(n^2) lower bound holds for slidingwindow
computation of order statistics such as the median, a nearly linear increase in
time when space is small.

We derive new timespace tradeoff lower bounds and algorithms for exactly
computing statistics of input data, including frequency moments, element
distinctness, and order statistics, that are simple to calculate for sorted
data. We develop a randomized algorithm for the element distinctness problem
whose time T and space S satisfy T in O (n^{3/2}/S^{1/2}), smaller than
previous lower bounds for comparisonbased algorithms, showing that element
distinctness is strictly easier than sorting for randomized branching programs.
This algorithm is based on a new time and space efficient algorithm for finding
all collisions of a function f from a finite set to itself that are reachable
by iterating f from a given set of starting points. We further show that our
element distinctness algorithm can be extended at only a polylogarithmic factor
cost to solve the element distinctness problem over sliding windows, where the
task is to take an input of length 2n1 and produce an output for each window
of length n, giving n outputs in total. In contrast, we show a timespace
tradeoff lower bound of T in Omega(n^2/S) for randomized branching programs to
compute the number of distinct elements over sliding windows. The same lower
bound holds for computing the loworder bit of F_0 and computing any frequency
moment F_k, k neq 1. This shows that those frequency moments and the decision
problem F_0 mod 2 are strictly harder than element distinctness. We complement
this lower bound with a T in O(n^2/S) comparisonbased deterministic RAM
algorithm for exactly computing F_k over sliding windows, nearly matching both
our lower bound for the slidingwindow version and the comparisonbased lower
bounds for the singlewindow version. We further exhibit a quantum algorithm
for F_0 over sliding windows with T in O(n^{3/2}/S^{1/2}). Finally, we consider
the computations of order statistics over sliding windows.

We consider the problem of computing a relational query $q$ on a large input
database of size $n$, using a large number $p$ of servers. The computation is
performed in rounds, and each server can receive only $O(n/p^{1\varepsilon})$
bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls
replication. We examine how many global communication steps are needed to
compute $q$. We establish both lower and upper bounds, in two settings. For a
single round of communication, we give lower bounds in the strongest possible
model, where arbitrary bits may be exchanged; we show that any algorithm
requires $\varepsilon \geq 11/\tau^*$, where $\tau^*$ is the fractional vertex
cover of the hypergraph of $q$. We also give an algorithm that matches the
lower bound for a specific class of databases. For multiple rounds of
communication, we present lower bounds in a model where routing decisions for a
tuple are tuplebased. We show that for the class of treelike queries there
exists a tradeoff between the number of rounds and the space exponent
$\varepsilon$. The lower bounds for multiple rounds are the first of their
kind. Our results also imply that transitive closure cannot be computed in O(1)
rounds of communication.

We show that any quantum algorithm deciding whether an input function $f$
from $[n]$ to $[n]$ is 2to1 or almost 2to1 requires $\Theta(n)$ queries to
$f$. The same lower bound holds for determining whether or not a function $f$
from $[2n2]$ to $[n]$ is surjective. These results yield a nearly linear
$\Omega(n/\log n)$ lower bound on the quantum query complexity of $\cl{AC}^0$.
The best previous lower bound known for any $\cl{AC^0}$ function was the
$\Omega ((n/\log n)^{2/3})$ bound given by Aaronson and Shi's $\Omega(n^{2/3})$
lower bound for the element distinctness problem.

We present a general method for converting any family of unsatisfiable CNF
formulas that is hard for one of the simplest proof systems, tree resolution,
into formulas that require large rank in any proof system that manipulates
polynomials or polynomial threshold functions of degree at most k (known as
Th(k) proofs). Such systems include LovaszSchrijver and Cutting Planes proof
systems as well as their high degree analogues.
These are based on analyzing two new proof systems, denoted by T^cc(k) and
R^cc(k). The proof lines of T^cc(k) are arbitrary Boolean functions, each of
which can be evaluated by an efficient kparty randomized communication
protocol. They include Th{k1} proofs as a special case. R^cc(k) proofs are
stronger and only require that each inference be locally checkable by an
efficient kparty randomized communication protocol.
Our main results are the following:
(1) When k is O(loglogn), for any unsatisfiable CNF formula F requiring
resolution rank r, there is a related CNF formula G=Lift_k(F) requiring
refutation rank r^Omega(1/k) log^O(1) n in all R^cc(k) systems.
(2) There are strict hierarchies for T^cc(k) and R^cc(k) systems with respect
to k when k is O(loglogn in that there are unsatisfiable CNF formulas requiring
large rank R^cc(k) refutations but having log^O(1) n rank Th(k) refutations.
(3) When k is O(loglogn) there are 2^(n^Omega(1/k)) lower bounds on the size
of treelike T^cc(k) refutations for large classes of lifted CNF formulas.
(4) A general method for producing integrality gaps for low rank R^cc(2)
inference (and hence Cutting Planes and Th(1) inference) based on related gaps
for low rank resolution. These gaps are optimal for MAX2tSAT.

The sequence a_1,...,a_m is a common subsequence in the set of permutations S
= {p_1,...,p_k} on [n] if it is a subsequence of p_i(1),...,p_i(n) and
p_j(1),...,p_j(n) for some distinct p_i, p_j in S. Recently, Beame and
HuynhNgoc (2008) showed that when k>=3, every set of k permutations on [n] has
a common subsequence of length at least n^{1/3}.
We show that, surprisingly, this lower bound is asymptotically optimal for
all constant values of k. Specifically, we show that for any k>=3 and n>=k^2
there exists a set of k permutations on [n] in which the longest common
subsequence has length at most 32(kn)^{1/3}. The proof of the upper bound is
constructive, and uses elementary algebraic techniques.