
We say a subset $C \subseteq \{1,2,\dots,k\}^n$ is a $k$hash code (also
called $k$separated) if for every subset of $k$ codewords from $C$, there
exists a coordinate where all these codewords have distinct values.
Understanding the largest possible rate (in bits), defined as $(\log_2 C)/n$,
of a $k$hash code is a classical problem. It arises in two equivalent
contexts: (i) the smallest size possible for a perfect hash family that maps a
universe of $N$ elements into $\{1,2,\dots,k\}$, and (ii) the zeroerror
capacity for decoding with lists of size less than $k$ for a certain
combinatorial channel.
A general upper bound of $k!/k^{k1}$ on the rate of a $k$hash code (in the
limit of large $n$) was obtained by Fredman and Koml\'{o}s in 1984 for any $k
\geq 4$. While better bounds have been obtained for $k=4$, their original bound
has remained the best known for each $k \ge 5$. In this work, we obtain the
first improvement to the FredmanKoml\'{o}s bound for every $k \ge 5$. While we
get explicit (numerical) bounds for $k=5,6$, for larger $k$ we only show that
the FK bound can be improved by a positive, but unspecified, amount. Under a
conjecture on the optimum value of a certain polynomial optimization problem
over the simplex, our methods allow an effective bound to be computed for every
$k$.

We consider the $(\ell_p,\ell_r)$Grothendieck problem, which seeks to
maximize the bilinear form $y^T A x$ for an input matrix $A$ over vectors $x,y$
with $\x\_p=\y\_r=1$. The problem is equivalent to computing the $p \to
r^*$ operator norm of $A$. The case $p=r=\infty$ corresponds to the classical
Grothendieck problem. Our main result is an algorithm for arbitrary $p,r \ge 2$
with approximation ratio $(1+\epsilon_0)/(\sinh^{1}(1)\cdot \gamma_{p^*}
\,\gamma_{r^*})$ for some fixed $\epsilon_0 \le 0.00863$. Comparing this with
Krivine's approximation ratio of $(\pi/2)/\sinh^{1}(1)$ for the original
Grothendieck problem, our guarantee is off from the best known hardness factor
of $(\gamma_{p^*} \gamma_{r^*})^{1}$ for the problem by a factor similar to
Krivine's defect.
Our approximation follows by bounding the value of the natural vector
relaxation for the problem which is convex when $p,r \ge 2$. We give a
generalization of random hyperplane rounding and relate the performance of this
rounding to certain hypergeometric functions, which prescribe necessary
transformations to the vector solution before the rounding is applied. Unlike
Krivine's Rounding where the relevant hypergeometric function was $\arcsin$, we
have to study a family of hypergeometric functions. The bulk of our technical
work then involves methods from complex analysis to gain detailed information
about the Taylor series coefficients of the inverses of these hypergeometric
functions, which then dictate our approximation factor.
Our result also implies improved bounds for "factorization through
$\ell_{2}^{\,n}$" of operators from $\ell_{p}^{\,n}$ to $\ell_{q}^{\,m}$ (when
$p\geq 2 \geq q$) such bounds are of significant interest in functional
analysis and our work provides modest supplementary evidence for an intriguing
parallel between factorizability, and constantfactor approximability.

We study the problem of computing the $p\rightarrow q$ norm of a matrix $A
\in R^{m \times n}$, defined as \[ \A\_{p\rightarrow q} ~:=~ \max_{x \,\in\,
R^n \setminus \{0\}} \frac{\Ax\_q}{\x\_p} \] This problem generalizes the
spectral norm of a matrix ($p=q=2$) and the Grothendieck problem ($p=\infty$,
$q=1$), and has been widely studied in various regimes. When $p \geq q$, the
problem exhibits a dichotomy: constant factor approximation algorithms are
known if $2 \in [q,p]$, and the problem is hard to approximate within almost
polynomial factors when $2 \notin [q,p]$.
The regime when $p < q$, known as \emph{hypercontractive norms}, is
particularly significant for various applications but much less well
understood. The case with $p = 2$ and $q > 2$ was studied by [Barak et al,
STOC'12] who gave subexponential algorithms for a promise version of the
problem (which captures smallset expansion) and also proved hardness of
approximation results based on the Exponential Time Hypothesis. However, no
NPhardness of approximation is known for these problems for any $p < q$.
We study the hardness of approximating matrix norms in both the above cases
and prove the following results:
 We show that for any $1< p < q < \infty$ with $2 \notin [p,q]$,
$\A\_{p\rightarrow q}$ is hard to approximate within
$2^{O(\log^{1\epsilon}\!n)}$ assuming $NP \not\subseteq
BPTIME(2^{\log^{O(1)}\!n})$. This suggests that, similar to the case of $p \geq
q$, the hypercontractive setting may be qualitatively different when $2$ does
not lie between $p$ and $q$.
 For all $p \geq q$ with $2 \in [q,p]$, we show $\A\_{p\rightarrow q}$ is
hard to approximate within any factor than $1/(\gamma_{p^*} \cdot \gamma_q)$,
where for any $r$, $\gamma_r$ denotes the $r^{th}$ norm of a gaussian, and
$p^*$ is the dual norm of $p$.

Ar\i kan's exciting discovery of polar codes has provided an altogether new
way to efficiently achieve Shannon capacity. Given a (constantsized)
invertible matrix $M$, a family of polar codes can be associated with this
matrix and its ability to approach capacity follows from the
$\textit{polarization}$ of an associated $[0,1]$bounded martingale, namely its
convergence in the limit to either $0$ or $1$ with probability $1$. Ar\i kan
showed appropriate polarization of the martingale associated with the matrix
$G_2 = \left( \begin{smallmatrix} 1 & 0 1 & 1 \end{smallmatrix} \right)$ to get
capacity achieving codes. His analysis was later extended to all matrices $M$
which satisfy an obvious necessary condition for polarization.
While Ar\i kan's theorem does not guarantee that the codes achieve capacity
at small blocklengths, it turns out that a "strong" analysis of the
polarization of the underlying martingale would lead to such constructions.
Indeed for the martingale associated with $G_2$ such a strong polarization was
shown in two independent works ([Guruswami and Xia, IEEE IT '15] and [Hassani
et al., IEEE IT '14]), thereby resolving a major theoretical challenge
associated with the efficient attainment of Shannon capacity.
In this work we extend the result above to cover martingales associated with
all matrices that satisfy the necessary condition for (weak) polarization. In
addition to being vastly more general, our proofs of strong polarization are
(in our view) also much simpler and modular. Key to our proof is a notion of
$\textit{local polarization}$ that only depends on the evolution of the
martingale in a single time step. Our result shows strong polarization over all
prime fields and leads to efficient capacityachieving source codes for
compressing arbitrary i.i.d. sources, and capacityachieving channel codes for
arbitrary symmetric memoryless channels.

This paper addresses the problem of constructing MDS codes that enable exact
repair of each code block with small repair bandwidth, which refers to the
total amount of information flow from the remaining code blocks during the
repair process. This problem naturally arises in the context of distributed
storage systems as the node repair problem [7]. The constructions of
exactrepairable MDS codes with optimal repairbandwidth require working with
large subpacketization levels, which restricts their employment in practice.
This paper presents constructions for MDS codes that simultaneously provide
both small repair bandwidth and small subpacketization level. In particular,
this paper presents two general approaches to construct exactrepairable MDS
codes that aim at significantly reducing the required subpacketization level
at the cost of slightly suboptimal repair bandwidth. The first approach gives
MDS codes that have repair bandwidth at most twice the optimal
repairbandwidth. Additionally, these codes also have the smallest possible
subpacketization level $\ell = O(r)$, where $r$ denotes the number of parity
blocks. This approach is then generalized to design codes that have their
repair bandwidth approaching the optimal repairbandwidth at the cost of
graceful increment in the required subpacketization level. The second approach
transforms an MDS code with optimal repairbandwidth and large
subpacketization level into a longer MDS code with small subpacketization
level and nearoptimal repair bandwidth. For a given $r$, the obtained codes
have their subpacketization level scaling logarithmically with the code
length. In addition, the obtained codes require field size only linear in the
code length and ensure load balancing among the intact code blocks in terms of
the information downloaded from these blocks during the exact reconstruction of
a code block.

We give new constructions of two classes of algebraic code families which are
efficiently list decodable with small output list size from a fraction
$1R\epsilon$ of adversarial errors where $R$ is the rate of the code, for any
desired positive constant $\epsilon$. The alphabet size depends only $\epsilon$
and is nearlyoptimal.
The first class of codes are obtained by folding algebraicgeometric codes
using automorphisms of the underlying function field. The list decoding
algorithm is based on a linearalgebraic approach, which pins down the
candidate messages to a subspace with a nice "periodic" structure. The list is
pruned by precoding into a special form of "subspaceevasive" sets, which are
constructed pseudorandomly. Instantiating this construction with the
GarciaStichtenoth function field tower yields codes listdecodable up to a
$1R\epsilon$ error fraction with list size bounded by $O(1/\epsilon)$,
matching the existential bound up to constant factors. The parameters we
achieve are thus quite close to the existential bounds in all three aspects:
errorcorrection radius, alphabet size, and listsize.
The second class of codes are obtained by restricting evaluation points of an
algebraicgeometric code to rational points from a subfield. Once again, the
linearalgebraic approach to list decoding to pin down candidate messages to a
periodic subspace. We develop an alternate approach based on "subspace designs"
to precode messages. Together with the subsequent explicit constructions of
subspace designs, this yields a deterministic construction of an algebraic code
family of rate $R$ with efficient list decoding from $1R\epsilon$ fraction of
errors over a constantsized alphabet. The list size is bounded by a very
slowly growing function of the block length $N$; in particular, it is at most
$O(\log^{(r)} N)$ (the $r$'th iterated logarithm) for any fixed integer $r$.

The communication complexity of many fundamental problems reduces greatly
when the communicating parties share randomness that is independent of the
inputs to the communication task. Natural communication processes (say between
humans) however often involve large amounts of shared correlations among the
communicating players, but rarely allow for perfect sharing of randomness. Can
the communication complexity benefit from shared correlations as well as it
does from shared randomness? This question was considered mainly in the context
of simultaneous communication by Bavarian et al. (ICALP 2014). In this work we
study this problem in the standard interactive setting and give some general
results. In particular, we show that every problem with communication
complexity of $k$ bits with perfectly shared randomness has a protocol using
imperfectly shared randomness with complexity $\exp(k)$ bits. We also show that
this is best possible by exhibiting a promise problem with complexity $k$ bits
with perfectly shared randomness which requires $\exp(k)$ bits when the
randomness is imperfectly shared. Along the way we also highlight some other
basic problems such as compression, and agreement distillation, where shared
randomness plays a central role and analyze the complexity of these problems in
the imperfectly shared randomness model.
The technical highlight of this work is the lower bound that goes into the
result showing the tightness of our general connection. This result builds on
the intuition that communication with imperfectly shared randomness needs to be
less sensitive to its random inputs than communication with perfectly shared
randomness. The formal proof invokes results about the smallset expansion of
the noisy hypercube and an invariance principle to convert this intuition to a
proof, thus giving a new application domain for these fundamental results.

In the random deletion channel, each bit is deleted independently with
probability $p$. For the random deletion channel, the existence of codes of
rate $(1p)/9$, and thus bounded away from $0$ for any $p < 1$, has been known.
We give an explicit construction with polynomial time encoding and deletion
correction algorithms with rate $c_0 (1p)$ for an absolute constant $c_0 > 0$.

We consider binary error correcting codes when errors are deletions. A basic
challenge concerning deletion codes is determining $p_0^{(adv)}$, the zerorate
threshold of adversarial deletions, defined to be the supremum of all $p$ for
which there exists a code family with rate bounded away from 0 capable of
correcting a fraction $p$ of adversarial deletions. A recent construction of
deletioncorrecting codes [Bukh et al 17] shows that $p_0^{(adv)} \ge
\sqrt{2}1$, and the trivial upper bound, $p_0^{(adv)}\le\frac{1}{2}$, is the
best known. Perhaps surprisingly, we do not know whether or not $p_0^{(adv)} =
1/2$.
In this work, to gain further insight into deletion codes, we explore two
related error models: oblivious deletions and online deletions, which are in
between random and adversarial deletions in power. In the oblivious model, the
channel can inflict an arbitrary pattern of $pn$ deletions, picked without
knowledge of the codeword. We prove the existence of binary codes of positive
rate that can correct any fraction $p < 1$ of oblivious deletions, establishing
that the associated zerorate threshold $p_0^{(obliv)}$ equals $1$.
For online deletions, where the channel decides whether to delete bit $x_i$
based only on knowledge of bits $x_1x_2\dots x_i$, define the deterministic
zerorate threshold for online deletions $p_0^{(on,d)}$ to be the supremum of
$p$ for which there exist deterministic codes against an online channel causing
$pn$ deletions with low average probability of error. That is, the probability
that a randomly chosen codeword is decoded incorrectly is small. We prove
$p_0^{(adv)}=\frac{1}{2}$ if and only if $p_0^{(on,d)}=\frac{1}{2}$.

For an $n$variate order$d$ tensor $A$, define $ A_{\max} := \sup_{\ x \_2
= 1} \langle A , x^{\otimes d} \rangle$ to be the maximum value taken by the
tensor on the unit sphere. It is known that for a random tensor with i.i.d $\pm
1$ entries, $A_{\max} \lesssim \sqrt{n\cdot d\cdot\log d}$ w.h.p. We study the
problem of efficiently certifying upper bounds on $A_{\max}$ via the natural
relaxation from the Sum of Squares (SoS) hierarchy. Our results include:
 When $A$ is a random order$q$ tensor, we prove that $q$ levels of SoS
certifies an upper bound $B$ on $A_{\max}$ that satisfies \[ B ~~~~\leq~~
A_{\max} \cdot \biggl(\frac{n}{q^{\,1o(1)}}\biggr)^{q/41/2} \quad
\text{w.h.p.} \] Our upper bound improves a result of Montanari and Richard
(NIPS 2014) when $q$ is large.
 We show the above bound is the best possible up to lower order terms,
namely the optimum of the level$q$ SoS relaxation is at least \[ A_{\max}
\cdot \biggl(\frac{n}{q^{\,1+o(1)}}\biggr)^{q/41/2} \ . \]
 When $A$ is a random order$d$ tensor, we prove that $q$ levels of SoS
certifies an upper bound $B$ on $A_{\max}$ that satisfies \[
B ~~\leq ~~ A_{\max} \cdot \biggl(\frac{\widetilde{O}(n)}{q}\biggr)^{d/4 
1/2} \quad \text{w.h.p.} \] For growing $q$, this improves upon the bound
certified by constant levels of SoS. This answers in part, a question posed by
Hopkins, Shi, and Steurer (COLT 2015), who established the tight
characterization for constant levels of SoS.

In errorcorrecting codes, locality refers to several different ways of
quantifying how easily a small amount of information can be recovered from
encoded data. In this work, we study a notion of locality called the
sDisjointRepairGroup Property (sDRGP). This notion can interpolate between
two very different settings in coding theory: that of Locally Correctable Codes
(LCCs) when s is largea very strong guaranteeand Locally Recoverable
Codes (LRCs) when s is smalla relatively weaker guarantee. This motivates
the study of the sDRGP for intermediate s, which is the focus of our paper. We
construct codes in this parameter regime which have a higher rate than
previously known codes. Our construction is based on a novel variant of the
lifted codes of Guo, Kopparty and Sudan. Beyond the results on the sDRGP, we
hope that our construction is of independent interest, and will find uses
elsewhere.

We consider the following basic problem: given an $n$variate degree$d$
homogeneous polynomial $f$ with real coefficients, compute a unit vector $x \in
\mathbb{R}^n$ that maximizes $f(x)$. Besides its fundamental nature, this
problem arises in diverse contexts ranging from tensor and operator norms to
graph expansion to quantum information theory. The homogeneous degree $2$ case
is efficiently solvable as it corresponds to computing the spectral norm of an
associated matrix, but the higher degree case is NPhard.
We give approximation algorithms for this problem that offer a tradeoff
between the approximation ratio and running time: in $n^{O(q)}$ time, we get an
approximation within factor $O_d((n/q)^{d/21})$ for arbitrary polynomials,
$O_d((n/q)^{d/41/2})$ for polynomials with nonnegative coefficients, and
$O_d(\sqrt{m/q})$ for sparse polynomials with $m$ monomials. The approximation
guarantees are with respect to the optimum of the level$q$ sumofsquares
(SoS) SDP relaxation of the problem. Known polynomial time algorithms for this
problem rely on "decoupling lemmas." Such tools are not capable of offering a
tradeoff like our results as they blow up the number of variables by a factor
equal to the degree. We develop new decoupling tools that are more efficient in
the number of variables at the expense of less structure in the output
polynomials. This enables us to harness the benefits of higher level SoS
relaxations.
We complement our algorithmic results with some polynomially large
integrality gaps, albeit for a slightly weaker (but still very natural)
relaxation. Toward this, we give a method to lift a level$4$ solution matrix
$M$ to a higher level solution, under a mild technical condition on $M$.

Subspace designs are a (large) collection of highdimensional subspaces
$\{H_i\}$ of $\F_q^m$ such that for any lowdimensional subspace $W$, only a
small number of subspaces from the collection have nontrivial intersection
with $W$; more precisely, the sum of dimensions of $W \cap H_i$ is at most some
parameter $L$. The notion was put forth by Guruswami and Xing (STOC'13) with
applications to list decoding variants of ReedSolomon and algebraicgeometric
codes, and later also used for explicit rankmetric codes with optimal list
decoding radius.
Guruswami and Kopparty (FOCS'13, Combinatorica'16) gave an explicit
construction of subspace designs with nearoptimal parameters. This
construction was based on polynomials and has close connections to folded
ReedSolomon codes, and required large field size (specifically $q \ge m$).
Forbes and Guruswami (RANDOM'15) used this construction to give explicit
constant degree "dimension expanders" over large fields, and noted that
subspace designs are a powerful tool in linearalgebraic pseudorandomness.
Here, we construct subspace designs over any field, at the expense of a
modest worsening of the bound $L$ on total intersection dimension. Our approach
is based on a (nontrivial) extension of the polynomialbased construction to
algebraic function fields, and instantiating the approach with cyclotomic
function fields. Plugging in our new subspace designs in the construction of
Forbes and Guruswami yields dimension expanders over $\F^n$ for any field $\F$,
with logarithmic degree and expansion guarantee for subspaces of dimension
$\Omega(n/(\log \log n))$.

A classic result due to Schaefer (1978) classifies all constraint
satisfaction problems (CSPs) over the Boolean domain as being either in
$\mathsf{P}$ or $\mathsf{NP}$hard. This paper considers a promiseproblem
variant of CSPs called PCSPs. A PCSP over a finite set of pairs of constraints
$\Gamma$ consists of a pair $(\Psi_P, \Psi_Q)$ of CSPs with the same set of
variables such that for every $(P, Q) \in \Gamma$, $P(x_{i_1}, ..., x_{i_k})$
is a clause of $\Psi_P$ if and only if $Q(x_{i_1}, ..., x_{i_k})$ is a clause
of $\Psi_Q$. The promise problem $\operatorname{PCSP}(\Gamma)$ is to
distinguish, given $(\Psi_P, \Psi_Q)$, between the cases $\Psi_P$ is
satisfiable and $\Psi_Q$ is unsatisfiable. Many natural problems including
approximate graph and hypergraph coloring can be placed in this framework.
This paper is motivated by the pursuit of understanding the computational
complexity of Boolean promise CSPs. As our main result, we show that
$\operatorname{PCSP}(\Gamma)$ exhibits a dichotomy (it is either polynomial
time solvable or $\mathsf{NP}$hard) when the relations in $\Gamma$ are
symmetric and allow for negations of variables. We achieve our dichotomy
theorem by extending the weak polymorphism framework of Austrin, Guruswami, and
H\aa stad [FOCS '14] which itself is a generalization of the algebraic approach
to study CSPs. In both the algorithm and hardness portions of our proof, we
incorporate new ideas and techniques not utilized in the CSP case.
Furthermore, we show that the computational complexity of any promise CSP
(over arbitrary finite domains) is captured entirely by its weak polymorphisms,
a feature known as Galois correspondence, as well as give necessary and
sufficient conditions for the structure of this set of weak polymorphisms. Such
insights call us to question the existence of a general dichotomy for Boolean
PCSPs.

The ReedMuller (RM) code encoding $n$variate degree$d$ polynomials over
${\mathbb F}_q$ for $d < q$, with its evaluation on ${\mathbb F}_q^n$, has
relative distance $1d/q$ and can be list decoded from a $1O(\sqrt{d/q})$
fraction of errors. In this work, for $d \ll q$, we give a lengthefficient
puncturing of such codes which (almost) retains the distance and list
decodability properties of the ReedMuller code, but has much better rate.
Specificially, when $q =\Omega( d^2/\epsilon^2)$, we given an explicit rate
$\Omega\left(\frac{\epsilon}{d!}\right)$ puncturing of ReedMuller codes which
have relative distance at least $(1\epsilon)$ and efficient list decoding up
to $(1\sqrt{\epsilon})$ error fraction. This almost matches the performance of
random puncturings which work with the weaker field size requirement $q=
\Omega( d/\epsilon^2)$. We can also improve the field size requirement to the
optimal (up to constant factors) $q =\Omega( d/\epsilon)$, at the expense of a
worse list decoding radius of $1\epsilon^{1/3}$ and rate
$\Omega\left(\frac{\epsilon^2}{d!}\right)$.
The first of the above tradeoffs is obtained by substituting for the
variables functions with carefully chosen pole orders from an algebraic
function field; this leads to a puncturing for which the RM code is a subcode
of a certain algebraicgeometric code (which is known to be efficiently list
decodable). The second tradeoff is obtained by concatenating this construction
with a ReedSolomon based multiplication friendly pair, and using the list
recovery property of algebraicgeometric codes.

We study the performance of ReedSolomon (RS) codes for the \em exact repair
problem \em in distributed storage. Our main result is that, in some parameter
regimes, ReedSolomon codes are optimal regenerating codes, among MDS codes
with linear repair schemes. Moreover, we give a characterization of MDS codes
with linear repair schemes which holds in any parameter regime, and which can
be used to give nontrivial repair schemes for RS codes in other settings.
More precisely, we show that for $k$dimensional RS codes whose evaluation
points are a finite field of size $n$, there are exact repair schemes with
bandwidth $(n1)\log((n1)/(nk))$ bits, and that this is optimal for any MDS
code with a linear repair scheme. In contrast, the naive (commonly implemented)
repair algorithm for this RS code has bandwidth $k\log(n)$ bits. When the
entire field is used as evaluation points, the number of nodes $n$ is much
larger than the number of bits per node (which is $O(\log(n))$), and so this
result holds only when the degree of subpacketization is small. However, our
method applies in any parameter regime, and to illustrate this for high levels
of subpacketization we give an improved repair scheme for a specific
(14,10)RS code used in the Facebook Hadoop Analytics cluster.

An $(n, M)$ vector code $\mathcal{C} \subseteq \mathbb{F}^n$ is a collection
of $M$ codewords where $n$ elements (from the field $\mathbb{F}$) in each of
the codewords are referred to as code blocks. Assuming that $\mathbb{F} \cong
\mathbb{B}^{\ell}$, the code blocks are treated as $\ell$length vectors over
the base field $\mathbb{B}$. Equivalently, the code is said to have the
subpacketization level $\ell$. This paper addresses the problem of
constructing MDS vector codes which enable exact reconstruction of each code
block by downloading small amount of information from the remaining code
blocks. The repair bandwidth of a code measures the information flow from the
remaining code blocks during the reconstruction of a single code block. This
problem naturally arises in the context of distributed storage systems as the
node repair problem [4]. Assuming that $M = \mathbb{B}^{k\ell}$, the repair
bandwidth of an MDS vector code is lower bounded by $\big(\frac{n  1}{n 
k}\big)\cdot \ell$ symbols (over the base field $\mathbb{B}$) which is also
referred to as the cutset bound [4]. For all values of $n$ and $k$, the MDS
vector codes that attain the cutset bound with the subpacketization level
$\ell = (nk)^{\lceil{{n}/{(nk)}}\rceil}$ are known in the literature [23,
35].
This paper presents a construction for MDS vector codes which simultaneously
ensures both small repair bandwidth and small subpacketization level. The
obtained codes have the smallest possible subpacketization level $\ell = O(n 
k)$ for an MDS vector code and the repair bandwidth which is at most twice the
cutset bound. The paper then generalizes this code construction so that the
repair bandwidth of the obtained codes approach the cutset bound at the cost
of increased subpacketization level. The constructions presented in this paper
give MDS vector codes which are linear over the base field $\mathbb{B}$.

This work constructs codes that are efficiently decodable from a constant
fraction of \emph{worstcase} insertion and deletion errors in three parameter
settings: (i) Binary codes with rate approaching 1; (ii) Codes with constant
rate for error fraction approaching 1 over fixed alphabet size; and (iii)
Constant rate codes over an alphabet of size $k$ for error fraction approaching
$(k1)/(k+1)$. When errors are constrained to deletions alone, efficiently
decodable codes in each of these regimes were constructed recently. We complete
the picture by constructing similar codes that are efficiently decodable in the
insertion/deletion regime.

We survey existing techniques to bound the mixing time of Markov chains. The
mixing time is related to a geometric parameter called conductance which is a
measure of edgeexpansion. Bounds on conductance are typically obtained by a
technique called "canonical paths" where the idea is to find a set of paths,
one between every sourcedestination pair, such that no edge is heavily
congested. However, the canonical paths approach cannot always show rapid
mixing of a rapidly mixing chain. This drawback disappears if we allow the flow
between a pair of states to be spread along multiple paths. We prove that for a
large class of Markov chains canonical paths does capture rapid mixing.
Allowing multiple paths to route the flow still does help a great deal in
proofs, as illustrated by a result of Morris & Sinclair (FOCS'99) on the rapid
mixing of a Markov chain for sampling 0/1 knapsack solutions.
A different approach to prove rapid mixing is "Coupling". Path Coupling is a
variant discovered by Bubley & Dyer (FOCS'97) that often tremendously reduces
the complexity of designing good Couplings. We present several applications of
Path Coupling in proofs of rapid mixing. These invariably lead to much better
bounds on mixing time than known using conductance, and moreover Coupling based
proofs are typically simpler. This motivates the question of whether Coupling
can be made to work whenever the chain is rapidly mixing. This question was
answered in the negative by Kumar & Ramesh (FOCS'99), who showed that no
Coupling strategy can prove the rapid mixing of the JerrumSinclair chain for
sampling perfect and nearperfect matchings.

We consider codes over fixed alphabets against worstcase symbol deletions.
For any fixed $k \ge 2$, we construct a family of codes over alphabet of size
$k$ with positive rate, which allow efficient recovery from a worstcase
deletion fraction approaching $1\frac{2}{k+\sqrt k}$. In particular, for
binary codes, we are able to recover a fraction of deletions approaching
$1/(\sqrt 2 +1)=\sqrt 21 \approx 0.414$. Previously, even nonconstructively
the largest deletion fraction known to be correctable with positive rate was
$1\Theta(1/\sqrt{k})$, and around $0.17$ for the binary case.
Our result pins down the largest fraction of correctable deletions for
$k$ary codes as $1\Theta(1/k)$, since $11/k$ is an upper bound even for the
simpler model of erasures where the locations of the missing symbols are known.
Closing the gap between $(\sqrt 2 1)$ and $1/2$ for the limit of worstcase
deletions correctable by binary codes remains a tantalizing open question.

We prove $n^{1+\Omega(1/p)}/p^{O(1)}$ lower bounds for the space complexity
of $p$pass streaming algorithms solving the following problems on $n$vertex
graphs:
* testing if an undirected graph has a perfect matching (this implies lower
bounds for computing a maximum matching or even just the maximum matching
size),
* testing if two specific vertices are at distance at most $2(p+1)$ in an
undirected graph,
* testing if there is a directed path from $s$ to $t$ for two specific
vertices $s$ and $t$ in a directed graph.
Prior to our result, it was known that these problems require $\Omega(n^2)$
space in one pass, but no $n^{1+\Omega(1)}$ lower bound was known for any $p\ge
2$.
These streaming results follow from a communication complexity lower bound
for a communication game in which the players hold two graphs on the same set
of vertices. The task of the players is to find out whether the sets of
vertices at distance exactly $p+1$ from a specific vertex intersect. The game
requires a significant amount of communication only if the players are forced
to speak in a specific difficult order. This is reminiscent of lower bounds for
communication problems such as indexing and pointer chasing. Among other
things, our line of attack requires proving an information cost lower bound for
a decision version of the classic pointer chasing problem and a direct sum type
theorem for the disjunction of several instances of this problem.

We consider the problem of constructing binary codes to recover from $k$bit
deletions with efficient encoding/decoding, for a fixed $k$. The single
deletion case is well understood, with the VarshamovTenengoltsLevenshtein
code from 1965 giving an asymptotically optimal construction with $\approx
2^n/n$ codewords of length $n$, i.e., at most $\log n$ bits of redundancy.
However, even for the case of two deletions, there was no known explicit
construction with redundancy less than $n^{\Omega(1)}$.
For any fixed $k$, we construct a binary code with $c_k \log n$ redundancy
that can be decoded from $k$ deletions in $O_k(n \log^4 n)$ time. The
coefficient $c_k$ can be taken to be $O(k^2 \log k)$, which is only
quadratically worse than the optimal, nonconstructive bound of $O(k)$. We also
indicate how to modify this code to allow for a combination of up to $k$
insertions and deletions.

A hypergraph is said to be $\chi$colorable if its vertices can be colored
with $\chi$ colors so that no hyperedge is monochromatic. $2$colorability is a
fundamental property (called Property B) of hypergraphs and is extensively
studied in combinatorics. Algorithmically, however, given a $2$colorable
$k$uniform hypergraph, it is NPhard to find a $2$coloring miscoloring fewer
than a fraction $2^{k+1}$ of hyperedges (which is achieved by a random
$2$coloring), and the best algorithms to color the hypergraph properly require
$\approx n^{11/k}$ colors, approaching the trivial bound of $n$ as $k$
increases.
In this work, we study the complexity of approximate hypergraph coloring, for
both the maximization (finding a $2$coloring with fewest miscolored edges) and
minimization (finding a proper coloring using fewest number of colors)
versions, when the input hypergraph is promised to have the following stronger
properties than $2$colorability:
(A) Lowdiscrepancy: If the hypergraph has discrepancy $\ell \ll \sqrt{k}$,
we give an algorithm to color the it with $\approx n^{O(\ell^2/k)}$ colors.
However, for the maximization version, we prove NPhardness of finding a
$2$coloring miscoloring a smaller than $2^{O(k)}$ (resp. $k^{O(k)}$)
fraction of the hyperedges when $\ell = O(\log k)$ (resp. $\ell=2$). Assuming
the UGC, we improve the latter hardness factor to $2^{O(k)}$ for almost
discrepancy$1$ hypergraphs.
(B) Rainbow colorability: If the hypergraph has a $(k\ell)$coloring such
that each hyperedge is polychromatic with all these colors, we give a
$2$coloring algorithm that miscolors at most $k^{\Omega(k)}$ of the
hyperedges when $\ell \ll \sqrt{k}$, and complement this with a matching UG
hardness result showing that when $\ell =\sqrt{k}$, it is hard to even beat the
$2^{k+1}$ bound achieved by a random coloring.

Given an undirected graph $G = (V_G, E_G)$ and a fixed "pattern" graph $H =
(V_H, E_H)$ with $k$ vertices, we consider the $H$Transversal and $H$Packing
problems. The former asks to find the smallest $S \subseteq V_G$ such that the
subgraph induced by $V_G \setminus S$ does not have $H$ as a subgraph, and the
latter asks to find the maximum number of pairwise disjoint $k$subsets $S_1,
..., S_m \subseteq V_G$ such that the subgraph induced by each $S_i$ has $H$ as
a subgraph.
We prove that if $H$ is 2connected, $H$Transversal and $H$Packing are
almost as hard to approximate as general $k$Hypergraph Vertex Cover and
$k$Set Packing, so it is NPhard to approximate them within a factor of
$\Omega (k)$ and $\widetilde \Omega (k)$ respectively. We also show that there
is a 1connected $H$ where $H$Transversal admits an $O(\log k)$approximation
algorithm, so that the connectivity requirement cannot be relaxed from 2 to 1.
For a special case of $H$Transversal where $H$ is a (family of) cycles, we
mention the implication of our result to the related Feedback Vertex Set
problem, and give a different hardness proof for directed graphs.

An emerging theory of "linearalgebraic pseudorandomness" aims to understand
the linearalgebraic analogs of fundamental Boolean pseudorandom objects where
the rank of subspaces plays the role of the size of subsets. In this work, we
study and highlight the interrelationships between several such algebraic
objects such as subspace designs, dimension expanders, seeded rank condensers,
twosource rank condensers, and rankmetric codes. In particular, with the
recent construction of nearoptimal subspace designs by Guruswami and Kopparty
as a starting point, we construct good (seeded) rank condensers (both lossless
and lossy versions), which are a small collection of linear maps $\mathbb{F}^n
\to \mathbb{F}^t$ for $t \ll n$ such that for every subset of $\mathbb{F}^n$ of
small rank, its rank is preserved (up to a constant factor in the lossy case)
by at least one of the maps.
We then compose a tensoring operation with our lossy rank condenser to
construct constantdegree dimension expanders over polynomially large fields.
That is, we give $O(1)$ explicit linear maps $A_i:\mathbb{F}^n\to \mathbb{F}^n$
such that for any subspace $V \subseteq \mathbb{F}^n$ of dimension at most
$n/2$, $\dim\bigl( \sum_i A_i(V)\bigr) \ge (1+\Omega(1)) \dim(V)$. Previous
constructions of such constantdegree dimension expanders were based on
Kazhdan's property $T$ (for the case when $\mathbb{F}$ has characteristic zero)
or monotone expanders (for every field $\mathbb{F}$); in either case the
construction was harder than that of usual vertex expanders. Our construction,
on the other hand, is simpler.
Via an equivalence to linear rankmetric codes, we then construct optimal
lossless twosource condensers. We then use our seeded rank condensers to
obtain nearoptimal lossy twosource condensers for constant rank sources.