• ### Beating Fredman-Koml\'{o}s for perfect $k$-hashing(1805.04151)

May 10, 2018 math.CO, cs.IT, math.IT, cs.DM, cs.DS
We say a subset $C \subseteq \{1,2,\dots,k\}^n$ is a $k$-hash code (also called $k$-separated) if for every subset of $k$ codewords from $C$, there exists a coordinate where all these codewords have distinct values. Understanding the largest possible rate (in bits), defined as $(\log_2 |C|)/n$, of a $k$-hash code is a classical problem. It arises in two equivalent contexts: (i) the smallest size possible for a perfect hash family that maps a universe of $N$ elements into $\{1,2,\dots,k\}$, and (ii) the zero-error capacity for decoding with lists of size less than $k$ for a certain combinatorial channel. A general upper bound of $k!/k^{k-1}$ on the rate of a $k$-hash code (in the limit of large $n$) was obtained by Fredman and Koml\'{o}s in 1984 for any $k \geq 4$. While better bounds have been obtained for $k=4$, their original bound has remained the best known for each $k \ge 5$. In this work, we obtain the first improvement to the Fredman-Koml\'{o}s bound for every $k \ge 5$. While we get explicit (numerical) bounds for $k=5,6$, for larger $k$ we only show that the FK bound can be improved by a positive, but unspecified, amount. Under a conjecture on the optimum value of a certain polynomial optimization problem over the simplex, our methods allow an effective bound to be computed for every $k$.
• ### Approximating Operator Norms via Generalized Krivine Rounding(1804.03644)

April 10, 2018 math.FA, cs.DS
We consider the $(\ell_p,\ell_r)$-Grothendieck problem, which seeks to maximize the bilinear form $y^T A x$ for an input matrix $A$ over vectors $x,y$ with $\|x\|_p=\|y\|_r=1$. The problem is equivalent to computing the $p \to r^*$ operator norm of $A$. The case $p=r=\infty$ corresponds to the classical Grothendieck problem. Our main result is an algorithm for arbitrary $p,r \ge 2$ with approximation ratio $(1+\epsilon_0)/(\sinh^{-1}(1)\cdot \gamma_{p^*} \,\gamma_{r^*})$ for some fixed $\epsilon_0 \le 0.00863$. Comparing this with Krivine's approximation ratio of $(\pi/2)/\sinh^{-1}(1)$ for the original Grothendieck problem, our guarantee is off from the best known hardness factor of $(\gamma_{p^*} \gamma_{r^*})^{-1}$ for the problem by a factor similar to Krivine's defect. Our approximation follows by bounding the value of the natural vector relaxation for the problem which is convex when $p,r \ge 2$. We give a generalization of random hyperplane rounding and relate the performance of this rounding to certain hypergeometric functions, which prescribe necessary transformations to the vector solution before the rounding is applied. Unlike Krivine's Rounding where the relevant hypergeometric function was $\arcsin$, we have to study a family of hypergeometric functions. The bulk of our technical work then involves methods from complex analysis to gain detailed information about the Taylor series coefficients of the inverses of these hypergeometric functions, which then dictate our approximation factor. Our result also implies improved bounds for "factorization through $\ell_{2}^{\,n}$" of operators from $\ell_{p}^{\,n}$ to $\ell_{q}^{\,m}$ (when $p\geq 2 \geq q$)--- such bounds are of significant interest in functional analysis and our work provides modest supplementary evidence for an intriguing parallel between factorizability, and constant-factor approximability.
• ### Inapproximability of Matrix $p\rightarrow q$ Norms(1802.07425)

Feb. 21, 2018 cs.CC
We study the problem of computing the $p\rightarrow q$ norm of a matrix $A \in R^{m \times n}$, defined as $\|A\|_{p\rightarrow q} ~:=~ \max_{x \,\in\, R^n \setminus \{0\}} \frac{\|Ax\|_q}{\|x\|_p}$ This problem generalizes the spectral norm of a matrix ($p=q=2$) and the Grothendieck problem ($p=\infty$, $q=1$), and has been widely studied in various regimes. When $p \geq q$, the problem exhibits a dichotomy: constant factor approximation algorithms are known if $2 \in [q,p]$, and the problem is hard to approximate within almost polynomial factors when $2 \notin [q,p]$. The regime when $p < q$, known as \emph{hypercontractive norms}, is particularly significant for various applications but much less well understood. The case with $p = 2$ and $q > 2$ was studied by [Barak et al, STOC'12] who gave sub-exponential algorithms for a promise version of the problem (which captures small-set expansion) and also proved hardness of approximation results based on the Exponential Time Hypothesis. However, no NP-hardness of approximation is known for these problems for any $p < q$. We study the hardness of approximating matrix norms in both the above cases and prove the following results: - We show that for any $1< p < q < \infty$ with $2 \notin [p,q]$, $\|A\|_{p\rightarrow q}$ is hard to approximate within $2^{O(\log^{1-\epsilon}\!n)}$ assuming $NP \not\subseteq BPTIME(2^{\log^{O(1)}\!n})$. This suggests that, similar to the case of $p \geq q$, the hypercontractive setting may be qualitatively different when $2$ does not lie between $p$ and $q$. - For all $p \geq q$ with $2 \in [q,p]$, we show $\|A\|_{p\rightarrow q}$ is hard to approximate within any factor than $1/(\gamma_{p^*} \cdot \gamma_q)$, where for any $r$, $\gamma_r$ denotes the $r^{th}$ norm of a gaussian, and $p^*$ is the dual norm of $p$.
• ### General Strong Polarization(1802.02718)

Feb. 8, 2018 cs.IT, math.IT
Ar\i kan's exciting discovery of polar codes has provided an altogether new way to efficiently achieve Shannon capacity. Given a (constant-sized) invertible matrix $M$, a family of polar codes can be associated with this matrix and its ability to approach capacity follows from the $\textit{polarization}$ of an associated $[0,1]$-bounded martingale, namely its convergence in the limit to either $0$ or $1$ with probability $1$. Ar\i kan showed appropriate polarization of the martingale associated with the matrix $G_2 = \left( \begin{smallmatrix} 1 & 0 1 & 1 \end{smallmatrix} \right)$ to get capacity achieving codes. His analysis was later extended to all matrices $M$ which satisfy an obvious necessary condition for polarization. While Ar\i kan's theorem does not guarantee that the codes achieve capacity at small blocklengths, it turns out that a "strong" analysis of the polarization of the underlying martingale would lead to such constructions. Indeed for the martingale associated with $G_2$ such a strong polarization was shown in two independent works ([Guruswami and Xia, IEEE IT '15] and [Hassani et al., IEEE IT '14]), thereby resolving a major theoretical challenge associated with the efficient attainment of Shannon capacity. In this work we extend the result above to cover martingales associated with all matrices that satisfy the necessary condition for (weak) polarization. In addition to being vastly more general, our proofs of strong polarization are (in our view) also much simpler and modular. Key to our proof is a notion of $\textit{local polarization}$ that only depends on the evolution of the martingale in a single time step. Our result shows strong polarization over all prime fields and leads to efficient capacity-achieving source codes for compressing arbitrary i.i.d. sources, and capacity-achieving channel codes for arbitrary symmetric memoryless channels.
• ### MDS Code Constructions with Small Sub-packetization and Near-optimal Repair Bandwidth(1709.08216)

Sept. 24, 2017 cs.IT, math.IT, cs.CC
This paper addresses the problem of constructing MDS codes that enable exact repair of each code block with small repair bandwidth, which refers to the total amount of information flow from the remaining code blocks during the repair process. This problem naturally arises in the context of distributed storage systems as the node repair problem [7]. The constructions of exact-repairable MDS codes with optimal repair-bandwidth require working with large sub-packetization levels, which restricts their employment in practice. This paper presents constructions for MDS codes that simultaneously provide both small repair bandwidth and small sub-packetization level. In particular, this paper presents two general approaches to construct exact-repairable MDS codes that aim at significantly reducing the required sub-packetization level at the cost of slightly sub-optimal repair bandwidth. The first approach gives MDS codes that have repair bandwidth at most twice the optimal repair-bandwidth. Additionally, these codes also have the smallest possible sub-packetization level $\ell = O(r)$, where $r$ denotes the number of parity blocks. This approach is then generalized to design codes that have their repair bandwidth approaching the optimal repair-bandwidth at the cost of graceful increment in the required sub-packetization level. The second approach transforms an MDS code with optimal repair-bandwidth and large sub-packetization level into a longer MDS code with small sub-packetization level and near-optimal repair bandwidth. For a given $r$, the obtained codes have their sub-packetization level scaling logarithmically with the code length. In addition, the obtained codes require field size only linear in the code length and ensure load balancing among the intact code blocks in terms of the information downloaded from these blocks during the exact reconstruction of a code block.
• ### Optimal rate list decoding over bounded alphabets using algebraic-geometric codes(1708.01070)

Aug. 3, 2017 cs.IT, math.IT, math.NT, cs.CC
We give new constructions of two classes of algebraic code families which are efficiently list decodable with small output list size from a fraction $1-R-\epsilon$ of adversarial errors where $R$ is the rate of the code, for any desired positive constant $\epsilon$. The alphabet size depends only $\epsilon$ and is nearly-optimal. The first class of codes are obtained by folding algebraic-geometric codes using automorphisms of the underlying function field. The list decoding algorithm is based on a linear-algebraic approach, which pins down the candidate messages to a subspace with a nice "periodic" structure. The list is pruned by precoding into a special form of "subspace-evasive" sets, which are constructed pseudorandomly. Instantiating this construction with the Garcia-Stichtenoth function field tower yields codes list-decodable up to a $1-R-\epsilon$ error fraction with list size bounded by $O(1/\epsilon)$, matching the existential bound up to constant factors. The parameters we achieve are thus quite close to the existential bounds in all three aspects: error-correction radius, alphabet size, and list-size. The second class of codes are obtained by restricting evaluation points of an algebraic-geometric code to rational points from a subfield. Once again, the linear-algebraic approach to list decoding to pin down candidate messages to a periodic subspace. We develop an alternate approach based on "subspace designs" to precode messages. Together with the subsequent explicit constructions of subspace designs, this yields a deterministic construction of an algebraic code family of rate $R$ with efficient list decoding from $1-R-\epsilon$ fraction of errors over a constant-sized alphabet. The list size is bounded by a very slowly growing function of the block length $N$; in particular, it is at most $O(\log^{(r)} N)$ (the $r$'th iterated logarithm) for any fixed integer $r$.
• ### Communication with Imperfectly Shared Randomness(1411.3603)

July 27, 2017 cs.IT, math.IT, math.PR, cs.CC
The communication complexity of many fundamental problems reduces greatly when the communicating parties share randomness that is independent of the inputs to the communication task. Natural communication processes (say between humans) however often involve large amounts of shared correlations among the communicating players, but rarely allow for perfect sharing of randomness. Can the communication complexity benefit from shared correlations as well as it does from shared randomness? This question was considered mainly in the context of simultaneous communication by Bavarian et al. (ICALP 2014). In this work we study this problem in the standard interactive setting and give some general results. In particular, we show that every problem with communication complexity of $k$ bits with perfectly shared randomness has a protocol using imperfectly shared randomness with complexity $\exp(k)$ bits. We also show that this is best possible by exhibiting a promise problem with complexity $k$ bits with perfectly shared randomness which requires $\exp(k)$ bits when the randomness is imperfectly shared. Along the way we also highlight some other basic problems such as compression, and agreement distillation, where shared randomness plays a central role and analyze the complexity of these problems in the imperfectly shared randomness model. The technical highlight of this work is the lower bound that goes into the result showing the tightness of our general connection. This result builds on the intuition that communication with imperfectly shared randomness needs to be less sensitive to its random inputs than communication with perfectly shared randomness. The formal proof invokes results about the small-set expansion of the noisy hypercube and an invariance principle to convert this intuition to a proof, thus giving a new application domain for these fundamental results.
• ### Polynomial time decodable codes for the binary deletion channel(1705.01963)

June 11, 2019 cs.IT, math.IT, cs.DS
In the random deletion channel, each bit is deleted independently with probability $p$. For the random deletion channel, the existence of codes of rate $(1-p)/9$, and thus bounded away from $0$ for any $p < 1$, has been known. We give an explicit construction with polynomial time encoding and deletion correction algorithms with rate $c_0 (1-p)$ for an absolute constant $c_0 > 0$.
• ### Coding against deletions in oblivious and online models(1612.06335)

July 26, 2017 cs.IT, math.IT, cs.DS, cs.CC
We consider binary error correcting codes when errors are deletions. A basic challenge concerning deletion codes is determining $p_0^{(adv)}$, the zero-rate threshold of adversarial deletions, defined to be the supremum of all $p$ for which there exists a code family with rate bounded away from 0 capable of correcting a fraction $p$ of adversarial deletions. A recent construction of deletion-correcting codes [Bukh et al 17] shows that $p_0^{(adv)} \ge \sqrt{2}-1$, and the trivial upper bound, $p_0^{(adv)}\le\frac{1}{2}$, is the best known. Perhaps surprisingly, we do not know whether or not $p_0^{(adv)} = 1/2$. In this work, to gain further insight into deletion codes, we explore two related error models: oblivious deletions and online deletions, which are in between random and adversarial deletions in power. In the oblivious model, the channel can inflict an arbitrary pattern of $pn$ deletions, picked without knowledge of the codeword. We prove the existence of binary codes of positive rate that can correct any fraction $p < 1$ of oblivious deletions, establishing that the associated zero-rate threshold $p_0^{(obliv)}$ equals $1$. For online deletions, where the channel decides whether to delete bit $x_i$ based only on knowledge of bits $x_1x_2\dots x_i$, define the deterministic zero-rate threshold for online deletions $p_0^{(on,d)}$ to be the supremum of $p$ for which there exist deterministic codes against an online channel causing $pn$ deletions with low average probability of error. That is, the probability that a randomly chosen codeword is decoded incorrectly is small. We prove $p_0^{(adv)}=\frac{1}{2}$ if and only if $p_0^{(on,d)}=\frac{1}{2}$.
• ### Sum-of-Squares Certificates for Maxima of Random Tensors on the Sphere(1605.00903)

June 17, 2017 cs.CC
For an $n$-variate order-$d$ tensor $A$, define $A_{\max} := \sup_{\| x \|_2 = 1} \langle A , x^{\otimes d} \rangle$ to be the maximum value taken by the tensor on the unit sphere. It is known that for a random tensor with i.i.d $\pm 1$ entries, $A_{\max} \lesssim \sqrt{n\cdot d\cdot\log d}$ w.h.p. We study the problem of efficiently certifying upper bounds on $A_{\max}$ via the natural relaxation from the Sum of Squares (SoS) hierarchy. Our results include: - When $A$ is a random order-$q$ tensor, we prove that $q$ levels of SoS certifies an upper bound $B$ on $A_{\max}$ that satisfies $B ~~~~\leq~~ A_{\max} \cdot \biggl(\frac{n}{q^{\,1-o(1)}}\biggr)^{q/4-1/2} \quad \text{w.h.p.}$ Our upper bound improves a result of Montanari and Richard (NIPS 2014) when $q$ is large. - We show the above bound is the best possible up to lower order terms, namely the optimum of the level-$q$ SoS relaxation is at least $A_{\max} \cdot \biggl(\frac{n}{q^{\,1+o(1)}}\biggr)^{q/4-1/2} \ .$ - When $A$ is a random order-$d$ tensor, we prove that $q$ levels of SoS certifies an upper bound $B$ on $A_{\max}$ that satisfies $B ~~\leq ~~ A_{\max} \cdot \biggl(\frac{\widetilde{O}(n)}{q}\biggr)^{d/4 - 1/2} \quad \text{w.h.p.}$ For growing $q$, this improves upon the bound certified by constant levels of SoS. This answers in part, a question posed by Hopkins, Shi, and Steurer (COLT 2015), who established the tight characterization for constant levels of SoS.
• ### Locality via Partially Lifted Codes(1704.08627)

April 27, 2017 cs.IT, math.IT
In error-correcting codes, locality refers to several different ways of quantifying how easily a small amount of information can be recovered from encoded data. In this work, we study a notion of locality called the s-Disjoint-Repair-Group Property (s-DRGP). This notion can interpolate between two very different settings in coding theory: that of Locally Correctable Codes (LCCs) when s is large---a very strong guarantee---and Locally Recoverable Codes (LRCs) when s is small---a relatively weaker guarantee. This motivates the study of the s-DRGP for intermediate s, which is the focus of our paper. We construct codes in this parameter regime which have a higher rate than previously known codes. Our construction is based on a novel variant of the lifted codes of Guo, Kopparty and Sudan. Beyond the results on the s-DRGP, we hope that our construction is of independent interest, and will find uses elsewhere.
• ### Weak Decoupling, Polynomial Folds, and Approximate Optimization over the Sphere(1611.05998)

April 22, 2017 cs.DS
We consider the following basic problem: given an $n$-variate degree-$d$ homogeneous polynomial $f$ with real coefficients, compute a unit vector $x \in \mathbb{R}^n$ that maximizes $|f(x)|$. Besides its fundamental nature, this problem arises in diverse contexts ranging from tensor and operator norms to graph expansion to quantum information theory. The homogeneous degree $2$ case is efficiently solvable as it corresponds to computing the spectral norm of an associated matrix, but the higher degree case is NP-hard. We give approximation algorithms for this problem that offer a trade-off between the approximation ratio and running time: in $n^{O(q)}$ time, we get an approximation within factor $O_d((n/q)^{d/2-1})$ for arbitrary polynomials, $O_d((n/q)^{d/4-1/2})$ for polynomials with non-negative coefficients, and $O_d(\sqrt{m/q})$ for sparse polynomials with $m$ monomials. The approximation guarantees are with respect to the optimum of the level-$q$ sum-of-squares (SoS) SDP relaxation of the problem. Known polynomial time algorithms for this problem rely on "decoupling lemmas." Such tools are not capable of offering a trade-off like our results as they blow up the number of variables by a factor equal to the degree. We develop new decoupling tools that are more efficient in the number of variables at the expense of less structure in the output polynomials. This enables us to harness the benefits of higher level SoS relaxations. We complement our algorithmic results with some polynomially large integrality gaps, albeit for a slightly weaker (but still very natural) relaxation. Toward this, we give a method to lift a level-$4$ solution matrix $M$ to a higher level solution, under a mild technical condition on $M$.
• ### Subspace Designs based on Algebraic Function Fields(1704.05992)

April 20, 2017 math.CO, cs.CC
Subspace designs are a (large) collection of high-dimensional subspaces $\{H_i\}$ of $\F_q^m$ such that for any low-dimensional subspace $W$, only a small number of subspaces from the collection have non-trivial intersection with $W$; more precisely, the sum of dimensions of $W \cap H_i$ is at most some parameter $L$. The notion was put forth by Guruswami and Xing (STOC'13) with applications to list decoding variants of Reed-Solomon and algebraic-geometric codes, and later also used for explicit rank-metric codes with optimal list decoding radius. Guruswami and Kopparty (FOCS'13, Combinatorica'16) gave an explicit construction of subspace designs with near-optimal parameters. This construction was based on polynomials and has close connections to folded Reed-Solomon codes, and required large field size (specifically $q \ge m$). Forbes and Guruswami (RANDOM'15) used this construction to give explicit constant degree "dimension expanders" over large fields, and noted that subspace designs are a powerful tool in linear-algebraic pseudorandomness. Here, we construct subspace designs over any field, at the expense of a modest worsening of the bound $L$ on total intersection dimension. Our approach is based on a (non-trivial) extension of the polynomial-based construction to algebraic function fields, and instantiating the approach with cyclotomic function fields. Plugging in our new subspace designs in the construction of Forbes and Guruswami yields dimension expanders over $\F^n$ for any field $\F$, with logarithmic degree and expansion guarantee for subspaces of dimension $\Omega(n/(\log \log n))$.
• ### Promise Constraint Satisfaction: Algebraic Structure and a Symmetric Boolean Dichotomy(1704.01937)

April 6, 2017 cs.LO, cs.DM, cs.CC
A classic result due to Schaefer (1978) classifies all constraint satisfaction problems (CSPs) over the Boolean domain as being either in $\mathsf{P}$ or $\mathsf{NP}$-hard. This paper considers a promise-problem variant of CSPs called PCSPs. A PCSP over a finite set of pairs of constraints $\Gamma$ consists of a pair $(\Psi_P, \Psi_Q)$ of CSPs with the same set of variables such that for every $(P, Q) \in \Gamma$, $P(x_{i_1}, ..., x_{i_k})$ is a clause of $\Psi_P$ if and only if $Q(x_{i_1}, ..., x_{i_k})$ is a clause of $\Psi_Q$. The promise problem $\operatorname{PCSP}(\Gamma)$ is to distinguish, given $(\Psi_P, \Psi_Q)$, between the cases $\Psi_P$ is satisfiable and $\Psi_Q$ is unsatisfiable. Many natural problems including approximate graph and hypergraph coloring can be placed in this framework. This paper is motivated by the pursuit of understanding the computational complexity of Boolean promise CSPs. As our main result, we show that $\operatorname{PCSP}(\Gamma)$ exhibits a dichotomy (it is either polynomial time solvable or $\mathsf{NP}$-hard) when the relations in $\Gamma$ are symmetric and allow for negations of variables. We achieve our dichotomy theorem by extending the weak polymorphism framework of Austrin, Guruswami, and H\aa stad [FOCS '14] which itself is a generalization of the algebraic approach to study CSPs. In both the algorithm and hardness portions of our proof, we incorporate new ideas and techniques not utilized in the CSP case. Furthermore, we show that the computational complexity of any promise CSP (over arbitrary finite domains) is captured entirely by its weak polymorphisms, a feature known as Galois correspondence, as well as give necessary and sufficient conditions for the structure of this set of weak polymorphisms. Such insights call us to question the existence of a general dichotomy for Boolean PCSPs.
• ### Efficiently list-decodable punctured Reed-Muller codes(1508.00603)

April 2, 2017 cs.IT, math.IT, cs.CC
The Reed-Muller (RM) code encoding $n$-variate degree-$d$ polynomials over ${\mathbb F}_q$ for $d < q$, with its evaluation on ${\mathbb F}_q^n$, has relative distance $1-d/q$ and can be list decoded from a $1-O(\sqrt{d/q})$ fraction of errors. In this work, for $d \ll q$, we give a length-efficient puncturing of such codes which (almost) retains the distance and list decodability properties of the Reed-Muller code, but has much better rate. Specificially, when $q =\Omega( d^2/\epsilon^2)$, we given an explicit rate $\Omega\left(\frac{\epsilon}{d!}\right)$ puncturing of Reed-Muller codes which have relative distance at least $(1-\epsilon)$ and efficient list decoding up to $(1-\sqrt{\epsilon})$ error fraction. This almost matches the performance of random puncturings which work with the weaker field size requirement $q= \Omega( d/\epsilon^2)$. We can also improve the field size requirement to the optimal (up to constant factors) $q =\Omega( d/\epsilon)$, at the expense of a worse list decoding radius of $1-\epsilon^{1/3}$ and rate $\Omega\left(\frac{\epsilon^2}{d!}\right)$. The first of the above trade-offs is obtained by substituting for the variables functions with carefully chosen pole orders from an algebraic function field; this leads to a puncturing for which the RM code is a subcode of a certain algebraic-geometric code (which is known to be efficiently list decodable). The second trade-off is obtained by concatenating this construction with a Reed-Solomon based multiplication friendly pair, and using the list recovery property of algebraic-geometric codes.
• ### Repairing Reed-Solomon Codes(1509.04764)

Aug. 9, 2016 cs.IT, math.IT, cs.CC
We study the performance of Reed-Solomon (RS) codes for the \em exact repair problem \em in distributed storage. Our main result is that, in some parameter regimes, Reed-Solomon codes are optimal regenerating codes, among MDS codes with linear repair schemes. Moreover, we give a characterization of MDS codes with linear repair schemes which holds in any parameter regime, and which can be used to give non-trivial repair schemes for RS codes in other settings. More precisely, we show that for $k$-dimensional RS codes whose evaluation points are a finite field of size $n$, there are exact repair schemes with bandwidth $(n-1)\log((n-1)/(n-k))$ bits, and that this is optimal for any MDS code with a linear repair scheme. In contrast, the naive (commonly implemented) repair algorithm for this RS code has bandwidth $k\log(n)$ bits. When the entire field is used as evaluation points, the number of nodes $n$ is much larger than the number of bits per node (which is $O(\log(n))$), and so this result holds only when the degree of sub-packetization is small. However, our method applies in any parameter regime, and to illustrate this for high levels of sub-packetization we give an improved repair scheme for a specific (14,10)-RS code used in the Facebook Hadoop Analytics cluster.
• ### New MDS codes with small sub-packetization and near-optimal repair bandwidth(1608.00191)

July 31, 2016 cs.IT, math.IT, cs.DS
An $(n, M)$ vector code $\mathcal{C} \subseteq \mathbb{F}^n$ is a collection of $M$ codewords where $n$ elements (from the field $\mathbb{F}$) in each of the codewords are referred to as code blocks. Assuming that $\mathbb{F} \cong \mathbb{B}^{\ell}$, the code blocks are treated as $\ell$-length vectors over the base field $\mathbb{B}$. Equivalently, the code is said to have the sub-packetization level $\ell$. This paper addresses the problem of constructing MDS vector codes which enable exact reconstruction of each code block by downloading small amount of information from the remaining code blocks. The repair bandwidth of a code measures the information flow from the remaining code blocks during the reconstruction of a single code block. This problem naturally arises in the context of distributed storage systems as the node repair problem [4]. Assuming that $M = |\mathbb{B}|^{k\ell}$, the repair bandwidth of an MDS vector code is lower bounded by $\big(\frac{n - 1}{n - k}\big)\cdot \ell$ symbols (over the base field $\mathbb{B}$) which is also referred to as the cut-set bound [4]. For all values of $n$ and $k$, the MDS vector codes that attain the cut-set bound with the sub-packetization level $\ell = (n-k)^{\lceil{{n}/{(n-k)}}\rceil}$ are known in the literature [23, 35]. This paper presents a construction for MDS vector codes which simultaneously ensures both small repair bandwidth and small sub-packetization level. The obtained codes have the smallest possible sub-packetization level $\ell = O(n - k)$ for an MDS vector code and the repair bandwidth which is at most twice the cut-set bound. The paper then generalizes this code construction so that the repair bandwidth of the obtained codes approach the cut-set bound at the cost of increased sub-packetization level. The constructions presented in this paper give MDS vector codes which are linear over the base field $\mathbb{B}$.
• ### Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes(1605.04611)

May 15, 2016 cs.IT, math.IT, cs.DS
This work constructs codes that are efficiently decodable from a constant fraction of \emph{worst-case} insertion and deletion errors in three parameter settings: (i) Binary codes with rate approaching 1; (ii) Codes with constant rate for error fraction approaching 1 over fixed alphabet size; and (iii) Constant rate codes over an alphabet of size $k$ for error fraction approaching $(k-1)/(k+1)$. When errors are constrained to deletions alone, efficiently decodable codes in each of these regimes were constructed recently. We complete the picture by constructing similar codes that are efficiently decodable in the insertion/deletion regime.
• ### Rapidly Mixing Markov Chains: A Comparison of Techniques (A Survey)(1603.01512)

March 4, 2016 cs.DS
We survey existing techniques to bound the mixing time of Markov chains. The mixing time is related to a geometric parameter called conductance which is a measure of edge-expansion. Bounds on conductance are typically obtained by a technique called "canonical paths" where the idea is to find a set of paths, one between every source-destination pair, such that no edge is heavily congested. However, the canonical paths approach cannot always show rapid mixing of a rapidly mixing chain. This drawback disappears if we allow the flow between a pair of states to be spread along multiple paths. We prove that for a large class of Markov chains canonical paths does capture rapid mixing. Allowing multiple paths to route the flow still does help a great deal in proofs, as illustrated by a result of Morris & Sinclair (FOCS'99) on the rapid mixing of a Markov chain for sampling 0/1 knapsack solutions. A different approach to prove rapid mixing is "Coupling". Path Coupling is a variant discovered by Bubley & Dyer (FOCS'97) that often tremendously reduces the complexity of designing good Couplings. We present several applications of Path Coupling in proofs of rapid mixing. These invariably lead to much better bounds on mixing time than known using conductance, and moreover Coupling based proofs are typically simpler. This motivates the question of whether Coupling can be made to work whenever the chain is rapidly mixing. This question was answered in the negative by Kumar & Ramesh (FOCS'99), who showed that no Coupling strategy can prove the rapid mixing of the Jerrum-Sinclair chain for sampling perfect and near-perfect matchings.
• ### An improved bound on the fraction of correctable deletions(1507.01719)

March 1, 2016 math.CO, cs.IT, math.IT, cs.DM
We consider codes over fixed alphabets against worst-case symbol deletions. For any fixed $k \ge 2$, we construct a family of codes over alphabet of size $k$ with positive rate, which allow efficient recovery from a worst-case deletion fraction approaching $1-\frac{2}{k+\sqrt k}$. In particular, for binary codes, we are able to recover a fraction of deletions approaching $1/(\sqrt 2 +1)=\sqrt 2-1 \approx 0.414$. Previously, even non-constructively the largest deletion fraction known to be correctable with positive rate was $1-\Theta(1/\sqrt{k})$, and around $0.17$ for the binary case. Our result pins down the largest fraction of correctable deletions for $k$-ary codes as $1-\Theta(1/k)$, since $1-1/k$ is an upper bound even for the simpler model of erasures where the locations of the missing symbols are known. Closing the gap between $(\sqrt 2 -1)$ and $1/2$ for the limit of worst-case deletions correctable by binary codes remains a tantalizing open question.
• ### Superlinear lower bounds for multipass graph processing(1212.6925)

Feb. 8, 2016 cs.DS, cs.CC
We prove $n^{1+\Omega(1/p)}/p^{O(1)}$ lower bounds for the space complexity of $p$-pass streaming algorithms solving the following problems on $n$-vertex graphs: * testing if an undirected graph has a perfect matching (this implies lower bounds for computing a maximum matching or even just the maximum matching size), * testing if two specific vertices are at distance at most $2(p+1)$ in an undirected graph, * testing if there is a directed path from $s$ to $t$ for two specific vertices $s$ and $t$ in a directed graph. Prior to our result, it was known that these problems require $\Omega(n^2)$ space in one pass, but no $n^{1+\Omega(1)}$ lower bound was known for any $p\ge 2$. These streaming results follow from a communication complexity lower bound for a communication game in which the players hold two graphs on the same set of vertices. The task of the players is to find out whether the sets of vertices at distance exactly $p+1$ from a specific vertex intersect. The game requires a significant amount of communication only if the players are forced to speak in a specific difficult order. This is reminiscent of lower bounds for communication problems such as indexing and pointer chasing. Among other things, our line of attack requires proving an information cost lower bound for a decision version of the classic pointer chasing problem and a direct sum type theorem for the disjunction of several instances of this problem.
• ### Efficient Low-Redundancy Codes for Correcting Multiple Deletions(1507.06175)

May 20, 2019 cs.IT, math.IT, cs.DM, cs.DS
We consider the problem of constructing binary codes to recover from $k$-bit deletions with efficient encoding/decoding, for a fixed $k$. The single deletion case is well understood, with the Varshamov-Tenengolts-Levenshtein code from 1965 giving an asymptotically optimal construction with $\approx 2^n/n$ codewords of length $n$, i.e., at most $\log n$ bits of redundancy. However, even for the case of two deletions, there was no known explicit construction with redundancy less than $n^{\Omega(1)}$. For any fixed $k$, we construct a binary code with $c_k \log n$ redundancy that can be decoded from $k$ deletions in $O_k(n \log^4 n)$ time. The coefficient $c_k$ can be taken to be $O(k^2 \log k)$, which is only quadratically worse than the optimal, non-constructive bound of $O(k)$. We also indicate how to modify this code to allow for a combination of up to $k$ insertions and deletions.
• ### Approximate Hypergraph Coloring under Low-discrepancy and Related Promises(1506.06444)

June 22, 2015 cs.DS, cs.CC
A hypergraph is said to be $\chi$-colorable if its vertices can be colored with $\chi$ colors so that no hyperedge is monochromatic. $2$-colorability is a fundamental property (called Property B) of hypergraphs and is extensively studied in combinatorics. Algorithmically, however, given a $2$-colorable $k$-uniform hypergraph, it is NP-hard to find a $2$-coloring miscoloring fewer than a fraction $2^{-k+1}$ of hyperedges (which is achieved by a random $2$-coloring), and the best algorithms to color the hypergraph properly require $\approx n^{1-1/k}$ colors, approaching the trivial bound of $n$ as $k$ increases. In this work, we study the complexity of approximate hypergraph coloring, for both the maximization (finding a $2$-coloring with fewest miscolored edges) and minimization (finding a proper coloring using fewest number of colors) versions, when the input hypergraph is promised to have the following stronger properties than $2$-colorability: (A) Low-discrepancy: If the hypergraph has discrepancy $\ell \ll \sqrt{k}$, we give an algorithm to color the it with $\approx n^{O(\ell^2/k)}$ colors. However, for the maximization version, we prove NP-hardness of finding a $2$-coloring miscoloring a smaller than $2^{-O(k)}$ (resp. $k^{-O(k)}$) fraction of the hyperedges when $\ell = O(\log k)$ (resp. $\ell=2$). Assuming the UGC, we improve the latter hardness factor to $2^{-O(k)}$ for almost discrepancy-$1$ hypergraphs. (B) Rainbow colorability: If the hypergraph has a $(k-\ell)$-coloring such that each hyperedge is polychromatic with all these colors, we give a $2$-coloring algorithm that miscolors at most $k^{-\Omega(k)}$ of the hyperedges when $\ell \ll \sqrt{k}$, and complement this with a matching UG hardness result showing that when $\ell =\sqrt{k}$, it is hard to even beat the $2^{-k+1}$ bound achieved by a random coloring.
• ### Inapproximability of $H$-Transversal/Packing(1506.06302)

June 20, 2015 cs.DS, cs.CC
Given an undirected graph $G = (V_G, E_G)$ and a fixed "pattern" graph $H = (V_H, E_H)$ with $k$ vertices, we consider the $H$-Transversal and $H$-Packing problems. The former asks to find the smallest $S \subseteq V_G$ such that the subgraph induced by $V_G \setminus S$ does not have $H$ as a subgraph, and the latter asks to find the maximum number of pairwise disjoint $k$-subsets $S_1, ..., S_m \subseteq V_G$ such that the subgraph induced by each $S_i$ has $H$ as a subgraph. We prove that if $H$ is 2-connected, $H$-Transversal and $H$-Packing are almost as hard to approximate as general $k$-Hypergraph Vertex Cover and $k$-Set Packing, so it is NP-hard to approximate them within a factor of $\Omega (k)$ and $\widetilde \Omega (k)$ respectively. We also show that there is a 1-connected $H$ where $H$-Transversal admits an $O(\log k)$-approximation algorithm, so that the connectivity requirement cannot be relaxed from 2 to 1. For a special case of $H$-Transversal where $H$ is a (family of) cycles, we mention the implication of our result to the related Feedback Vertex Set problem, and give a different hardness proof for directed graphs.
• ### Dimension Expanders via Rank Condensers(1411.7455)

Nov. 27, 2014 cs.CC
An emerging theory of "linear-algebraic pseudorandomness" aims to understand the linear-algebraic analogs of fundamental Boolean pseudorandom objects where the rank of subspaces plays the role of the size of subsets. In this work, we study and highlight the interrelationships between several such algebraic objects such as subspace designs, dimension expanders, seeded rank condensers, two-source rank condensers, and rank-metric codes. In particular, with the recent construction of near-optimal subspace designs by Guruswami and Kopparty as a starting point, we construct good (seeded) rank condensers (both lossless and lossy versions), which are a small collection of linear maps $\mathbb{F}^n \to \mathbb{F}^t$ for $t \ll n$ such that for every subset of $\mathbb{F}^n$ of small rank, its rank is preserved (up to a constant factor in the lossy case) by at least one of the maps. We then compose a tensoring operation with our lossy rank condenser to construct constant-degree dimension expanders over polynomially large fields. That is, we give $O(1)$ explicit linear maps $A_i:\mathbb{F}^n\to \mathbb{F}^n$ such that for any subspace $V \subseteq \mathbb{F}^n$ of dimension at most $n/2$, $\dim\bigl( \sum_i A_i(V)\bigr) \ge (1+\Omega(1)) \dim(V)$. Previous constructions of such constant-degree dimension expanders were based on Kazhdan's property $T$ (for the case when $\mathbb{F}$ has characteristic zero) or monotone expanders (for every field $\mathbb{F}$); in either case the construction was harder than that of usual vertex expanders. Our construction, on the other hand, is simpler. Via an equivalence to linear rank-metric codes, we then construct optimal lossless two-source condensers. We then use our seeded rank condensers to obtain near-optimal lossy two-source condensers for constant rank sources.