• ### Erasure Correction for Noisy Radio Networks(1805.04165)

May 10, 2018 cs.DC, cs.DS
The radio network model is a well-studied abstraction for modeling wireless multi-hop networks. However, radio networks make the strong assumption that messages are delivered deterministically. The recently introduced noisy radio network model relaxes this assumption by dropping messages independently at random. In this work we quantify the relative computational power of noisy radio networks and classic radio networks. In particular, given a protocol for a fixed radio network we show how to reliably simulate this protocol if noise is introduced with a multiplicative cost of $\mathrm{poly}(\log \Delta, \log \log n)$ rounds. For this result we make the simplifying assumption that the simulated protocol is $\textit{static}$. Moreover, we demonstrate that, even if the simulated protocol is not static, it can be simulated with a multiplicative $O(\Delta \log \Delta)$ cost in the number of rounds. Hence, our results show that protocols on constant-degree networks can be made robust to random noise with constant multiplicative overhead. Lastly, we argue that simulations with a multiplicative overhead of $o(\log \Delta)$ are unlikely to exist by proving that an $\Omega(\log \Delta)$ multiplicative round overhead is necessary under certain natural assumptions.
• ### Optimal Document Exchange and New Codes for Small Number of Insertions and Deletions(1804.03604)

April 10, 2018 cs.DS
This paper gives a communication-optimal document exchange protocol and an efficient near optimal derandomization. This also implies drastically improved error correcting codes for small number of adversarial insertions and deletions. For any $n$ and $k < n$ our randomized hashing scheme takes any $n$-bit file $F$ and computes a $O(k \log \frac{n}{k})$-bit summary from which one can reconstruct $F$ given a related file $F'$ with edit distance $ED(F,F') \leq k$. The size of our summary is information-theoretically order optimal for all values of $k$, positively answering a question of Orlitsky. It also is the first non-trivial solution when a small constant fraction of symbols have been edited, producing an optimal summary of size $O(H(\delta)n)$ for $k=\delta n$. This concludes a long series of better-and-better protocols which produce larger summaries for sub-linear values of $k$. In particular, the recent break-through of [Belazzougi, Zhang; STOC'16] assumes that $k < n^\epsilon$ and produces a summary of size $O(k\log^2 k + k\log n)$. We also give an efficient derandomization with near optimal summary size $O(k \log^2 \frac{n}{k})$ improving, for every $k$, over a deterministic $O(k^2 + k \log^2 n)$ document exchange scheme by Belazzougi. This directly leads to near optimal systematic error correcting codes which efficiently can recover from any $k$ insertions and deletions while having $\Theta(k \log^2 \frac{n}{k})$ bits of redundancy. For the setting of $k=n\epsilon$ this $O(\epsilon \log^2 \frac{1}{\epsilon} \cdot n)$-bit redundancy is near optimal and a quadratic improvement over the binary codes of Guruswami and Li and Haeupler, Shahrasbi and Vitercik which have redundancy $\Theta\left(\sqrt{\epsilon} \log^{O(1)} \frac{1}{\epsilon} \cdot n\right)$.
• ### Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions(1707.04233)

March 20, 2018 cs.IT, math.IT, cs.DS
We present many new results related to reliable (interactive) communication over insertion-deletion channels. Synchronization errors, such as insertions and deletions, strictly generalize the usual symbol corruption errors and are much harder to protect against. We show how to hide the complications of synchronization errors in many applications by introducing very general channel simulations which efficiently transform an insertion-deletion channel into a regular symbol corruption channel with an error rate larger by a constant factor and a slightly smaller alphabet. We generalize synchronization string based methods which were recently introduced as a tool to design essentially optimal error correcting codes for insertion-deletion channels. Our channel simulations depend on the fact that, at the cost of increasing the error rate by a constant factor, synchronization strings can be decoded in a streaming manner that preserves linearity of time. We also provide a lower bound showing that this constant factor cannot be improved to $1+\epsilon$, in contrast to what is achievable for error correcting codes. Our channel simulations drastically generalize the applicability of synchronization strings. We provide new interactive coding schemes which simulate any interactive two-party protocol over an insertion-deletion channel. Our results improve over the interactive coding schemes of Braverman et al. [TransInf 2017] and Sherstov and Wu [FOCS 2017], which achieve a small constant rate and require exponential time computations, with respect to computational and communication complexities. We provide the first computationally efficient interactive coding schemes for synchronization errors, the first coding scheme with a rate approaching one for small noise rates, and also the first coding scheme that works over arbitrarily small alphabet sizes.
• ### Synchronization Strings: Efficient and Fast Deterministic Constructions over Small Alphabets(1803.03530)

March 8, 2018 cs.IT, math.IT, cs.DS
Synchronization strings are recently introduced by Haeupler and Shahrasbi (STOC 2017) in the study of codes for correcting insertion and deletion errors (insdel codes). They showed that for any parameter $\varepsilon>0$, synchronization strings of arbitrary length exist over an alphabet whose size depends only on $\varepsilon$. Specifically, they obtained an alphabet size of $O(\varepsilon^{-4})$, which left an open question on where the minimal size of such alphabets lies between $\Omega(\varepsilon^{-1})$ and $O(\varepsilon^{-4})$. In this work, we partially bridge this gap by providing an improved lower bound of $\Omega(\varepsilon^{-3/2})$, and an improved upper bound of $O(\varepsilon^{-2})$. We also provide fast explicit constructions of synchronization strings over small alphabets. Further, along the lines of previous work on similar combinatorial objects, we study the extremal question of the smallest possible alphabet size over which synchronization strings can exist for some constant $\varepsilon < 1$. We show that one can construct $\varepsilon$-synchronization strings over alphabets of size four while no such string exists over binary alphabets. This reduces the extremal question to whether synchronization strings exist over ternary alphabets.
• ### Round- and Message-Optimal Distributed Graph Algorithms(1801.05127)

Feb. 27, 2018 cs.DC, cs.DS
Distributed graph algorithms that separately optimize for either the number of rounds used or the total number of messages sent have been studied extensively. However, algorithms simultaneously efficient with respect to both measures have been elusive. For example, only very recently was it shown that for Minimum Spanning Tree (MST), an optimal message and round complexity is achievable (up to polylog terms) by a single algorithm in the CONGEST model of communication. In this paper we provide algorithms that are simultaneously round- and message-optimal for a number of well-studied distributed optimization problems. Our main result is such a distributed algorithm for the fundamental primitive of computing simple functions over each part of a graph partition. From this algorithm we derive round- and message-optimal algorithms for multiple problems, including MST, Approximate Min-Cut and Approximate Single Source Shortest Paths, among others. On general graphs all of our algorithms achieve worst-case optimal $\tilde{O}(D+\sqrt n)$ round complexity and $\tilde{O}(m)$ message complexity. Furthermore, our algorithms require an optimal $\tilde{O}(D)$ rounds and $\tilde{O}(n)$ messages on planar, genus-bounded, treewidth-bounded and pathwidth-bounded graphs.
• ### Synchronization Strings: List Decoding for Insertions and Deletions(1802.08663)

Feb. 23, 2018 cs.IT, math.IT, cs.DS
We study codes that are list-decodable under insertions and deletions. Specifically, we consider the setting where a codeword over some finite alphabet of size $q$ may suffer from $\delta$ fraction of adversarial deletions and $\gamma$ fraction of adversarial insertions. A code is said to be $L$-list-decodable if there is an (efficient) algorithm that, given a received word, reports a list of $L$ codewords that include the original codeword. Using the concept of synchronization strings, introduced by the first two authors [STOC 2017], we show some surprising results. We show that for every $0\leq\delta<1$, every $0\leq\gamma<\infty$ and every $\epsilon>0$ there exist efficient codes of rate $1-\delta-\epsilon$ and constant alphabet (so $q=O_{\delta,\gamma,\epsilon}(1)$) and sub-logarithmic list sizes. We stress that the fraction of insertions can be arbitrarily large and the rate is independent of this parameter. Our result sheds light on the remarkable asymmetry between the impact of insertions and deletions from the point of view of error-correction: Whereas deletions cost in the rate of the code, insertion costs are borne by the adversary and not the code! We also prove several tight bounds on the parameters of list-decodable insdel codes. In particular, we show that the alphabet size of insdel codes needs to be exponentially large in $\epsilon^{-1}$, where $\epsilon$ is the gap to capacity above. Our result even applies to settings where the unique-decoding capacity equals the list-decoding capacity and when it does so, it shows that the alphabet size needs to be exponentially large in the gap to capacity. This is sharp contrast to the Hamming error model where alphabet size polynomial in $\epsilon^{-1}$ suffices for unique decoding and also shows that the exponential dependence on the alphabet size in previous works that constructed insdel codes is actually necessary!
• ### Faster Distributed Shortest Path Approximations via Shortcuts(1802.03671)

Feb. 11, 2018 cs.DS
A long series of recent results and breakthroughs have led to faster and better distributed approximation algorithms for single source shortest paths (SSSP) and related problems in the CONGEST model. The runtime of all these algorithms, however, is $\tilde{\Omega}(\sqrt{n})$, regardless of the network topology, even on nice networks with a (poly)logarithmic network diameter $D$. While this is known to be necessary for some pathological networks, most topologies of interest are arguably not of this type. We give the first distributed approximation algorithms for shortest paths problems that adjust to the topology they are run on, thus achieving significantly faster running times on many topologies of interest. The running time of our algorithms depends on and is close to $Q$, where $Q$ is the quality of the best shortcut that exists for the given topology. While $Q = \tilde{\Theta}(\sqrt{n} + D)$ for pathological worst-case topologies, many topologies of interest have $Q = \tilde{\Theta}(D)$, which results in near instance optimal running times for our algorithm, given the trivial $\Omega(D)$ lower bound. The problems we consider are as follows: (1) an approximate shortest path tree and SSSP distances, (2) a polylogarithmic size distance label for every node such that from the labels of any two nodes alone one can determine their distance (approximately), and (3) an (approximately) optimal flow for the transshipment problem. Our algorithms have a tunable tradeoff between running time and approximation ratio. Our fastest algorithms have an arbitrarily good polynomial approximation guarantee and an essentially optimal $\tilde{O}(Q)$ running time. On the other end of the spectrum, we achieve polylogarithmic approximations in $\tilde{O}(Q \cdot n^{\epsilon})$ rounds for any $\epsilon > 0$. It seems likely that eventually, our non-trivial approximation algorithms for the...
• ### Minor Excluded Network Families Admit Fast Distributed Algorithms(1801.06237)

Jan. 18, 2018 cs.DS
Distributed network optimization algorithms, such as minimum spanning tree, minimum cut, and shortest path, are an active research area in distributed computing. This paper presents a fast distributed algorithm for such problems in the CONGEST model, on networks that exclude a fixed minor. On general graphs, many optimization problems, including the ones mentioned above, require $\tilde\Omega(\sqrt n)$ rounds of communication in the CONGEST model, even if the network graph has a much smaller diameter. Naturally, the next step in algorithm design is to design efficient algorithms which bypass this lower bound on a restricted class of graphs. Currently, the only known method of doing so uses the low-congestion shortcut framework of Ghaffari and Haeupler [SODA'16]. Building off of their work, this paper proves that excluded minor graphs admit high-quality shortcuts, leading to an $\tilde O(D^2)$ round algorithm for the aforementioned problems, where $D$ is the diameter of the network graph. To work with excluded minor graph families, we utilize the Graph Structure Theorem of Robertson and Seymour. To the best of our knowledge, this is the first time the Graph Structure Theorem has been used for an algorithmic result in the distributed setting. Even though the proof is involved, merely showing the existence of good shortcuts is sufficient to obtain simple, efficient distributed algorithms. In particular, the shortcut framework can efficiently construct near-optimal shortcuts and then use them to solve the optimization problems. This, combined with the very general family of excluded minor graphs, which includes most other important graph classes, makes this result of significant interest.
• ### Parallel algorithms and concentration bounds for the Lovasz Local Lemma via witness DAGs(1509.06430)

Sept. 28, 2017 math.CO, cs.DM, cs.DS
The Lov\'{a}sz Local Lemma (LLL) is a cornerstone principle in the probabilistic method of combinatorics, and a seminal algorithm of Moser & Tardos (2010) provides an efficient randomized algorithm to implement it. This can be parallelized to give an algorithm that uses polynomially many processors and runs in $O(\log^3 n)$ time on an EREW PRAM, stemming from $O(\log n)$ adaptive computations of a maximal independent set (MIS). Chung et al. (2014) developed faster local and parallel algorithms, potentially running in time $O(\log^2 n)$, but these algorithms require more stringent conditions than the LLL. We give a new parallel algorithm that works under essentially the same conditions as the original algorithm of Moser & Tardos but uses only a single MIS computation, thus running in $O(\log^2 n)$ time on an EREW PRAM. This can be derandomized to give an NC algorithm running in time $O(\log^2 n)$ as well, speeding up a previous NC LLL algorithm of Chandrasekaran et al. (2013). We also provide improved and tighter bounds on the run-times of the sequential and parallel resampling-based algorithms originally developed by Moser & Tardos. These apply to any problem instance in which the tighter Shearer LLL criterion is satisfied.

May 21, 2017 cs.DC, cs.DS
The widely-studied radio network model [Chlamtac and Kutten, 1985] is a graph-based description that captures the inherent impact of collisions in wireless communication. In this model, the strong assumption is made that node $v$ receives a message from a neighbor if and only if exactly one of its neighbors broadcasts. We relax this assumption by introducing a new noisy radio network model in which random faults occur at senders or receivers. Specifically, for a constant noise parameter $p \in [0,1)$, either every sender has probability $p$ of transmitting noise or every receiver of a single transmission in its neighborhood has probability $p$ of receiving noise. We first study single-message broadcast algorithms in noisy radio networks and show that the Decay algorithm [Bar-Yehuda et al., 1992] remains robust in the noisy model while the diameter-linear algorithm of Gasieniec et al., 2007 does not. We give a modified version of the algorithm of Gasieniec et al., 2007 that is robust to sender and receiver faults, and extend both this modified algorithm and the Decay algorithm to robust multi-message broadcast algorithms. We next investigate the extent to which (network) coding improves throughput in noisy radio networks. We address the previously perplexing result of Alon et al. 2014 that worst case coding throughput is no better than worst case routing throughput up to constants: we show that the worst case throughput performance of coding is, in fact, superior to that of routing -- by a $\Theta(\log(n))$ gap -- provided receiver faults are introduced. However, we show that any coding or routing scheme for the noiseless setting can be transformed to be robust to sender faults with only a constant throughput overhead. These transformations imply that the results of Alon et al., 2014 carry over to noisy radio networks with sender faults.
• ### Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound(1704.00807)

April 3, 2017 cs.IT, math.IT, cs.DS
We introduce synchronization strings as a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than commonly considered half-errors, i.e., symbol corruptions and erasures. For every $\epsilon >0$, synchronization strings allow to index a sequence with an $\epsilon^{-O(1)}$ size alphabet such that one can efficiently transform $k$ synchronization errors into $(1+\epsilon)k$ half-errors. This powerful new technique has many applications. In this paper, we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the later has largely resisted progress. Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel codes for asymptotically large or small noise rates were given in 2016 by Guruswami et al. but these codes are still polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A direct application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with a slightly larger alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, we obtain efficient insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound for the complete noise spectrum.
• ### Making Asynchronous Distributed Computations Robust to Noise(1702.07403)

Feb. 23, 2017 cs.DC, cs.DS
We consider the problem of making distributed computations robust to noise, in particular to worst-case (adversarial) corruptions of messages. We give a general distributed interactive coding scheme which simulates any asynchronous distributed protocol while tolerating an optimal corruption of a $\Theta(1/n)$ fraction of all messages while incurring a moderate blowup of $O(n\log^2 n)$ in the communication complexity. Our result is the first fully distributed interactive coding scheme in which the topology of the communication network is not known in advance. Prior work required either a coordinating node to be connected to all other nodes in the network or assumed a synchronous network in which all nodes already know the complete topology of the network.
• ### Low-Congestion Shortcuts without Embedding(1607.07553)

Dec. 13, 2019 cs.DS
Distributed optimization algorithms are frequently faced with solving sub-problems on disjoint connected parts of a network. Unfortunately, the diameter of these parts can be significantly larger than the diameter of the underlying network, leading to slow running times. Recent work by [Ghaffari and Hauepler; SODA'16] showed that this phenomenon can be seen as the broad underlying reason for the pervasive $\Omega(\sqrt{n} + D)$ lower bounds that apply to most optimization problems in the CONGEST model. On the positive side, this work also introduced low-congestion shortcuts as an elegant solution to circumvent this problem in certain topologies of interest. Particularly, they showed that there exist good shortcuts for any planar network and more generally any bounded genus network. This directly leads to fast $O(D \log^{O(1)} n)$ distributed algorithms for MST and Min-Cut approximation, given that one can efficiently construct these shortcuts in a distributed manner. Unfortunately, the shortcut construction of [Ghaffari and Hauepler; SODA'16] relies heavily on having access to a genus embedding of the network. Computing such an embedding distributedly, however, is a hard problem - even for planar networks. No distributed embedding algorithm for bounded genus graphs is in sight. In this work, we side-step this problem by defining a restricted and more structured form of shortcuts and giving a novel construction algorithm which efficiently finds a shortcut which is, up to a logarithmic factor, as good as the best shortcut that exists for a given network. This new construction algorithm directly leads to an $O(D \log^{O(1)} n)$-round algorithm for solving optimization problems like MST for any topology for which good restricted shortcuts exist - without the need to compute any embedding. This includes the first efficient algorithm for bounded genus graphs.
• ### Bridging the Capacity Gap Between Interactive and One-Way Communication(1605.08792)

May 27, 2016 cs.IT, math.IT, cs.DS
We study the communication rate of coding schemes for interactive communication that transform any two-party interactive protocol into a protocol that is robust to noise. Recently, Haeupler (FOCS '14) showed that if an $\epsilon > 0$ fraction of transmissions are corrupted, adversarially or randomly, then it is possible to achieve a communication rate of $1 - \widetilde{O}(\sqrt{\epsilon})$. Furthermore, Haeupler conjectured that this rate is optimal for general input protocols. This stands in contrast to the classical setting of one-way communication in which error-correcting codes are known to achieve an optimal communication rate of $1 - \Theta(H(\epsilon)) = 1 - \widetilde{\Theta}(\epsilon)$. In this work, we show that the quadratically smaller rate loss of the one-way setting can also be achieved in interactive coding schemes for a very natural class of input protocols. We introduce the notion of average message length, or the average number of bits a party sends before receiving a reply, as a natural parameter for measuring the level of interactivity in a protocol. Moreover, we show that any protocol with average message length $\ell = \Omega(\mathrm{poly}(1/\epsilon))$ can be simulated by a protocol with optimal communication rate $1 - \Theta(H(\epsilon))$ over an oblivious adversarial channel with error fraction $\epsilon$. Furthermore, under the additional assumption of access to public shared randomness, the optimal communication rate is achieved ratelessly, i.e., the communication rate adapts automatically to the actual error rate $\epsilon$ without having to specify it in advance. This shows that the capacity gap between one-way and interactive communication can be bridged even for very small (constant in $\epsilon$) average message lengths, which are likely to be found in many applications.
• ### Breathe before Speaking: Efficient Information Dissemination Despite Noisy, Limited and Anonymous Communication(1311.3425)

June 26, 2015 cs.DC
Distributed computing models typically assume reliable communication between processors. While such assumptions often hold for engineered networks, e.g., due to underlying error correction protocols, their relevance to biological systems, wherein messages are often distorted before reaching their destination, is quite limited. In this study we take a first step towards reducing this gap by rigorously analyzing a model of communication in large anonymous populations composed of simple agents which interact through short and highly unreliable messages. We focus on the broadcast problem and the majority-consensus problem. Both are fundamental information dissemination problems in distributed computing, in which the goal of agents is to converge to some prescribed desired opinion. We initiate the study of these problems in the presence of communication noise. Our model for communication is extremely weak and follows the push gossip communication paradigm: In each round each agent that wishes to send information delivers a message to a random anonymous agent. This communication is further restricted to contain only one bit (essentially representing an opinion). Lastly, the system is assumed to be so noisy that the bit in each message sent is flipped independently with probability $1/2-\epsilon$, for some small $\epsilon >0$. Even in this severely restricted, stochastic and noisy setting we give natural protocols that solve the noisy broadcast and the noisy majority-consensus problems efficiently. Our protocols run in $O(\log n / \epsilon^2)$ rounds and use $O(n \log n / \epsilon^2)$ messages/bits in total, where $n$ is the number of agents. These bounds are asymptotically optimal and, in fact, are as fast and message efficient as if each agent would have been simultaneously informed directly by an agent that knows the prescribed desired opinion.
• ### Maximal Noise in Interactive Communication over Erasure Channels and Channels with Feedback(1501.00624)

Jan. 4, 2015 cs.DS
We provide tight upper and lower bounds on the noise resilience of interactive communication over noisy channels with feedback. In this setting, we show that the maximal fraction of noise that any robust protocol can resist is 1/3. Additionally, we provide a simple and efficient robust protocol that succeeds as long as the fraction of noise is at most 1/3 - \epsilon. Surprisingly, both bounds hold regardless of whether the parties send bits or symbols from an arbitrarily large alphabet. We also consider interactive communication over erasure channels. We provide a protocol that matches the optimal tolerable erasure rate of 1/2 - \epsilon of previous protocols (Franklin et al., CRYPTO '13) but operates in a much simpler and more efficient way. Our protocol works with an alphabet of size 4, in contrast to prior protocols in which the alphabet size grows as epsilon goes to zero. Building on the above algorithm with a fixed alphabet size, we are able to devise a protocol for binary erasure channels that tolerates erasure rates of up to 1/3 - \epsilon.
• ### Interactive Channel Capacity Revisited(1408.1467)

Nov. 3, 2014 cs.IT, math.IT, cs.DC, cs.DS
We provide the first capacity approaching coding schemes that robustly simulate any interactive protocol over an adversarial channel that corrupts any $\epsilon$ fraction of the transmitted symbols. Our coding schemes achieve a communication rate of $1 - O(\sqrt{\epsilon \log \log 1/\epsilon})$ over any adversarial channel. This can be improved to $1 - O(\sqrt{\epsilon})$ for random, oblivious, and computationally bounded channels, or if parties have shared randomness unknown to the channel. Surprisingly, these rates exceed the $1 - \Omega(\sqrt{H(\epsilon)}) = 1 - \Omega(\sqrt{\epsilon \log 1/\epsilon})$ interactive channel capacity bound which [Kol and Raz; STOC'13] recently proved for random errors. We conjecture $1 - \Theta(\sqrt{\epsilon \log \log 1/\epsilon})$ and $1 - \Theta(\sqrt{\epsilon})$ to be the optimal rates for their respective settings and therefore to capture the interactive channel capacity for random and adversarial errors. In addition to being very communication efficient, our randomized coding schemes have multiple other advantages. They are computationally efficient, extremely natural, and significantly simpler than prior (non-capacity approaching) schemes. In particular, our protocols do not employ any coding but allow the original protocol to be performed as-is, interspersed only by short exchanges of hash values. When hash values do not match, the parties backtrack. Our approach is, as we feel, by far the simplest and most natural explanation for why and how robust interactive communication in a noisy environment is possible.
• ### Consistent Weighted Sampling Made Fast, Small, and Easy(1410.4266)

Oct. 16, 2014 cs.DS
Document sketching using Jaccard similarity has been a workable effective technique in reducing near-duplicates in Web page and image search results, and has also proven useful in file system synchronization, compression and learning applications. Min-wise sampling can be used to derive an unbiased estimator for Jaccard similarity and taking a few hundred independent consistent samples leads to compact sketches which provide good estimates of pairwise-similarity. Subsequent works extended this technique to weighted sets and show how to produce samples with only a constant number of hash evaluations for any element, independent of its weight. Another improvement by Li et al. shows how to speedup sketch computations by computing many (near-)independent samples in one shot. Unfortunately this latter improvement works only for the unweighted case. In this paper we give a simple, fast and accurate procedure which reduces weighted sets to unweighted sets with small impact on the Jaccard similarity. This leads to compact sketches consisting of many (near-)independent weighted samples which can be computed with just a small constant number of hash function evaluations per weighted element. The size of the produced unweighted set is furthermore a tunable parameter which enables us to run the unweighted scheme of Li et al. in the regime where it is most efficient. Even when the sets involved are unweighted, our approach gives a simple solution to the densification problem that other works attempted to address. Unlike previously known schemes, ours does not result in an unbiased estimator. However, we prove that the bias introduced by our reduction is negligible and that the standard deviation is comparable to the unweighted case. We also empirically evaluate our scheme and show that it gives significant gains in computational efficiency, without any measurable loss in accuracy.

Aug. 28, 2014 math.CO, cs.DC, cs.DS
The broadcast throughput in a network is defined as the average number of messages that can be transmitted per unit time from a given source to all other nodes when time goes to infinity. Classical broadcast algorithms treat messages as atomic tokens and route them from the source to the receivers by making intermediate nodes store and forward messages. The more recent network coding approach, in contrast, prompts intermediate nodes to mix and code together messages. It has been shown that certain wired networks have an asymptotic network coding gap, that is, they have asymptotically higher broadcast throughput when using network coding compared to routing. Whether such a gap exists for wireless networks has been an open question of great interest. We approach this question by studying the broadcast throughput of the radio network model which has been a standard mathematical model to study wireless communication. We show that there is a family of radio networks with a tight $\Theta(\log \log n)$ network coding gap, that is, networks in which the asymptotic throughput achievable via routing messages is a $\Theta(\log \log n)$ factor smaller than that of the optimal network coding algorithm. We also provide new tight upper and lower bounds that show that the asymptotic worst-case broadcast throughput over all networks with $n$ nodes is $\Theta(1 / \log n)$ messages-per-round for both routing and network coding.
• ### Optimal Error Rates for Interactive Coding II: Efficiency and List Decoding(1312.1763)

April 16, 2014 cs.IT, math.IT, cs.DS
We study coding schemes for error correction in interactive communications. Such interactive coding schemes simulate any $n$-round interactive protocol using $N$ rounds over an adversarial channel that corrupts up to $\rho N$ transmissions. Important performance measures for a coding scheme are its maximum tolerable error rate $\rho$, communication complexity $N$, and computational complexity. We give the first coding scheme for the standard setting which performs optimally in all three measures: Our randomized non-adaptive coding scheme has a near-linear computational complexity and tolerates any error rate $\delta < 1/4$ with a linear $N = \Theta(n)$ communication complexity. This improves over prior results which each performed well in two of these measures. We also give results for other settings of interest, namely, the first computationally and communication efficient schemes that tolerate $\rho < \frac{2}{7}$ adaptively, $\rho < \frac{1}{3}$ if only one party is required to decode, and $\rho < \frac{1}{2}$ if list decoding is allowed. These are the optimal tolerable error rates for the respective settings. These coding schemes also have near linear computational and communication complexity. These results are obtained via two techniques: We give a general black-box reduction which reduces unique decoding, in various settings, to list decoding. We also show how to boost the computational and communication efficiency of any list decoder to become near linear.
• ### Fast Structuring of Radio Networks for Multi-Message Communications(1404.2387)

April 9, 2014 cs.DC, cs.NI
We introduce collision free layerings as a powerful way to structure radio networks. These layerings can replace hard-to-compute BFS-trees in many contexts while having an efficient randomized distributed construction. We demonstrate their versatility by using them to provide near optimal distributed algorithms for several multi-message communication primitives. Designing efficient communication primitives for radio networks has a rich history that began 25 years ago when Bar-Yehuda et al. introduced fast randomized algorithms for broadcasting and for constructing BFS-trees. Their BFS-tree construction time was $O(D \log^2 n)$ rounds, where $D$ is the network diameter and $n$ is the number of nodes. Since then, the complexity of a broadcast has been resolved to be $T_{BC} = \Theta(D \log \frac{n}{D} + \log^2 n)$ rounds. On the other hand, BFS-trees have been used as a crucial building block for many communication primitives and their construction time remained a bottleneck for these primitives. We introduce collision free layerings that can be used in place of BFS-trees and we give a randomized construction of these layerings that runs in nearly broadcast time, that is, w.h.p. in $T_{Lay} = O(D \log \frac{n}{D} + \log^{2+\epsilon} n)$ rounds for any constant $\epsilon>0$. We then use these layerings to obtain: (1) A randomized algorithm for gathering $k$ messages running w.h.p. in $O(T_{Lay} + k)$ rounds. (2) A randomized $k$-message broadcast algorithm running w.h.p. in $O(T_{Lay} + k \log n)$ rounds. These algorithms are optimal up to the small difference in the additive poly-logarithmic term between $T_{BC}$ and $T_{Lay}$. Moreover, they imply the first optimal $O(n \log n)$ round randomized gossip algorithm.
• ### Simple, Fast and Deterministic Gossip and Rumor Spreading(1210.1193)

April 5, 2014 math.CO, cs.DC, cs.DS
We present a randomized distributed algorithm that in radio networks with collision detection broadcasts a single message in $O(D + \log^6 n)$ rounds, with high probability. This time complexity is most interesting because of its optimal additive dependence on the network diameter $D$. It improves over the currently best known $O(D\log\frac{n}{D}\,+\,\log^2 n)$ algorithms, due to Czumaj and Rytter [FOCS 2003], and Kowalski and Pelc [PODC 2003]. These algorithms where designed for the model without collision detection and are optimal in that model. However, as explicitly stated by Peleg in his 2007 survey on broadcast in radio networks, it had remained an open question whether the bound can be improved with collision detection. We also study distributed algorithms for broadcasting $k$ messages from a single source to all nodes. This problem is a natural and important generalization of the single-message broadcast problem, but is in fact considerably more challenging and less understood. We show the following results: If the network topology is known to all nodes, then a $k$-message broadcast can be performed in $O(D + k\log n + \log^2 n)$ rounds, with high probability. If the topology is not known, but collision detection is available, then a $k$-message broadcast can be performed in $O(D + k\log n + \log^6 n)$ rounds, with high probability. The first bound is optimal and the second is optimal modulo the additive $O(\log^6 n)$ term.
We present distributed randomized leader election protocols for multi-hop radio networks that elect a leader in almost the same time $T_{BC}$ required for broadcasting a message. For the setting without collision detection, our algorithm runs with high probability in $O(D \log \frac{n}{D} + \log^3 n) \min\{\log\log n,\log \frac{n}{D}\}$ rounds on any $n$-node network with diameter $D$. Since $T_{BC} = \Theta(D \log \frac{n}{D} + \log^2 n)$ is a lower bound, our upper bound is optimal up to a factor of at most $\log \log n$ and the extra $\log n$ factor on the additive term. This algorithm is furthermore the first $O(n)$ time algorithm for this setting. Our algorithms improve over a 25 year old simulation approach of Bar-Yehuda, Goldreich and Itai with a $O(T_{BC} \log n)$ running time: In 1987 they designed a fast broadcast protocol and subsequently in 1989 they showed how it can be used to simulate one round of a single-hop network that has collision detection in $T_{BC}$ time. The prime application of this simulation was to simulate Willards single-hop leader election protocol, which elects a leader in $O(\log n)$ rounds with high probability and $O(\log \log n)$ rounds in expectation. While it was subsequently shown that Willards bounds are tight, it was unclear whether the simulation approach is optimal. Our results break this barrier and essentially remove the logarithmic slowdown over the broadcast time $T_{BC}$ by going away from the simulation approach. We also give a distributed randomized leader election algorithm for the setting with collision detection that runs in $O(D + \log n \log \log n) \cdot \min\{\log \log n, \log \frac{n}{D}\}$ rounds. This round complexity is optimal up to $O(\log \log n)$ factors and improves over a deterministic algorithm that requires $\Theta(n)$ rounds independently of the diameter $D$.
Gossip algorithms spread information by having nodes repeatedly forward information to a few random contacts. By their very nature, gossip algorithms tend to be distributed and fault tolerant. If done right, they can also be fast and message-efficient. A common model for gossip communication is the random phone call model, in which in each synchronous round each node can PUSH or PULL information to or from a random other node. For example, Karp et al. [FOCS 2000] gave algorithms in this model that spread a message to all nodes in $\Theta(\log n)$ rounds while sending only $O(\log \log n)$ messages per node on average. Recently, Avin and Els\"asser [DISC 2013], studied the random phone call model with the natural and commonly used assumption of direct addressing. Direct addressing allows nodes to directly contact nodes whose ID (e.g., IP address) was learned before. They show that in this setting, one can "break the $\log n$ barrier" and achieve a gossip algorithm running in $O(\sqrt{\log n})$ rounds, albeit while using $O(\sqrt{\log n})$ messages per node. We study the same model and give a simple gossip algorithm which spreads a message in only $O(\log \log n)$ rounds. We also prove a matching $\Omega(\log \log n)$ lower bound which shows that this running time is best possible. In particular we show that any gossip algorithm takes with high probability at least $0.99 \log \log n$ rounds to terminate. Lastly, our algorithm can be tweaked to send only $O(1)$ messages per node on average with only $O(\log n)$ bits per message. Our algorithm therefore simultaneously achieves the optimal round-, message-, and bit-complexity for this setting. As all prior gossip algorithms, our algorithm is also robust against failures. In particular, if in the beginning an oblivious adversary fails any $F$ nodes our algorithm still, with high probability, informs all but $o(F)$ surviving nodes.