• Longest Common Subsequence ($LCS$) deals with the problem of measuring similarity of two strings. While this problem has been analyzed for decades, the recent interest stems from a practical observation that considering single characters is often too simplistic. Therefore, recent works introduce the variants of $LCS$ based on shared substrings of length exactly or at least $k$ ($LCS_k$ and $LCS_{k+}$ respectively). The main drawback of the state-of-the-art algorithms for computing $LCS_k$ and $LCS_{k+}$ is that they work well only in a limited setting: they either solve the average case well while being suboptimal in the pathological situations or they achieve a good worst-case performance, but fail to exploit the input data properties to speed up the computation. Furthermore, these algorithms are based on non-trivial data structures which is not ideal from a practitioner's point of view. We present a single algorithm to compute both $LCS_k$ and $LCS_{k+}$ which outperforms the state-of-the art algorithms in terms of runtime complexity and requires only basic data structures. In addition, we implement an algorithm to reconstruct the solution which offers significant improvement in terms of memory consumption. Our empirical validation shows that we save several orders of magnitude of memory on human genome data. The C++ implementation of our algorithms is made available at: https://github.com/google/fast-simple-lcsk
  • The radio network model is a well-studied abstraction for modeling wireless multi-hop networks. However, radio networks make the strong assumption that messages are delivered deterministically. The recently introduced noisy radio network model relaxes this assumption by dropping messages independently at random. In this work we quantify the relative computational power of noisy radio networks and classic radio networks. In particular, given a protocol for a fixed radio network we show how to reliably simulate this protocol if noise is introduced with a multiplicative cost of $\mathrm{poly}(\log \Delta, \log \log n)$ rounds. For this result we make the simplifying assumption that the simulated protocol is $\textit{static}$. Moreover, we demonstrate that, even if the simulated protocol is not static, it can be simulated with a multiplicative $O(\Delta \log \Delta)$ cost in the number of rounds. Hence, our results show that protocols on constant-degree networks can be made robust to random noise with constant multiplicative overhead. Lastly, we argue that simulations with a multiplicative overhead of $o(\log \Delta)$ are unlikely to exist by proving that an $\Omega(\log \Delta)$ multiplicative round overhead is necessary under certain natural assumptions.
  • Distributed network optimization algorithms, such as minimum spanning tree, minimum cut, and shortest path, are an active research area in distributed computing. This paper presents a fast distributed algorithm for such problems in the CONGEST model, on networks that exclude a fixed minor. On general graphs, many optimization problems, including the ones mentioned above, require $\tilde\Omega(\sqrt n)$ rounds of communication in the CONGEST model, even if the network graph has a much smaller diameter. Naturally, the next step in algorithm design is to design efficient algorithms which bypass this lower bound on a restricted class of graphs. Currently, the only known method of doing so uses the low-congestion shortcut framework of Ghaffari and Haeupler [SODA'16]. Building off of their work, this paper proves that excluded minor graphs admit high-quality shortcuts, leading to an $\tilde O(D^2)$ round algorithm for the aforementioned problems, where $D$ is the diameter of the network graph. To work with excluded minor graph families, we utilize the Graph Structure Theorem of Robertson and Seymour. To the best of our knowledge, this is the first time the Graph Structure Theorem has been used for an algorithmic result in the distributed setting. Even though the proof is involved, merely showing the existence of good shortcuts is sufficient to obtain simple, efficient distributed algorithms. In particular, the shortcut framework can efficiently construct near-optimal shortcuts and then use them to solve the optimization problems. This, combined with the very general family of excluded minor graphs, which includes most other important graph classes, makes this result of significant interest.
  • The widely-studied radio network model [Chlamtac and Kutten, 1985] is a graph-based description that captures the inherent impact of collisions in wireless communication. In this model, the strong assumption is made that node $v$ receives a message from a neighbor if and only if exactly one of its neighbors broadcasts. We relax this assumption by introducing a new noisy radio network model in which random faults occur at senders or receivers. Specifically, for a constant noise parameter $p \in [0,1)$, either every sender has probability $p$ of transmitting noise or every receiver of a single transmission in its neighborhood has probability $p$ of receiving noise. We first study single-message broadcast algorithms in noisy radio networks and show that the Decay algorithm [Bar-Yehuda et al., 1992] remains robust in the noisy model while the diameter-linear algorithm of Gasieniec et al., 2007 does not. We give a modified version of the algorithm of Gasieniec et al., 2007 that is robust to sender and receiver faults, and extend both this modified algorithm and the Decay algorithm to robust multi-message broadcast algorithms. We next investigate the extent to which (network) coding improves throughput in noisy radio networks. We address the previously perplexing result of Alon et al. 2014 that worst case coding throughput is no better than worst case routing throughput up to constants: we show that the worst case throughput performance of coding is, in fact, superior to that of routing -- by a $\Theta(\log(n))$ gap -- provided receiver faults are introduced. However, we show that any coding or routing scheme for the noiseless setting can be transformed to be robust to sender faults with only a constant throughput overhead. These transformations imply that the results of Alon et al., 2014 carry over to noisy radio networks with sender faults.
  • Distributed optimization algorithms are frequently faced with solving sub-problems on disjoint connected parts of a network. Unfortunately, the diameter of these parts can be significantly larger than the diameter of the underlying network, leading to slow running times. Recent work by [Ghaffari and Hauepler; SODA'16] showed that this phenomenon can be seen as the broad underlying reason for the pervasive $\Omega(\sqrt{n} + D)$ lower bounds that apply to most optimization problems in the CONGEST model. On the positive side, this work also introduced low-congestion shortcuts as an elegant solution to circumvent this problem in certain topologies of interest. Particularly, they showed that there exist good shortcuts for any planar network and more generally any bounded genus network. This directly leads to fast $O(D \log^{O(1)} n)$ distributed algorithms for MST and Min-Cut approximation, given that one can efficiently construct these shortcuts in a distributed manner. Unfortunately, the shortcut construction of [Ghaffari and Hauepler; SODA'16] relies heavily on having access to a genus embedding of the network. Computing such an embedding distributedly, however, is a hard problem - even for planar networks. No distributed embedding algorithm for bounded genus graphs is in sight. In this work, we side-step this problem by defining a restricted and more structured form of shortcuts and giving a novel construction algorithm which efficiently finds a shortcut which is, up to a logarithmic factor, as good as the best shortcut that exists for a given network. This new construction algorithm directly leads to an $O(D \log^{O(1)} n)$-round algorithm for solving optimization problems like MST for any topology for which good restricted shortcuts exist - without the need to compute any embedding. This includes the first efficient algorithm for bounded genus graphs.
  • In this paper we present $LCSk$++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With ever increasing size of strings occuring in practice, e.g. large genomes of plants and animals, classic algorithms such as Longest Common Subsequence (LCS) fail due to demanding computational complexity. Recently, Benson et al. defined a similarity metric named $LCSk$. By relaxing the requirement that the $k$-length substrings should not overlap, we extend their definition into a new metric. An efficient algorithm is presented which computes $LCSk$++ with complexity of $O((|X|+|Y|)\log(|X|+|Y|))$ for strings $X$ and $Y$ under a realistic random model. The algorithm has been designed with implementation simplicity in mind. Additionally, we describe how it can be adjusted to compute $LCSk$ as well, which gives an improvement of the $O(|X|\dot|Y|)$ algorithm presented in the original $LCSk$ paper.