• ### Hamming distance completeness and sparse matrix multiplication(1711.03887)

May 3, 2018 cs.DS
We show that a broad class of $(+,\diamond)$ vector products (for binary integer functions $\diamond$) are equivalent under one-to-polylog reductions to the computation of the Hamming distance. Examples include: the dominance product, the threshold product and $\ell_{2p+1}$ distances for constant $p$. Our results imply equivalence (up to polylog factors) between complexity of computation of All Pairs: Hamming Distances, $\ell_{2p+1}$ Distances, Dominance Products and Threshold Products. As a consequence, Yuster's~(SODA'09) algorithm improves not only Matou\v{s}ek's (IPL'91), but also the results of Indyk, Lewenstein, Lipsky and Porat (ICALP'04) and Min, Kao and Zhu (COCOON'09). Furthermore, our reductions apply to the pattern matching setting, showing equivalence (up to polylog factors) between pattern matching under Hamming Distance, $\ell_{2p+1}$ Distance, Dominance Product and Threshold Product, with current best upperbounds due to results of Abrahamson (SICOMP'87), Amir and Farach (Ann.~Math.~Artif.~Intell.'91), Atallah and Duket (IPL'11), Clifford, Clifford and Iliopoulous (CPM'05) and Amir, Lipsky, Porat and Umanski (CPM'05). The resulting algorithms for $\ell_{2p+1}$ Pattern Matching and All Pairs $\ell_{2p+1}$, for $2p+1 = 3,5,7,\dots$ are new. Additionally, we show that the complexity of AllPairsHammingDistances (and thus of other aforementioned AllPairs- problems) is within polylog from the time it takes to multiply matrices $n \times (n\cdot d)$ and $(n\cdot d) \times n$, each with $(n\cdot d)$ non-zero entries. This means that the current upperbounds by Yuster (SODA'09) cannot be improved without improving the sparse matrix multiplication algorithm by Yuster and Zwick~(ACM TALG'05) and vice versa.
• ### Faster Approximate(d) Text-to-Pattern L1 Distance(1801.09159)

May 2, 2018 cs.DS
The problem of finding \emph{distance} between \emph{pattern} of length $m$ and \emph{text} of length $n$ is a typical way of generalizing pattern matching to incorporate dissimilarity score. For both Hamming and $L_1$ distances only a super linear upper bound $\widetilde{O}(n\sqrt{m})$ are known, which prompts the question of relaxing the problem: either by asking for $(1 \pm \varepsilon)$ approximate distance (every distance is reported up to a multiplicative factor), or $k$-approximated distance (distances exceeding $k$ are reported as $\infty$). We focus on $L_1$ distance, for which we show new algorithms achieving complexities respectively $\widetilde{O}(\varepsilon^{-1} n)$ and $\widetilde{O}((m+k\sqrt{m}) \cdot n/m)$. This is a significant improvement upon previous algorithms with runtime $\widetilde{O}(\varepsilon^{-2} n)$ of Lipsky and Porat [Algorithmica 2011] and $\widetilde{O}(n\sqrt{k})$ of Amir, Lipsky, Porat and Umanski [CPM 2005].
• ### Population Protocols Are Fast(1802.06872)

April 16, 2018 cs.DC, cs.DS
A population protocol describes a set of state change rules for a population of $n$ indistinguishable finite-state agents (automata), undergoing random pairwise interactions. Within this very basic framework, it is possible to resolve a number of fundamental tasks in distributed computing, including: leader election, aggregate and threshold functions on the population, such as majority computation, and plurality consensus. For the first time, we show that solutions to all of these problems can be obtained \emph{quickly} using finite-state protocols. For any input, the designed finite-state protocols converge under a fair random scheduler to an output which is correct with high probability in expected $O(\mathrm{poly} \log n)$ parallel time. In the same setting, we also show protocols which always reach a valid solution, in expected parallel time $O(n^\varepsilon)$, where the number of states of the interacting automata depends only on the choice of $\varepsilon>0$. The stated time bounds hold for \emph{any} semi-linear predicate computable in the population protocol framework. The key ingredient of our result is the decentralized design of a hierarchy of phase-clocks, which tick at different rates, with the rates of adjacent clocks separated by a factor of $\Theta(\log n)$. The construction of this clock hierarchy relies on a new protocol composition technique, combined with an adapted analysis of a self-organizing process of oscillatory dynamics. This clock hierarchy is used to provide nested synchronization primitives, which allow us to view the population in a global manner and design protocols using a high-level imperative programming language with a (limited) capacity for loops and branching instructions.
• ### A Framework for Searching in Graphs in the Presence of Errors(1804.02075)

April 5, 2018 cs.DS
We consider two types of searching models, where the goal is to design an adaptive algorithm that locates an unknown vertex in a graph by repeatedly performing queries. In the vertex-query model, each query points to a vertex $v$ and the response either admits that $v$ is the target or provides a neighbor of $v$ on a shortest path from $v$ to the target. This model has been introduced for trees by Onak and Parys [FOCS 2006] and by Emamjomeh-Zadeh et al. [STOC 2016] for arbitrary graphs. In the edge-query model, each query chooses an edge and the response reveals which endpoint of the edge is closer to the target, breaking ties arbitrarily. Our goal is to analyze solutions to these problems assuming that some responses may be erroneous. We develop a scheme for tackling such noisy models with the following line of arguments: For each of the two models, we analyze a generic strategy that assumes a fixed number of lies and give a precise bound for its length via an amortized analysis. From this, we derive bounds for both a linearly bounded error rate, where the number of errors in $T$ queries is bounded by $rT$ for some $r<1/2$, and a probabilistic model in which each response is incorrect with some probability $p<1/2$. The bounds for adversarial case turn out to be strong enough for non-adversarial scenarios as well. We obtain thus a much simpler strategy performing fewer vertex-queries than one by Emamjomeh-Zadeh et al. [STOC 2016]. For edge-queries, not studied before for general graphs, we obtain bounds that are tight up to $\log \Delta$ factors in all error models. Applying our graph-theoretic results to the setting of edge-queries for paths, we obtain a number of improvements over existing bounds for searching in a sorted array in the presence of errors, including an exponential improvement for the prefix-bounded model in unbounded domains.
• ### Almost logarithmic-time space optimal leader election in population protocols(1802.06867)

Feb. 19, 2018 cs.DC
The model of population protocols refers to a large collection of simple indistinguishable entities, frequently called agents. The agents communicate and perform computation through pairwise interactions. We study fast and space efficient leader election in population of cardinality $n$ governed by a random scheduler, where during each time step the scheduler uniformly at random selects for interaction exactly one pair of agents. We propose the first $o(\log^2 n)$-time leader election protocol. Our solution operates in expected parallel time $O(\log n\log\log n)$ which is equivalent to $O(n \log n\log\log n)$ pairwise interactions. This is the fastest currently known leader election algorithm in which each agent utilises asymptotically optimal number of $O(\log\log n)$ states. The new protocol incorporates and amalgamates successfully the power of assorted synthetic coins with variable rate phase clocks.
• ### Energy Constrained Depth First Search(1709.10146)

Feb. 17, 2018 cs.DM, cs.DC, cs.DS
Depth first search is a natural algorithmic technique for constructing a closed route that visits all vertices of a graph. The length of such route equals, in an edge-weighted tree, twice the total weight of all edges of the tree and this is asymptotically optimal over all exploration strategies. This paper considers a variant of such search strategies where the length of each route is bounded by a positive integer $B$ (e.g. due to limited energy resources of the searcher). The objective is to cover all the edges of a tree $T$ using the minimum number of routes, each starting and ending at the root and each being of length at most $B$. To this end, we analyze the following natural greedy tree traversal process that is based on decomposing a depth first search traversal into a sequence of limited length routes. Given any arbitrary depth first search traversal $R$ of the tree $T$, we cover $R$ with routes $R_1,\ldots,R_l$, each of length at most $B$ such that: $R_i$ starts at the root, reaches directly the farthest point of $R$ visited by $R_{i-1}$, then $R_i$ continues along the path $R$ as far as possible, and finally $R_i$ returns to the root. We call the above algorithm \emph{piecemeal-DFS} and we prove that it achieves the asymptotically minimal number of routes $l$, regardless of the choice of $R$. Our analysis also shows that the total length of the traversal (and thus the traversal time) of piecemeal-DFS is asymptotically minimum over all energy-constrained exploration strategies. The fact that $R$ can be chosen arbitrarily means that the exploration strategy can be constructed in an online fashion when the input tree $T$ is not known in advance. Surprisingly, our results show that depth first search is efficient for energy constrained exploration of trees, even though it is known that the same does not hold for energy constrained exploration of arbitrary graphs.
• ### Distributed Colour Reduction Revisited(1709.00901)

Sept. 4, 2017 cs.DC, cs.DS
We give a new, simple distributed algorithm for graph colouring in paths and cycles. Our algorithm is fast and self-contained, it does not need any globally consistent orientation, and it reduces the number of colours from $10^{100}$ to $3$ in three iterations.
• ### All-Pairs 2-Reachability in $\mathcal{O}(n^{\omega}\log n)$ Time(1612.08075)

July 26, 2017 cs.DS
In the $2$-reachability problem we are given a directed graph $G$ and we wish to determine if there are two (edge or vertex) disjoint paths from $u$ to $v$, for a given pair of vertices $u$ and $v$. In this paper, we present an algorithm that computes $2$-reachability information for all pairs of vertices in $\mathcal{O}(n^{\omega}\log n)$ time, where $n$ is the number of vertices and $\omega$ is the matrix multiplication exponent. Hence, we show that the running time of all-pairs $2$-reachability is only within a $\log$ factor of transitive closure. Moreover, our algorithm produces a witness (i.e., a separating edge or a separating vertex) for all pair of vertices where $2$-reachability does not hold. By processing these witnesses, we can compute all the edge- and vertex-dominator trees of $G$ in $\mathcal{O}(n^2)$ additional time, which in turn enables us to answer various connectivity queries in $\mathcal{O}(1)$ time. For instance, we can test in constant time if there is a path from $u$ to $v$ avoiding an edge $e$, for any pair of query vertices $u$ and $v$, and any query edge $e$, or if there is a path from $u$ to $v$ avoiding a vertex $w$, for any query vertices $u$, $v$, and $w$.
• ### Ergodic Effects in Token Circulation(1612.09145)

July 20, 2017 cs.DC, cs.DS
We consider a dynamical process in a network which distributes all tokens located at a node among its neighbors, in a round-robin manner. We show that in the recurrent state of this dynamics (i.e., disregarding a polynomially long initialization phase of the system), the number of particles located on a given edge, averaged over an interval of time, is tightly concentrated around the average particle density in the system. Formally, for a system of $k$ particles in a graph of $m$ edges, during any interval of length $T$, this time-averaged value is $k/m \pm \widetilde{O}(1/T)$, whenever $\gcd(m,k) = \widetilde{O}(1)$ (and so, e.g., whenever $m$ is a prime number). To achieve these bounds, we link the behavior of the studied dynamics to ergodic properties of traversals based on Eulerian circuits on a symmetric directed graph. These results are proved through sum set methods and are likely to be of independent interest. As a corollary, we also obtain bounds on the idleness of the studied dynamics, i.e., on the longest possible time between two consecutive appearances of a token on an edge, taken over all edges. Designing trajectories for $k$ tokens in a way which minimizes idleness is fundamental to the study of the patrolling problem in networks. Our results immediately imply a bound of $\widetilde{O}(m/k)$ on the idleness of the studied process, showing that it is a distributed $\widetilde{O}(1)$-competitive solution to the patrolling task, for all of the covered cases. Our work also provides some further insights may be interesting in load-balancing applications.
• ### Robust Detection in Leak-Prone Population Protocols(1706.09937)

Aug. 16, 2019 cs.DC, cs.DS
In contrast to electronic computation, chemical computation is noisy and susceptible to a variety of sources of error, which has prevented the construction of robust complex systems. To be effective, chemical algorithms must be designed with an appropriate error model in mind. Here we consider the model of chemical reaction networks that preserve molecular count (population protocols), and ask whether computation can be made robust to a natural model of unintended "leak" reactions. Our definition of leak is motivated by both the particular spurious behavior seen when implementing chemical reaction networks with DNA strand displacement cascades, as well as the unavoidable side reactions in any implementation due to the basic laws of chemistry. We develop a new "Robust Detection" algorithm for the problem of fast (logarithmic time) single molecule detection, and prove that it is robust to this general model of leaks. Besides potential applications in single molecule detection, the error-correction ideas developed here might enable a new class of robust-by-design chemical algorithms. Our analysis is based on a non-standard hybrid argument, combining ideas from discrete analysis of population protocols with classic Markov chain techniques.
• ### LCL problems on grids(1702.05456)

May 24, 2017 cs.DC, cs.DS, cs.CC
LCLs or locally checkable labelling problems (e.g. maximal independent set, maximal matching, and vertex colouring) in the LOCAL model of computation are very well-understood in cycles (toroidal 1-dimensional grids): every problem has a complexity of $O(1)$, $\Theta(\log^* n)$, or $\Theta(n)$, and the design of optimal algorithms can be fully automated. This work develops the complexity theory of LCL problems for toroidal 2-dimensional grids. The complexity classes are the same as in the 1-dimensional case: $O(1)$, $\Theta(\log^* n)$, and $\Theta(n)$. However, given an LCL problem it is undecidable whether its complexity is $\Theta(\log^* n)$ or $\Theta(n)$ in 2-dimensional grids. Nevertheless, if we correctly guess that the complexity of a problem is $\Theta(\log^* n)$, we can completely automate the design of optimal algorithms. For any problem we can find an algorithm that is of a normal form $A' \circ S_k$, where $A'$ is a finite function, $S_k$ is an algorithm for finding a maximal independent set in $k$th power of the grid, and $k$ is a constant. Finally, partially with the help of automated design tools, we classify the complexity of several concrete LCL problems related to colourings and orientations.
• ### Optimal trade-offs for pattern matching with $k$ mismatches(1704.01311)

April 5, 2017 cs.DS
Given a pattern of length $m$ and a text of length $n$, the goal in $k$-mismatch pattern matching is to compute, for every $m$-substring of the text, the exact Hamming distance to the pattern or report that it exceeds $k$. This can be solved in either $\widetilde{O}(n \sqrt{k})$ time as shown by Amir et al. [J. Algorithms 2004] or $\widetilde{O}((m + k^2) \cdot n/m)$ time due to a result of Clifford et al. [SODA 2016]. We provide a smooth time trade-off between these two bounds by designing an algorithm working in time $\widetilde{O}( (m + k \sqrt{m}) \cdot n/m)$. We complement this with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails.
• ### A note on distance labeling in planar graphs(1611.06529)

Nov. 20, 2016 cs.DS
A distance labeling scheme is an assignments of labels, that is binary strings, to all nodes of a graph, so that the distance between any two nodes can be computed from their labels and the labels are as short as possible. A major open problem is to determine the complexity of distance labeling in unweighted and undirected planar graphs. It is known that, in such a graph on $n$ nodes, some labels must consist of $\Omega(n^{1/3})$ bits, but the best known labeling scheme uses labels of length $O(\sqrt{n}\log n)$ [Gavoille, Peleg, P\'erennes, and Raz, J. Algorithms, 2004]. We show that, in fact, labels of length $O(\sqrt{n})$ are enough.
• ### Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams(1610.03125)

Oct. 10, 2016 cs.DS
We consider computing a longest palindrome in the streaming model, where the symbols arrive one-by-one and we do not have random access to the input. While computing the answer exactly using sublinear space is not possible in such a setting, one can still hope for a good approximation guarantee. Our contribution is twofold. First, we provide lower bounds on the space requirements for randomized approximation algorithms processing inputs of length $n$. We rule out Las Vegas algorithms, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we prove a lower bounds of $\Omega( M \log\min\{|\Sigma|,M\})$ bits of memory; here $M=n/E$ for approximating the answer with additive error $E$, and $M= \frac{\log n}{\log (1+\varepsilon)}$ for approximating the answer with multiplicative error $(1 + \varepsilon)$. Second, we design three real-time algorithms for this problem. Our Monte Carlo approximation algorithms for both additive and multiplicative versions of the problem use $O(M)$ words of memory. Thus the obtained lower bounds are asymptotically tight up to a logarithmic factor. The third algorithm is deterministic and finds a longest palindrome exactly if it is short. This algorithm can be run in parallel with a Monte Carlo algorithm to obtain better results in practice. Overall, both the time and space complexity of finding a longest palindrome in a stream are essentially settled.
• ### Sublinear-Space Distance Labeling using Hubs(1507.06240)

July 20, 2016 cs.DS
A distance labeling scheme is an assignment of bit-labels to the vertices of an undirected, unweighted graph such that the distance between any pair of vertices can be decoded solely from their labels. We propose a series of new labeling schemes within the framework of so-called hub labeling (HL, also known as landmark labeling or 2-hop-cover labeling), in which each node $u$ stores its distance to all nodes from an appropriately chosen set of hubs $S(u) \subseteq V$. For a queried pair of nodes $(u,v)$, the length of a shortest $u-v$-path passing through a hub node from $S(u)\cap S(v)$ is then used as an upper bound on the distance between $u$ and $v$. We present a hub labeling which allows us to decode exact distances in sparse graphs using labels of size sublinear in the number of nodes. For graphs with at most $n$ nodes and average degree $\Delta$, the tradeoff between label bit size $L$ and query decoding time $T$ for our approach is given by $L = O(n \log \log_\Delta T / \log_\Delta T)$, for any $T \leq n$. Our simple approach is thus the first sublinear-space distance labeling for sparse graphs that simultaneously admits small decoding time (for constant $\Delta$, we can achieve any $T=\omega(1)$ while maintaining $L=o(n)$), and it also provides an improvement in terms of label size with respect to previous slower approaches. By using similar techniques, we then present a $2$-additive labeling scheme for general graphs, i.e., one in which the decoder provides a 2-additive-approximation of the distance between any pair of nodes. We achieve almost the same label size-time tradeoff $L = O(n \log^2 \log T / \log T)$, for any $T \leq n$. To our knowledge, this is the first additive scheme with constant absolute error to use labels of sublinear size. The corresponding decoding time is then small (any $T=\omega(1)$ is sufficient).
• ### Randomized algorithms for finding a majority element(1603.01583)

April 28, 2016 cs.DS
Given $n$ colored balls, we want to detect if more than $\lfloor n/2\rfloor$ of them have the same color, and if so find one ball with such majority color. We are only allowed to choose two balls and compare their colors, and the goal is to minimize the total number of such operations. A well-known exercise is to show how to find such a ball with only $2n$ comparisons while using only a logarithmic number of bits for bookkeeping. The resulting algorithm is called the Boyer--Moore majority vote algorithm. It is known that any deterministic method needs $\lceil 3n/2\rceil-2$ comparisons in the worst case, and this is tight. However, it is not clear what is the required number of comparisons if we allow randomization. We construct a randomized algorithm which always correctly finds a ball of the majority color (or detects that there is none) using, with high probability, only $7n/6+o(n)$ comparisons. We also prove that the expected number of comparisons used by any such randomized method is at least $1.019n$.
• ### Tight tradeoffs for approximating palindromes in streams(1410.6433)

April 7, 2016 cs.DS
We consider computing the longest palindrome in a text of length $n$ in the streaming model, where the characters arrive one-by-one, and we do not have random access to the input. While computing the answer exactly using sublinear memory is not possible in such a setting, one can still hope for a good approximation guarantee. We focus on the two most natural variants, where we aim for either additive or multiplicative approximation of the length of the longest palindrome. We first show that there is no point in considering Las Vegas algorithms in such a setting, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we provide a lowerbound of $\Omega(\frac{n}{E})$ bits for approximating the answer with additive error $E$, and $\Omega(\frac{\log n}{\log(1+\varepsilon)})$ bits for approximating the answer with multiplicative error $(1+\varepsilon)$ for the binary alphabet. Then, we construct a generic Monte Carlo algorithm, which by choosing the parameters appropriately achieves space complexity matching up to a logarithmic factor for both variants. This substantially improves the previous results by Berenbrink et al. (STACS 2014) and essentially settles the space complexity.
• ### Time and space optimality of rotor-router graph exploration(1502.05545)

Nov. 29, 2015 cs.DS
We consider the problem of exploration of an anonymous, port-labeled, undirected graph with $n$ nodes and $m$ edges and diameter $D$, by a single mobile agent. Initially the agent does not know the graph topology nor any of the global parameters. Moreover, the agent does not know the incoming port when entering to a vertex. Each vertex is endowed with memory that can be read and modified by the agent upon its visit to that node. However the agent has no operational memory i.e., it cannot carry any state while traversing an edge. In such a model at least $\log_2 d$ bits are needed at each vertex of degree $d$ for the agent to be able to traverse each graph edge. This number of bits is always sufficient to explore any graph in time $O(mD)$ using algorithm Rotor-Router. We show that even if the available node memory is unlimited then time $\Omega(n^3)$ is sometimes required for any algorithm. This shows that Rotor-Router is asymptotically optimal in the worst-case graphs. Secondly we show that for the case of the path the Rotor-Router attains exactly optimal time.
• ### Prime Factorization of the Kirchhoff Polynomial: Compact Enumeration of Arborescences(1507.07727)

July 28, 2015 cs.DM, cs.DS
We study the problem of enumerating all rooted directed spanning trees (arborescences) of a directed graph (digraph) $G=(V,E)$ of $n$ vertices. An arborescence $A$ consisting of edges $e_1,\ldots,e_{n-1}$ can be represented as a monomial $e_1\cdot e_2 \cdots e_{n-1}$ in variables $e \in E$. All arborescences $\mathsf{arb}(G)$ of a digraph then define the Kirchhoff polynomial $\sum_{A \in \mathsf{arb}(G)} \prod_{e\in A} e$. We show how to compute a compact representation of the Kirchhoff polynomial -- its prime factorization, and how it relates to combinatorial properties of digraphs such as strong connectivity and vertex domination. In particular, we provide digraph decomposition rules that correspond to factorization steps of the polynomial, and also give necessary and sufficient primality conditions of the resulting factors expressed by connectivity properties of the corresponding decomposed components. Thereby, we obtain a linear time algorithm for decomposing a digraph into components corresponding to factors of the initial polynomial, and a guarantee that no finer factorization is possible. The decomposition serves as a starting point for a recursive deletion-contraction algorithm, and also as a preprocessing phase for iterative enumeration algorithms. Both approaches produce a compressed output and retain some structural properties in the resulting polynomial. This proves advantageous in practical applications such as calculating steady states on digraphs governed by Laplacian dynamics, or computing the greatest common divisor of Kirchhoff polynomials. Finally, we initiate the study of a class of digraphs which allow for a practical enumeration of arborescences. Using our decomposition rules we observe that various digraphs from real-world applications fall into this class or are structurally similar to it.
• ### All Permutations Supersequence is coNP-complete(1506.05079)

July 9, 2015 cs.CC
We prove that deciding whether a given input word contains as subsequence every possible permutation of integers $\{1,2,\ldots,n\}$ is coNP-complete. The coNP-completeness holds even when given the guarantee that the input word contains as subsequences all of length $n-1$ sequences over the same set of integers. We also show NP-completeness of a related problem of Partially Non-crossing Perfect Matching in Bipartite Graphs, i.e. to find a perfect matching in an ordered bipartite graph where edges of the matching incident to selected vertices (even only from one side) are non-crossing.