• ### A Complexity-Based Hierarchy for Multiprocessor Synchronization(1607.06139)

May 4, 2018 cs.DC
For many years, Herlihy's elegant computability based Consensus Hierarchy has been our best explanation of the relative power of various types of multiprocessor synchronization objects when used in deterministic algorithms. However, key to this hierarchy is treating synchronization instructions as distinct objects, an approach that is far from the real-world, where multiprocessor programs apply synchronization instructions to collections of arbitrary memory locations. We were surprised to realize that, when considering instructions applied to memory locations, the computability based hierarchy collapses. This leaves open the question of how to better capture the power of various synchronization instructions. In this paper, we provide an approach to answering this question. We present a hierarchy of synchronization instructions, classified by their space complexity in solving obstruction-free consensus. Our hierarchy provides a classification of combinations of known instructions that seems to fit with our intuition of how useful some are in practice, while questioning the effectiveness of others. We prove an essentially tight characterization of the power of buffered read and write instructions.Interestingly, we show a similar result for multi-location atomic assignments.
• ### Revisionist Simulations: A New Approach to Proving Space Lower Bounds(1711.02455)

Feb. 26, 2018 cs.DC, cs.DS, cs.CC
Determining the space complexity of $x$-obstruction-free $k$-set agreement for $x\leq k$ is an open problem. In $x$-obstruction-free protocols, processes are required to return in executions where at most $x$ processes take steps. The best known upper bound on the number of registers needed to solve this problem among $n>k$ processes is $n-k+x$ registers. No general lower bound better than $2$ was known. We prove that any $x$-obstruction-free protocol solving $k$-set agreement among $n>k$ processes uses at least $\lfloor(n-x)/(k+1-x)\rfloor+1$ registers. Our main tool is a simulation that serves as a reduction from the impossibility of deterministic wait-free $k$-set agreement: if a protocol uses fewer registers, then it is possible for $k+1$ processes to simulate the protocol and deterministically solve $k$-set agreement in a wait-free manner, which is impossible. A critical component of the simulation is the ability of simulating processes to revise the past of simulated processes. We introduce a new augmented snapshot object, which facilitates this. We also prove that any space lower bound on the number of registers used by obstruction-free protocols applies to protocols that satisfy nondeterministic solo termination. Hence, our lower bound of $\lfloor(n-1)/k\rfloor+1$ for the obstruction-free ($x=1$) case also holds for randomized wait-free free protocols. In particular, this gives a tight lower bound of exactly $n$ registers for solving obstruction-free and randomized wait-free consensus. Finally, our new techniques can be applied to get a space lower of $\lfloor n/2\rfloor+1$ for $\epsilon$-approximate agreement, for sufficiently small $\epsilon$. It requires participating processes to return values within $\epsilon$ of each other. The best known upper bounds are $\lceil\log(1/\epsilon)\rceil$ and $n$, while no general lower bounds were known.
• ### Space-Optimal Majority in Population Protocols(1704.04947)

July 13, 2017 cs.DC, cs.DS
Population protocols are a model of distributed computing, in which $n$ agents with limited local state interact randomly, and cooperate to collectively compute global predicates. An extensive series of papers, across different communities, has examined the computability and complexity characteristics of this model. Majority, or consensus, is a central task, in which agents need to collectively reach a decision as to which one of two states $A$ or $B$ had a higher initial count. Two complexity metrics are important: the time that a protocol requires to stabilize to an output decision, and the state space size that each agent requires. It is known that majority requires $\Omega(\log \log n)$ states per agent to allow for poly-logarithmic time stabilization, and that $O(\log^2 n)$ states are sufficient. Thus, there is an exponential gap between the upper and lower bounds. We address this question. We provide a new lower bound of $\Omega(\log n)$ states for any protocol which stabilizes in $O( n^{1-c} )$ time, for any $c > 0$ constant. This result is conditional on basic monotonicity and output assumptions, satisfied by all known protocols. Technically, it represents a significant departure from previous lower bounds. Instead of relying on dense configurations, we introduce a new surgery technique to construct executions which contradict the correctness of algorithms that stabilize too fast. Subsequently, our lower bound applies to general initial configurations. We give an algorithm for majority which uses $O(\log n)$ states, and stabilizes in $O(\log^2 n)$ time. Central to the algorithm is a new leaderless phase clock, which allows nodes to synchronize in phases of $\Theta(n \log{n})$ consecutive interactions using $O(\log n)$ states per node. We also employ our phase clock to build a leader election algorithm with $O(\log n )$ states, which stabilizes in $O(\log^2 n)$ time.
• ### Towards Reduced Instruction Sets for Synchronization(1705.02808)

May 8, 2017 cs.DC
Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and Zhu has shown that computability does not require multicore architectures to support "strong" synchronization instructions like compare-and-swap, as opposed to combinations of "weaker" instructions like decrement and multiply. However, this is the status quo, and in turn, most efficient concurrent data-structures heavily rely on compare-and-swap (e.g. for swinging pointers and in general, conflict resolution). We show that this need not be the case, by designing and implementing a concurrent linearizable Log data-structure (also known as a History object), supporting two operations: append(item), which appends the item to the log, and get-log(), which returns the appended items so far, in order. Readers are wait-free and writers are lock-free, and this data-structure can be used in a lock-free universal construction to implement any concurrent object with a given sequential specification. Our implementation uses atomic read, xor, decrement, and fetch-and-increment instructions supported on X86 architectures, and provides similar performance to a compare-and-swap-based solution on today's hardware. This raises a fundamental question about minimal set of synchronization instructions that the architectures have to support.
• ### Time-Space Trade-offs in Population Protocols(1602.08032)

April 17, 2017 cs.DC
Population protocols are a popular model of distributed computing, in which randomly-interacting agents with little computational power cooperate to jointly perform computational tasks. Inspired by developments in molecular computation, and in particular DNA computing, recent algorithmic work has focused on the complexity of solving simple yet fundamental tasks in the population model, such as leader election (which requires stabilization to a single agent in a special "leader" state), and majority (in which agents must stabilize to a decision as to which of two possible initial states had higher initial count). Known results point towards an inherent trade-off between the time complexity of such algorithms, and the space complexity, i.e. size of the memory available to each agent. In this paper, we explore this trade-off and provide new upper and lower bounds for majority and leader election. First, we prove a unified lower bound, which relates the space available per node with the time complexity achievable by a protocol: for instance, our result implies that any protocol solving either of these tasks for $n$ agents using $O( \log \log n )$ states must take $\Omega( n / \rm{polylog} n )$ expected time. This is the first result to characterize time complexity for protocols which employ super-constant number of states per node, and proves that fast, poly-logarithmic running times require protocols to have relatively large space costs. On the positive side, we give algorithms showing that fast, poly-logarithmic stabilization time can be achieved using $O( \log^2 n )$ space per node, in the case of both tasks. Overall, our results highlight a time complexity separation between $O(\log \log n)$ and $\Theta( \log^2 n )$ state space size for both majority and leader election in population protocols, and introduce new techniques, which should be applicable more broadly.
• ### Polylogarithmic-Time Leader Election in Population Protocols Using Polylogarithmic States(1502.05745)

April 16, 2017 cs.DC
• ### On the Optimal Space Complexity of Consensus for Anonymous Processes(1506.06817)

Aug. 17, 2015 cs.DC
The optimal space complexity of consensus in shared memory is a decades-old open problem. For a system of $n$ processes, no algorithm is known that uses a sublinear number of registers. However, the best known lower bound due to Fich, Herlihy, and Shavit requires $\Omega(\sqrt{n})$ registers. The special symmetric case of the problem where processes are anonymous (run the same algorithm) has also attracted attention. Even in this case, the best lower and upper bounds are still $\Omega(\sqrt{n})$ and $O(n)$. Moreover, Fich, Herlihy, and Shavit first proved their lower bound for anonymous processes, and then extended it to the general case. As such, resolving the anonymous case might be a significant step towards understanding and solving the general problem. In this work, we show that in a system of anonymous processes, any consensus algorithm satisfying nondeterministic solo termination has to use $\Omega(n)$ read-write registers in some execution. This implies an $\Omega(n)$ lower bound on the space complexity of deterministic obstruction-free and randomized wait-free consensus, matching the upper bound and closing the symmetric case of the open problem.
• ### Restricted Isometry Property for General p-Norms(1407.2178)

Feb. 22, 2015 math.NA, cs.IT, math.IT, math.PR, cs.DM, cs.DS
The Restricted Isometry Property (RIP) is a fundamental property of a matrix which enables sparse recovery. Informally, an $m \times n$ matrix satisfies RIP of order $k$ for the $\ell_p$ norm, if $\|Ax\|_p \approx \|x\|_p$ for every vector $x$ with at most $k$ non-zero coordinates. For every $1 \leq p < \infty$ we obtain almost tight bounds on the minimum number of rows $m$ necessary for the RIP property to hold. Prior to this work, only the cases $p = 1$, $1 + 1 / \log k$, and $2$ were studied. Interestingly, our results show that the case $p = 2$ is a "singularity" point: the optimal number of rows $m$ is $\widetilde{\Theta}(k^{p})$ for all $p\in [1,\infty)\setminus \{2\}$, as opposed to $\widetilde{\Theta}(k)$ for $k=2$. We also obtain almost tight bounds for the column sparsity of RIP matrices and discuss implications of our results for the Stable Sparse Recovery problem.
• ### How to Elect a Leader Faster than a Tournament(1411.1001)

Feb. 15, 2015 cs.DC
The problem of electing a leader from among $n$ contenders is one of the fundamental questions in distributed computing. In its simplest formulation, the task is as follows: given $n$ processors, all participants must eventually return a win or lose indication, such that a single contender may win. Despite a considerable amount of work on leader election, the following question is still open: can we elect a leader in an asynchronous fault-prone system faster than just running a $\Theta(\log n)$-time tournament, against a strong adaptive adversary? In this paper, we answer this question in the affirmative, improving on a decades-old upper bound. We introduce two new algorithmic ideas to reduce the time complexity of electing a leader to $O(\log^* n)$, using $O(n^2)$ point-to-point messages. A non-trivial application of our algorithm is a new upper bound for the tight renaming problem, assigning $n$ items to the $n$ participants in expected $O(\log^2 n)$ time and $O(n^2)$ messages. We complement our results with lower bound of $\Omega(n^2)$ messages for solving these two problems, closing the question of their message complexity.
• ### Johnson-Lindenstrauss Compression with Neuroscience-Based Constraints(1411.5383)

Nov. 19, 2014 math.PR, q-bio.NC, math.ST, stat.TH, cs.DS
Johnson-Lindenstrauss (JL) matrices implemented by sparse random synaptic connections are thought to be a prime candidate for how convergent pathways in the brain compress information. However, to date, there is no complete mathematical support for such implementations given the constraints of real neural tissue. The fact that neurons are either excitatory or inhibitory implies that every so implementable JL matrix must be sign-consistent (i.e., all entries in a single column must be either all non-negative or all non-positive), and the fact that any given neuron connects to a relatively small subset of other neurons implies that the JL matrix had better be sparse. We construct sparse JL matrices that are sign-consistent, and prove that our construction is essentially optimal. Our work answers a mathematical question that was triggered by earlier work and is necessary to justify the existence of JL compression in the brain, and emphasizes that inhibition is crucial if neurons are to perform efficient, correlation-preserving compression.
• ### On the Importance of Registers for Computability(1411.0168)

Nov. 1, 2014 cs.DC, cs.DS
All consensus hierarchies in the literature assume that we have, in addition to copies of a given object, an unbounded number of registers. But why do we really need these registers? This paper considers what would happen if one attempts to solve consensus using various objects but without any registers. We show that under a reasonable assumption, objects like queues and stacks cannot emulate the missing registers. We also show that, perhaps surprisingly, initialization, shown to have no computational consequences when registers are readily available, is crucial in determining the synchronization power of objects when no registers are allowed. Finally, we show that without registers, the number of available objects affects the level of consensus that can be solved. Our work thus raises the question of whether consensus hierarchies which assume an unbounded number of registers truly capture synchronization power, and begins a line of research aimed at better understanding the interaction between read-write memory and the powerful synchronization operations available on modern architectures.