• ### Communication-free Massively Distributed Graph Generation(1710.07565)

March 18, 2019 cs.DC, cs.SI, cs.DS
Analyzing massive complex networks yields promising insights about our everyday lives. Building scalable algorithms to do so is a challenging task that requires a careful analysis and an extensive evaluation. However, engineering such algorithms is often hindered by the scarcity of publicly~available~datasets. Network generators serve as a tool to alleviate this problem by providing synthetic instances with controllable parameters. However, many network generators fail to provide instances on a massive scale due to their sequential nature or resource constraints. Additionally, truly scalable network generators are few and often limited in their realism. In this work, we present novel generators for a variety of network models that are frequently used as benchmarks. By making use of pseudorandomization and divide-and-conquer schemes, our generators follow a communication-free paradigm. The resulting generators are thus embarrassingly parallel and have a near optimal scaling behavior. This allows us to generate instances of up to $2^{43}$ vertices and $2^{47}$ edges in less than 22 minutes on 32768 cores. Therefore, our generators allow new graph families to be used on an unprecedented scale.
• ### Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online(1610.05141)

Oct. 17, 2016 cs.DC, cs.MS, cs.DS
We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time $\mathcal{O}\left(n/p+\log p\right)$ on $p$ processors. The amount of communication between the processors is very small and independent of the sample size. We also discuss modifications needed for load balancing, reservoir sampling, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs.
• ### Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++(1608.05634)

Aug. 19, 2016 cs.DC, cs.DS
We present the design and a first performance evaluation of Thrill -- a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction
• ### Accelerating Local Search for the Maximum Independent Set Problem(1602.01659)

Feb. 4, 2016 cs.DS
Computing high-quality independent sets quickly is an important problem in combinatorial optimization. Several recent algorithms have shown that kernelization techniques can be used to find exact maximum independent sets in medium-sized sparse graphs, as well as high-quality independent sets in huge sparse graphs that are intractable for exact (exponential-time) algorithms. However, a major drawback of these algorithms is that they require significant preprocessing overhead, and therefore cannot be used to find a high-quality independent set quickly. In this paper, we show that performing simple kernelization techniques in an online fashion significantly boosts the performance of local search, and is much faster than pre-computing a kernel using advanced techniques. In addition, we show that cutting high-degree vertices can boost local search performance even further, especially on huge (sparse) complex networks. Our experiments show that we can drastically speed up the computation of large independent sets compared to other state-of-the-art algorithms, while also producing results that are very close to the best known solutions.
• ### Finding Near-Optimal Independent Sets at Scale(1509.00764)

Sept. 2, 2015 cs.NE, cs.SI, cs.DS
The independent set problem is NP-hard and particularly difficult to solve in large sparse graphs. In this work, we develop an advanced evolutionary algorithm, which incorporates kernelization techniques to compute large independent sets in huge sparse networks. A recent exact algorithm has shown that large networks can be solved exactly by employing a branch-and-reduce technique that recursively kernelizes the graph and performs branching. However, one major drawback of their algorithm is that, for huge graphs, branching still can take exponential time. To avoid this problem, we recursively choose vertices that are likely to be in a large independent set (using an evolutionary approach), then further kernelize the graph. We show that identifying and removing vertices likely to be in large independent sets opens up the reduction space---which not only speeds up the computation of large independent sets drastically, but also enables us to compute high-quality independent sets on much larger instances than previously reported in the literature.
• ### Graph Partitioning for Independent Sets(1502.01687)

Feb. 5, 2015 cs.NE, cs.DS
Computing maximum independent sets in graphs is an important problem in computer science. In this paper, we develop an evolutionary algorithm to tackle the problem. The core innovations of the algorithm are very natural combine operations based on graph partitioning and local search algorithms. More precisely, we employ a state-of-the-art graph partitioner to derive operations that enable us to quickly exchange whole blocks of given independent sets. To enhance newly computed offsprings we combine our operators with a local search algorithm. Our experimental evaluation indicates that we are able to outperform state-of-the-art algorithms on a variety of instances.