
Analyzing massive complex networks yields promising insights about our
everyday lives. Building scalable algorithms to do so is a challenging task
that requires a careful analysis and an extensive evaluation. However,
engineering such algorithms is often hindered by the scarcity of
publicly~available~datasets.
Network generators serve as a tool to alleviate this problem by providing
synthetic instances with controllable parameters. However, many network
generators fail to provide instances on a massive scale due to their sequential
nature or resource constraints. Additionally, truly scalable network generators
are few and often limited in their realism.
In this work, we present novel generators for a variety of network models
that are frequently used as benchmarks. By making use of pseudorandomization
and divideandconquer schemes, our generators follow a communicationfree
paradigm. The resulting generators are thus embarrassingly parallel and have a
near optimal scaling behavior. This allows us to generate instances of up to
$2^{43}$ vertices and $2^{47}$ edges in less than 22 minutes on 32768 cores.
Therefore, our generators allow new graph families to be used on an
unprecedented scale.

We consider the problem of sampling $n$ numbers from the range
$\{1,\ldots,N\}$ without replacement on modern architectures. The main result
is a simple divideandconquer scheme that makes sequential algorithms more
cache efficient and leads to a parallel algorithm running in expected time
$\mathcal{O}\left(n/p+\log p\right)$ on $p$ processors. The amount of
communication between the processors is very small and independent of the
sample size. We also discuss modifications needed for load balancing, reservoir
sampling, online sampling, sampling with replacement, Bernoulli sampling, and
vectorization on SIMD units or GPUs.

We present the design and a first performance evaluation of Thrill  a
prototype of a general purpose big data processing framework with a convenient
dataflow style programming interface. Thrill is somewhat similar to Apache
Spark and Apache Flink with at least two main differences. First, Thrill is
based on C++ which enables performance advantages due to direct native code
compilation, a more cachefriendly memory layout, and explicit memory
management. In particular, Thrill uses template metaprogramming to compile
chains of subsequent local operations into a single binary routine without
intermediate buffering and with minimal indirections. Second, Thrill uses
arrays rather than multisets as its primary data structure which enables
additional operations like sorting, prefix sums, window scans, or combining
corresponding fields of several arrays (zipping). We compare Thrill with Apache
Spark and Apache Flink using five kernels from the HiBench suite. Thrill is
consistently faster and often several times faster than the other frameworks.
At the same time, the source codes have a similar level of simplicity and
abstraction

Computing highquality independent sets quickly is an important problem in
combinatorial optimization. Several recent algorithms have shown that
kernelization techniques can be used to find exact maximum independent sets in
mediumsized sparse graphs, as well as highquality independent sets in huge
sparse graphs that are intractable for exact (exponentialtime) algorithms.
However, a major drawback of these algorithms is that they require significant
preprocessing overhead, and therefore cannot be used to find a highquality
independent set quickly.
In this paper, we show that performing simple kernelization techniques in an
online fashion significantly boosts the performance of local search, and is
much faster than precomputing a kernel using advanced techniques. In addition,
we show that cutting highdegree vertices can boost local search performance
even further, especially on huge (sparse) complex networks. Our experiments
show that we can drastically speed up the computation of large independent sets
compared to other stateoftheart algorithms, while also producing results
that are very close to the best known solutions.

The independent set problem is NPhard and particularly difficult to solve in
large sparse graphs. In this work, we develop an advanced evolutionary
algorithm, which incorporates kernelization techniques to compute large
independent sets in huge sparse networks. A recent exact algorithm has shown
that large networks can be solved exactly by employing a branchandreduce
technique that recursively kernelizes the graph and performs branching.
However, one major drawback of their algorithm is that, for huge graphs,
branching still can take exponential time. To avoid this problem, we
recursively choose vertices that are likely to be in a large independent set
(using an evolutionary approach), then further kernelize the graph. We show
that identifying and removing vertices likely to be in large independent sets
opens up the reduction spacewhich not only speeds up the computation of
large independent sets drastically, but also enables us to compute highquality
independent sets on much larger instances than previously reported in the
literature.

Computing maximum independent sets in graphs is an important problem in
computer science. In this paper, we develop an evolutionary algorithm to tackle
the problem. The core innovations of the algorithm are very natural combine
operations based on graph partitioning and local search algorithms. More
precisely, we employ a stateoftheart graph partitioner to derive operations
that enable us to quickly exchange whole blocks of given independent sets. To
enhance newly computed offsprings we combine our operators with a local search
algorithm. Our experimental evaluation indicates that we are able to outperform
stateoftheart algorithms on a variety of instances.