
We study delay of jobs that consist of multiple parallel tasks, which is a
critical performance metric in a wide range of applications such as data file
retrieval in coded storage systems and parallel computing. In this problem,
each job is completed only when all of its tasks are completed, so the delay of
a job is the maximum of the delays of its tasks. Despite the wide attention
this problem has received, tight analysis is still largely unknown since
analyzing job delay requires characterizing the complicated correlation among
task delays, which is hard to do.
We first consider an asymptotic regime where the number of servers, $n$, goes
to infinity, and the number of tasks in a job, $k^{(n)}$, is allowed to
increase with $n$. We establish the asymptotic independence of any $k^{(n)}$
queues under the condition $k^{(n)} = o(n^{1/4})$. This greatly generalizes the
asymptoticindependence type of results in the literature where asymptotic
independence is shown only for a fixed constant number of queues. As a
consequence of our independence result, the job delay converges to the maximum
of independent task delays.
We next consider the nonasymptotic regime. Here we prove that independence
yields a stochastic upper bound on job delay for any $n$ and any $k^{(n)}$ with
$k^{(n)}\le n$. The key component of our proof is a new technique we develop,
called "Poisson oversampling". Our approach converts the job delay problem into
a corresponding ballsandbins problem. However, in contrast with typical
ballsandbins problems where there is a negative correlation among bins, we
prove that our variant exhibits positive correlation.

Many recent caching systems aim to improve miss ratios, but there is no good
sense among practitioners of how much further miss ratios can be improved. In
other words, should the systems community continue working on this problem?
Currently, there is no principled answer to this question. In practice, object
sizes often vary by several orders of magnitude, where computing the optimal
miss ratio (OPT) is known to be NPhard. The few known results on caching with
variable object sizes provide very weak bounds and are impractical to compute
on traces of realistic length. We propose a new method to compute upper and
lower bounds on OPT. Our key insight is to represent caching as a mincost flow
problem, hence we call our method the flowbased offline optimal (FOO). We
prove that, under simple independence assumptions, FOO's bounds become tight as
the number of objects goes to infinity. Indeed, FOO's error over 10M requests
of production CDN and storage traces is negligible: at most 0.3%. FOO thus
reveals, for the first time, the limits of caching with variable object sizes.
While FOO is very accurate, it is computationally impractical on traces with
hundreds of millions of requests. We therefore extend FOO to obtain more
efficient bounds on OPT, which we call practical flowbased offline optimal
(PFOO). We evaluate PFOO on several full production traces and use it to
compare OPT to prior online policies. This analysis shows that current caching
systems are in fact still far from optimal, suffering 1143% more cache misses
than OPT, whereas the best prior offline bounds suggest that there is
essentially no room for improvement.

We consider an extremely broad class of M/G/1 scheduling policies called
SOAP: Schedule Ordered by Agebased Priority. The SOAP policies include almost
all scheduling policies in the literature as well as an infinite number of
variants which have never been analyzed, or maybe not even conceived. SOAP
policies range from classic policies, like firstcome, firstserve (FCFS),
foregroundbackground (FB), classbased priority, and shortest remaining
processing time (SRPT); to much more complicated scheduling rules, such as the
famously complex Gittins index policy and other policies in which a job's
priority changes arbitrarily with its age. While the response time of policies
in the former category is well understood, policies in the latter category have
resisted response time analysis. We present a universal analysis of all SOAP
policies, deriving the mean and LaplaceStieltjes transform of response time.

To keep pace with Moore's law, chip designers have focused on increasing the
number of cores per chip rather than single core performance. In turn, modern
jobs are often designed to run on any number of cores. However, to effectively
leverage these multicore chips, one must address the question of how many
cores to assign to each job. Given that jobs receive sublinear speedups from
additional cores, there is an obvious tradeoff: allocating more cores to an
individual job reduces the job's runtime, but in turn decreases the efficiency
of the overall system. We ask how the system should schedule jobs across cores
so as to minimize the mean response time over a stream of incoming jobs.
To answer this question, we develop an analytical model of jobs running on a
multicore machine. We prove that EQUI, a policy which continuously divides
cores evenly across jobs, is optimal when all jobs follow a single speedup
curve and have exponentially distributed sizes. EQUI requires jobs to change
their level of parallelization while they run. Since this is not possible for
all workloads, we consider a class of "fixedwidth" policies, which choose a
single level of parallelization, k, to use for all jobs. We prove that,
surprisingly, it is possible to achieve EQUI's performance without requiring
jobs to change their levels of parallelization by using the optimal fixed level
of parallelization, k*. We also show how to analytically derive the optimal k*
as a function of the system load, the speedup curve, and the job size
distribution.
In the case where jobs may follow different speedup curves, finding a good
scheduling policy is even more challenging. We find that policies like EQUI
which performed well in the case of a single speedup function now perform
poorly. We propose a very simple policy, GREEDY*, which performs nearoptimally
when compared to the numericallyderived optimal policy.

Many problems in computing, service, and manufacturing systems can be modeled
via infinite repeating Markov chains with an infinite number of levels and a
finite number of phases. Many such chains are quasibirthdeath processes
(QBDs) with transitions that are skipfree in level, in that one can only
transition between consecutive levels, and unidirectional in phase, in that one
can only transition from lowernumbered phases to highernumbered phases. We
present a procedure, which we call Clearing Analysis on Phases (CAP), for
determining the limiting probabilities of such Markov chains exactly. The CAP
method yields the limiting probability of each state in the repeating portion
of the chain as a linear combination of scalar bases raised to a power
corresponding to the level of the state. The weights in these linear
combinations can be determined by solving a finite system of linear equations.