
We prove a general normal approximation theorem for local graph statistics in
the configuration model, together with an explicit bound on the error in the
approximation with respect to the Wasserstein metric. Such statistics take the
form $T := \sum_{v \in V} H_v$, where $V$ is the vertex set, and $H_v$ depends
on a neighbourhood in the graph around $v$ of size at most $\ell$. The error
bound is expressed in terms of $\ell$, $V$, an almost sure bound on $H_v$,
the maximum vertex degree $d_{\max}$ and the variance of $T$. Under suitable
assumptions on the convergence of the empirical degree distributions to a
limiting distribution, we deduce that the size of the giant component in the
configuration model has asymptotically Gaussian fluctuations.

The Aldous gossip process represents the dissemination of information in
geographical space as a process of locally deterministic spread, augmented by
random long range transmissions. Starting from a single initially informed
individual, the proportion of individuals informed follows an almost
deterministic path, but for a random time shift, caused by the stochastic
behaviour in the very early stages of development. In this paper, it is shown
that, even with the extra information available after a substantial development
time, this broad description remains accurate to first order. However, the
precision of the prediction is now much greater, and the random time shift is
shown to have an approximately normal distribution, with mean and variance that
can be computed from the current state of the process.

We derive explicit central moment inequalities for random variables that
admit a Stein coupling, such as exchangeable pairs, sizebias couplings or
local dependence, among others. The bounds are in terms of moments (not
necessarily central) of variables in the Stein coupling, which are typically
positive or local in some sense, and therefore easier to bound. In cases where
the Stein couplings have the kind of behaviour leading to good normal
approximation, the central moments are closely bounded by those of a normal. We
show how the bounds can be used to produce concentration inequalities, and
compare to those existing in related settings. Finally, we illustrate the power
of the theory by bounding the central moments of sums of neighbourhood
statistics in sparse Erd\H{o}sR\'enyi random graphs.

We provide a general result for bounding the difference between point
probabilities of integer supported distributions and the translated Poisson
distribution, a convenient alternative to the discretized normal. We illustrate
our theorem in the context of the Hoeffding combinatorial central limit theorem
with integer valued summands, of the number of isolated vertices in an
Erd\H{o}sR\'enyi random graph, and of the CurieWeiss model of magnetism,
where we provide optimal or near optimal rates of convergence in the local
limit metric. In the Hoeffding example, even the discrete normal approximation
bounds seem to be new. The general result follows from Stein's method, and
requires a new bound on the Stein solution for the Poisson distribution, which
is of general interest.

For integer valued random variables, the translated Poisson distributions
form a flexible family for approximation in total variation, in much the same
way that the normal family is used for approximation in Kolmogorov distance.
Using the SteinChen method, approximation can often be achieved with error
bounds of the same order as those for the CLT. In this paper, an analogous
theory, again based on Stein's method, is developed in the multivariate
context. The approximating family consists of the equilibrium distributions of
a collection of Markov jump processes, whose analogues in one dimension are the
immigrationdeath processes with Poisson distributions as equilibria. The
method is illustrated by providing total variation error bounds for the
approximation of the equilibrium distribution of one Markov jump process by
that of another. In a companion paper, it is shown how to use the method for
discrete normal approximation in ${\mathbb Z}^d$.

The paper applies the theory developed in Part I to the discrete normal
approximation in total variation of random vectors in ${\mathbb Z}^d$. We
illustrate the use of the method for sums of independent integer valued random
vectors, and for random vectors exhibiting an exchangeable pair. We conclude
with an application to random colourings of regular graphs.

We consider the motion of a particle on a Galton Watson tree, when the
probabilities of jumping from a vertex to any one of its neighbours is
determined by a random process. Given the tree, positive weights are assigned
to the edges in such a way that, viewed along any line of descent, they evolve
as a random process. In order to introduce our method for proving transience or
recurrence, we first suppose that the weights are i.i.d., reproving a result of
Lyons and Pemantle. We then extend the argument to allow a Markovian
environment, and finally to a random walk on a Markovian environment that
changes the environment. Our approach involves studying the typical behaviour
of processes on fixed lines of descent, which we then show determines the
behaviour of the process on the whole tree.

The paper presents a phenomenon occurring in population processes that start
near zero and have large carrying capacity. By the classical result of
Kurtz~(1970), such processes, normalized by the carrying capacity, converge on
finite intervals to the solutions of ordinary differential equations, also
known as the fluid limit. When the initial population is small relative to
carrying capacity, this limit is trivial. Here we show that, viewed at suitably
chosen times increasing to infinity, the process converges to the fluid limit,
governed by the same dynamics, but with a random initial condition. This random
initial condition is related to the martingale limit of an associated linear
birth and death process.

Density dependent Markov population processes in large populations of size
$N$ were shown by Kurtz (1970, 1971) to be well approximated over finite time
intervals by the solution of the differential equations that describe their
average drift, and to exhibit stochastic fluctuations about this deterministic
solution on the scale $\sqrt N$ that can be approximated by a diffusion
process. Here, motivated by an example from evolutionary biology, we are
concerned with describing how such a process leaves an absorbing boundary.
Initially, one or more of the populations is of size much smaller than $N$, and
the length of time taken until all populations have sizes comparable to $N$
then becomes infinite as $N \to \infty$. Under suitable assumptions, we show
that in the early stages of development, up to the time when all populations
have sizes at least $N^{1\alpha}$, for $1/3 < \alpha < 1$, the process can be
accurately approximated in total variation by a Markov branching process.
Thereafter, the process is well approximated by the deterministic solution
starting from the original initial point, but with a random time delay.
Analogous behaviour is also established for a Markov process approaching an
equilibrium on a boundary, where one or more of the populations become extinct.

Density dependent Markov population processes with countably many types can
often be well approximated over finite time intervals by the solution of the
differential equations that describe their average drift, provided that the
total population size is large. They also exhibit diffusive stochastic
fluctuations on a smaller scale about this deterministic path. Here, it is
shown that the individuals in such processes experience an almost deterministic
environment. Small groups of individuals behave almost independently of one
another, evolving as Markov jump processes, whose transition rates are
prescribed functions of time. In the context of metapopulation models, we show
that `individuals' can represent either patches or the individuals that migrate
among the patches; in hostparasite systems, they can represent both hosts and
parasites.

Both small world models of random networks with occasional long range
connections and gossip processes with occasional long range transmission of
information have similar characteristic behaviour. The long range elements
appreciably reduce the effective distances, measured in space or in time,
between pairs of typical points. In this paper, we show that their common
behaviour can be interpreted as a product of the locally branching nature of
the models. In particular, it is shown that both typical distances between
points and the proportion of space that can be reached within a given distance
or time can be approximated by formulae involving the limit random variable of
the branching process.

Many models of epidemic spread have a common qualitative structure. The
numbers of infected individuals during the initial stages of an epidemic can be
well approximated by a branching process, after which the proportion of
individuals that are susceptible follows a more or less deterministic course.
In this paper, we show that both of these features are consequences of assuming
a locally branching structure in the models, and that the deterministic course
can itself be determined from the distribution of the limiting random variable
associated with the backward, susceptibility branching process. Examples
considered include a stochastic version of the Kermack & McKendrick model, the
ReedFrost model, and the Volz configuration model.

Quasistationary distributions, as discussed by Darroch & Seneta (1965), have
been used in biology to describe the steady state behaviour of population
models which, while eventually certain to become extinct, nevertheless maintain
an apparent stochastic equilibrium for long periods. These distributions have
some drawbacks: they need not exist, nor be unique, and their calculation can
present problems. In an earlier paper, we gave biologically plausible
conditions under which the quasistationary distribution is unique, and can be
closely approximated by distributions that are simple to compute. In this
paper, we consider conditions under which the quasistationary distribution, if
it exists, need not be unique, but an apparent stochastic equilibrium can
nonetheless be identified and computed; we call such a distribution a
quasiequilibrium distribution.

When approximating the joint distribution of the component counts of a
decomposable combinatorial structure that is `almost' in the logarithmic class,
but nonetheless has irregular structure, it is useful to be able first to
establish that the distribution of a certain sum of nonnegative integer valued
random variables is smooth. This distribution is not like the normal, and
individual summands can contribute a nontrivial amount to the whole, so its
smoothness is somewhat surprising. In this paper, we consider two coupling
approaches to establishing the smoothness, and contrast the results that are
obtained.

Quasilogarithmic combinatorial structures are a class of decomposable
combinatorial structures which extend the logarithmic class considered by
Arratia, Barbour and Tavar\'{e} (2003). In order to obtain asymptotic
approximations to their component spectrum, it is necessary first to establish
an approximation to the sum of an associated sequence of independent random
variables in terms of the Dickman distribution. This in turn requires an
argument that refines the Mineka coupling by incorporating a blocking
construction, leading to exponentially sharper coupling rates for the sums in
question. Applications include distributional limit theorems for the size of
the largest component and for the vector of counts of the small components in a
quasilogarithmic combinatorial structure.

The dynamics of tumour evolution are not well understood. In this paper we
provide a statistical framework for evaluating the molecular variation observed
in different parts of a colorectal tumour. A multisample version of the Ewens
Sampling Formula forms the basis for our modelling of the data, and we provide
a simulation procedure for use in obtaining reference distributions for the
statistics of interest. We also describe the largesample asymptotics of the
joint distributions of the variation observed in different parts of the tumour.
While actual data should be evaluated with reference to the simulation
procedure, the asymptotics serve to provide theoretical guidelines, for
instance with reference to the choice of possible statistics.

Quasistationary distributions, as discussed by Darroch & Seneta (1965), have
been used in biology to describe the steady state behaviour of population
models which, while eventually certain to become extinct, nevertheless maintain
an apparent stochastic equilibrium for long periods. These distributions have
some drawbacks: they need not exist, nor be unique, and their calculation can
present problems. In this paper, we give biologically plausible conditions
under which the quasistationary distribution is unique, and can be closely
approximated by distributions that are simple to compute.

Using an associated branching process as the basis of our approximation, we
show that typical interpoint distances in a multitype random intersection
graph have a defective distribution, which is well described by a mixture of
translated and scaled Gumbel distributions, the missing mass corresponding to
the event that the vertices are not in the same component of the graph.

The paper establishes a functional version of the Hoeffding combinatorial
central limit theorem. First, a prelimiting Gaussian process approximation is
defined, and is shown to be at a distance of the order of the Lyapounov ratio
from the original random process. Distance is measured by comparison of
expectations of smooth functionals of the processes, and the argument is by way
of Stein's method. The prelimiting process is then shown, under weak
conditions, to converge to a Gaussian limit process. The theorem is used to
describe the shape of random permutation tableaux.

The paper concerns the classical occupancy scheme with infinitely many boxes.
We establish approximations to the distributions of the number of occupied
boxes, and of the number of boxes containing exactly r balls, within the family
of translated Poisson distributions. These are shown to be of ideal asymptotic
order, with respect both to total variation distance and to the approximation
of point probabilities. The proof is probabilistic, making use of a translated
Poisson approximation theorem of R\"ollin (2005).

The paper is concerned with the equilibrium distributions of continuoustime
density dependent Markov processes on the integers. These distributions are
known typically to be approximately normal, and the approximation error, as
measured in Kolmogorov distance, is of the smallest order that is compatible
with their having integer support. Here, an approximation in the much stronger
total variation norm is established, without any loss in the asymptotic order
of accuracy; the approximating distribution is a translated Poisson
distribution having the same variance and (almost) the same mean. Our arguments
are based on the SteinChen method and Dynkin's formula.

The paper is concerned with the equilibrium distribution $\Pi_n$ of the
$n$th element in a sequence of continuoustime density dependent Markov
processes on the integers. Under a $(2+\a)$th moment condition on the jump
distributions, we establish a bound of order $O(n^{(\a+1)/2}\sqrt{\log n})$ on
the difference between the point probabilities of $\Pi_n$ and those of a
translated Poisson distribution with the same variance. Except for the factor
$\sqrt{\log n}$, the result is as good as could be obtained in the simpler
setting of sums of independent integervalued random variables. Our arguments
are based on the SteinChen method and coupling.

In modeling parasitic diseases, it is natural to distinguish hosts according
to the number of parasites that they carry, leading to a countably infinite
type space. Proving the analogue of the deterministic equations, used in models
with finitely many types as a "law of large numbers" approximation to the
underlying stochastic model, has previously either been done case by case,
using some special structure, or else not attempted. In this paper we prove a
general theorem of this sort, and complement it with a rate of convergence in
the $\ell_1$norm.

The paper is concerned with approximating the distribution of a sum W of n
integer valued random variables Y_i, whose distributions depend on the state of
an underlying Markov chain X. The approximation is in terms of a translated
Poisson distribution, with mean and variance chosen to be close to those of W,
and the error is measured with respect to the total variation norm. Error
bounds comparable to those found for normal approximation with respect to the
weaker Kolmogorov distance are established, provided that the distribution of
the sum of the Y_i's between the successive visits of X to a reference state is
aperiodic. Without this assumption, approximation in total variation cannot be
expected to be good.

Branching process approximation to the initial stages of an epidemic process
has been used since the 1950's as a technique for providing stochastic
counterparts to deterministic epidemic threshold theorems. One way of
describing the approximation is to construct both branching and epidemic
processes on the same probability space, in such a way that their paths
coincide for as long as possible. In this paper, it is shown, in the context of
a Markovian model of parasitic infection, that coincidence can be achieved with
asymptotically high probability until o(N^{2/3}) infections have occurred,
where N denotes the total number of hosts.