
Due to its accuracy and generality, Monte Carlo radiative transfer (MCRT) has
emerged as the prevalent method for Ly$\alpha$ radiative transfer in arbitrary
geometries. The standard MCRT encounters a significant efficiency barrier in
the high optical depth, diffusion regime. Multiple acceleration schemes have
been developed to improve the efficiency of MCRT but the noise from photon
packet discretization remains a challenge. The discrete diffusion Monte Carlo
(DDMC) scheme has been successfully applied in stateoftheart radiation
hydrodynamics (RHD) simulations. Still, the established framework is not
optimal for resonant line transfer. Inspired by the DDMC paradigm, we present a
novel extension to resonant DDMC (rDDMC) in which diffusion in space and
frequency are treated on equal footing. We explore the robustness of our new
method and demonstrate a level of performance that justifies incorporating the
method into existing Ly$\alpha$ codes. We present computational speedups of
$\sim 10^2$$10^6$ relative to contemporary MCRT implementations with schemes
that skip scattering in the core of the line profile. This is because the rDDMC
runtime scales with the spatial and frequency resolution rather than the number
of scatterings  the latter is typically $\propto \tau_0$ for static media, or
$\propto (a \tau_0)^{2/3}$ with coreskipping. We anticipate new frontiers in
which onthefly Ly$\alpha$ radiative transfer calculations are feasible in 3D
RHD. More generally, rDDMC is transferable to any computationally demanding
problem amenable to a FokkerPlanck approximation of frequency redistribution.

Inorder scalar RISC architectures have been the dominant paradigm in FPGA
soft processor design for twenty years. Prior outoforder superscalar
implementations have not exhibited competitive area or absolute performance.
This paper describes a new way to build fast and areaefficient outoforder
superscalar soft processors by utilizing an Explicit Data Graph Execution
(EDGE) instruction set architecture. By carefully mapping the EDGE
microarchitecture, and in particular, its dataflow instruction scheduler, we
demonstrate the feasibility of an outoforder FPGA architecture. Two scheduler
design alternatives are compared.

Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be
exceedingly expensive when posterior evaluations invoke the evaluation of a
computationally expensive model, such as a system of partial differential
equations. In recent work [Conrad et al. JASA 2016, arXiv:1402.1694], we
described a framework for constructing and refining local approximations of
such models during an MCMC simulation. These posterioradapted approximations
harness regularity of the model to reduce the computational cost of inference
while preserving asymptotic exactness of the Markov chain. Here we describe two
extensions of that work. First, we prove that samplers running in parallel can
collaboratively construct a shared posterior approximation while ensuring
ergodicity of each associated chain, providing a novel opportunity for
exploiting parallel computation in MCMC. Second, focusing on the
Metropolisadjusted Langevin algorithm, we describe how a proposal
distribution can successfully employ gradients and other relevant information
extracted from the approximation. We investigate the practical performance of
our strategies using two challenging inference problems, the first in
subsurface hydrology and the second in glaciology. Using local approximations
constructed via parallel chains, we successfully reduce the run time needed to
characterize the posterior distributions in these problems from days to hours
and from months to days, respectively, dramatically improving the tractability
of Bayesian inference.

We perform a postprocessing radiative feedback analysis on a 3D ab initio
cosmological simulation of an atomic cooling halo under the direct collapse
black hole (DCBH) scenario. We maintain the spatial resolution of the
simulation by incorporating native raytracing on unstructured mesh data,
including Monte Carlo Lymanalpha (Ly{\alpha}) radiative transfer. DCBHs are
born in gasrich, metalpoor environments with the possibility of Comptonthick
conditions, $N_H \gtrsim 10^{24} {\rm cm}^{2}$. Therefore, the surrounding gas
is capable of experiencing the full impact of the bottledup radiation
pressure. In particular, we find that multiple scattering of Ly{\alpha} photons
provides an important source of mechanical feedback after the gas in the
subparsec region becomes partially ionized, avoiding the bottleneck of
destruction via the twophoton emission mechanism. We provide detailed
discussion of the simulation environment, expansion of the ionization front,
emission and escape of Ly{\alpha} radiation, and Compton scattering. A sink
particle prescription allows us to extract approximate limits on the
postformation evolution of the radiative feedback. Fully coupled Ly{\alpha}
radiation hydrodynamics will be crucial to consider in future DCBH simulations.

We obtain several quantitative bounds on the mixing properties of the
Hamiltonian Monte Carlo (HMC) algorithm for a strongly logconcave target
distribution $\pi$ on $\mathbb{R}^{d}$, showing that HMC mixes quickly in this
setting. One of our main results is a dimensionfree bound on the mixing of an
"ideal" HMC chain, which is used to show that the usual leapfrog implementation
of HMC can sample from $\pi$ using only $\mathcal{O}(d^{\frac{1}{4}})$ gradient
evaluations. This dependence on dimension is sharp, and our results
significantly extend and improve previous quantitative bounds on the mixing of
HMC.

We study a kinetically constrained Ising process (KCIP) associated with a
graph $G$ and density parameter $p$; this process is an interacting particle
system with state space $\{ 0, 1 \}^{G}$, the location of the particles. The
`constraint' in the name of the process refers to the rule that a vertex cannot
change its state unless it has at least one neighbour in state `1'. The KCIP
has been proposed by statistical physicists as a model for the glass
transition. In this note, we study the mixing time of a KCIP on the
2dimensional torus $G = \mathbb{Z}_{L}^{2}$ in the lowdensity regime $p =
\frac{c}{L^{2}}$ for arbitrary $0 < c < \infty$, extending our previous results
for the analogous process on the torus $\mathbb{Z}_{L}^{d}$ in dimension $d
\geq 3$. Our general approach is similar, but the extension requires more
delicate bounds on the behaviour of the process at intermediate densities.

We briefly review the historical development of the ideas regarding the first
supermassive black hole seeds, the physics of their formation and radiative
feedback, recent theoretical and observational progress, and our outlook for
the future.

Many modern applications collect highly imbalanced categorical data, with
some categories relatively rare. Bayesian hierarchical models combat data
sparsity by borrowing information, while also quantifying uncertainty. However,
posterior computation presents a fundamental barrier to routine use; a single
class of algorithms does not work well in all settings and practitioners waste
time trying different types of MCMC approaches. This article was motivated by
an application to quantitative advertising in which we encountered extremely
poor computational performance for common data augmentation MCMC algorithms but
obtained excellent performance for adaptive Metropolis. To obtain a deeper
understanding of this behavior, we give strong theory results on computational
complexity in an infinitely imbalanced asymptotic regime. Our results show
computational complexity of Metropolis is logarithmic in sample size, while
data augmentation is polynomial in sample size. The root cause of poor
performance of data augmentation is a discrepancy between the rates at which
the target density and MCMC step sizes concentrate. In general, MCMC algorithms
that have a similar discrepancy will fail in large samples  a result with
substantial practical impact.

It is often possible to speed up the mixing of a Markov chain $\{ X_{t} \}_{t
\in \mathbb{N}}$ on a state space $\Omega$ by \textit{lifting}, that is,
running a more efficient Markov chain $\{ \hat{X}_{t} \}_{t \in \mathbb{N}}$ on
a larger state space $\hat{\Omega} \supset \Omega$ that projects to $\{ X_{t}
\}_{t \in \mathbb{N}}$ in a certain sense. In [CLP99], Chen, Lov{\'a}sz and Pak
prove that for Markov chains on finite state spaces, the mixing time of any
lift of a Markov chain is at least the square root of the mixing time of the
original chain, up to a factor that depends on the stationary measure.
Unfortunately, this extra factor makes the bound in [CLP99] very loose for
Markov chains on large state spaces and useless for Markov chains on continuous
state spaces. In this paper, we develop an extension of the evolving set method
that allows us to refine this extra factor and find bounds for Markov chains on
continuous state spaces that are analogous to the bounds in [CLP99]. These
bounds also allow us to improve on the bounds in [CLP99] for some chains on
finite state spaces.

The dynamical impact of Lymanalpha (Ly{\alpha}) radiation pressure on galaxy
formation depends on the rate and duration of momentum transfer between
Ly{\alpha} photons and neutral hydrogen gas. Although photon trapping has the
potential to multiply the effective force, ionizing radiation from stellar
sources may relieve the Ly{\alpha} pressure before appreciably affecting the
kinematics of the host galaxy or efficiently coupling Ly{\alpha} photons to the
outflow. We present selfconsistent Ly{\alpha} radiationhydrodynamics
simulations of high$z$ galaxy environments by coupling the Cosmic Ly{\alpha}
Transfer code (COLT) with spherically symmetric Lagrangian frame hydrodynamics.
The accurate but computationally expensive MonteCarlo radiative transfer
calculations are feasible under the onedimensional approximation. The initial
starburst drives an expanding shell of gas from the centre and in certain cases
Ly{\alpha} feedback significantly enhances the shell velocity. Radiative
feedback alone is capable of ejecting baryons into the intergalactic medium
(IGM) for protogalaxies with a virial mass of $M_{\rm vir} \lesssim 10^8~{\rm
M}_\odot$. We compare the Ly{\alpha} signatures of Population III stars with
$10^5$ K blackbody emission to that of direct collapse black holes with a
nonthermal Comptonthick spectrum and find substantial differences if the
Ly{\alpha} spectra are shaped by gas pushed by Ly{\alpha} radiationdriven
winds. For both sources, the flux emerging from the galaxy is reprocessed by
the IGM such that the observed Ly{\alpha} luminosity is reduced significantly
and the timeaveraged velocity offset of the Ly{\alpha} peak is shifted
redward.

We introduce a Markov chain for sampling from the uniform distribution on a
Riemannian manifold $\mathcal{M}$, which we call the $\textit{geodesic walk}$.
We prove that the mixing time of this walk on any manifold with positive
sectional curvature $C_{x}(u,v)$ bounded both above and below by $0 <
\mathfrak{m}_{2} \leq C_{x}(u,v) \leq \mathfrak{M}_2 < \infty$ is
$\mathcal{O}^*\left(\frac{\mathfrak{M}_2}{\mathfrak{m}_2}\right)$. In
particular, this bound on the mixing time does not depend explicitly on the
dimension of the manifold. In the special case that $\mathcal{M}$ is the
boundary of a convex body, we give an explicit and computationally tractable
algorithm for approximating the exact geodesic walk. As a consequence, we
obtain an algorithm for sampling uniformly from the surface of a convex body
that has running time bounded solely in terms of the curvature of the body.

As it has become common to use many computer cores in routine applications,
finding good ways to parallelize popular algorithms has become increasingly
important. In this paper, we present a parallelization scheme for Markov chain
Monte Carlo (MCMC) methods based on spectral clustering of the underlying state
space, generalizing earlier work on parallelization of MCMC methods by state
space partitioning. We show empirically that this approach speeds up MCMC
sampling for multimodal distributions and that it can be usefully applied in
greater generality than several related algorithms. Our algorithm converges
under reasonable conditions to an `optimal' MCMC algorithm. We also show that
our approach can be asymptotically far more efficient than naive
parallelization, even in situations such as completely flat target
distributions where no unique optimal algorithm exists. Finally, we combine
theoretical and empirical bounds to provide practical guidance on the choice of
tuning parameters.

Determining the total variation mixing time of Kac's random walk on the
special orthogonal group $\mathrm{SO}(n)$ has been a longstanding open
problem. In this paper, we construct a novel nonMarkovian coupling for
bounding this mixing time. The analysis of our coupling entails controlling the
smallest singular value of a certain random matrix with highly dependent
entries. The dependence of the entries in our matrix makes it notamenable to
existing techniques in random matrix theory. To circumvent this difficulty, we
extend some recent bounds on the smallest singular values of matrices with
independent entries to our setting. These bounds imply that the mixing time of
Kac's walk on the group $\mathrm{SO}(n)$ is between $C_{1} n^{2}$ and $C_{2}
n^{4} \log(n)$ for some explicit constants $0 < C_{1}, C_{2} < \infty$,
substantially improving on the bound of $O(n^{5} \log(n)^{2})$ by Jiang. Our
methods may also be applied to other high dimensional Gibbs samplers with
constraints and thus are of independent interest. In addition to giving
analytical bounds on the mixing time, our approach allows us to compute
rigorous estimates of the mixing time by simulating the eigenvalues of a random
matrix.

Throughout the epoch of reionization the most luminous Ly{\alpha} emitters
are capable of ionizing their own local bubbles. The CR7 galaxy at $z \approx
6.6$ stands out for its combination of exceptionally bright Ly{\alpha} and HeII
1640 Angstrom line emission but absence of metal lines. As a result CR7 may be
the first viable candidate host of a young primordial starburst or direct
collapse black hole. Highresolution spectroscopy reveals a +160 km s$^{1}$
velocity offset between the Ly{\alpha} and HeII line peaks while the spatial
extent of the Ly{\alpha} emitting region is $\sim 16$ kpc. The observables are
indicative of an outflow signature produced by a strong central source. We
present onedimensional radiationhydrodynamics simulations incorporating
accurate Ly{\alpha} feedback and ionizing radiation to investigate the nature
of the CR7 source. We find that a Population III star cluster with $10^5$ K
blackbody emission ionizes its environment too efficiently to generate a
significant velocity offset. However, a massive black hole with a nonthermal
Comptonthick spectrum is able to reproduce the Ly{\alpha} signatures as a
result of higher photon trapping and longer potential lifetime. For both
sources, Ly{\alpha} radiation pressure turns out to be dynamically important.

Determining the mixing time of Kac's random walk on the sphere
$\mathrm{S}^{n1}$ is a longstanding open problem. We show that the total
variation mixing time of Kac's walk on $\mathrm{S}^{n1}$ is between
$\frac{1}{2} \, n \log(n)$ and $200 \,n \log(n)$. Our bound is thus optimal up
to a constant factor, improving on the bestknown upper bound of $O(n^{5}
\log(n)^{2})$ due to Jiang. Our main tool is a `nonMarkovian' coupling
recently introduced by the second author for obtaining the convergence rates of
certain high dimensional Gibbs samplers in continuous state spaces.

We analyze the computational efficiency of approximate Bayesian computation
(ABC), which approximates a likelihood function by drawing pseudosamples from
the associated model. For the rejection sampling version of ABC, it is known
that multiple pseudosamples cannot substantially increase (and can
substantially decrease) the efficiency of the algorithm as compared to
employing a highvariance estimate based on a single pseudosample. We show
that this conclusion also holds for a Markov chain Monte Carlo version of ABC,
implying that it is unnecessary to tune the number of pseudosamples used in
ABCMCMC. This conclusion is in contrast to particle MCMC methods, for which
increasing the number of particles can provide large gains in computational
efficiency.

Many finitestate reversible Markov chains can be naturally decomposed into
"projection" and "restriction" chains. In this paper we provide bounds on the
total variation mixing times of the original chain in terms of the mixing
properties of these related chains. This paper is in the tradition of existing
bounds on Poincare and logSobolev constants of Markov chains in terms of
similar decompositions [JSTV04, MR02, MR06, MY09]. Our proofs are simple,
relying largely on recent results relating hitting and mixing times of
reversible Markov chains [PS13, Oli12]. We describe situations in which our
results give substantially better bounds than those obtained by applying
existing decomposition results and provide examples for illustration.

We construct a new framework for accelerating Markov chain Monte Carlo in
posterior sampling problems where standard methods are limited by the
computational cost of the likelihood, or of numerical models embedded therein.
Our approach introduces local approximations of these models into the
MetropolisHastings kernel, borrowing ideas from deterministic approximation
theory, optimization, and experimental design. Previous efforts at integrating
approximate models into inference typically sacrifice either the sampler's
exactness or efficiency; our work seeks to address these limitations by
exploiting useful convergence characteristics of local approximations. We prove
the ergodicity of our approximate Markov chain, showing that it samples
asymptotically from the \emph{exact} posterior distribution of interest. We
describe variations of the algorithm that employ either local polynomial
approximations or local Gaussian process regressors. Our theoretical results
reinforce the key observation underlying this paper: when the likelihood has
some \emph{local} regularity, the number of model evaluations per MCMC step can
be greatly reduced without biasing the Monte Carlo average. Numerical
experiments demonstrate multiple orderofmagnitude reductions in the number of
forward model evaluations used in representative ODE and PDE inference
problems, with both synthetic and real data.

In many modern applications, difficulty in evaluating the posterior density
makes performing even a single MCMC step slow. This difficulty can be caused by
intractable likelihood functions, but also appears for routine problems with
large data sets. Many researchers have responded by running approximate
versions of MCMC algorithms. In this note, we develop quantitative bounds for
showing the ergodicity of these approximate samplers. We then use these bounds
to study the biasvariance tradeoff of approximate MCMC algorithms. We apply
our results to simple versions of recently proposed algorithms, including a
variant of the "austerity" framework of Korratikara et al.

We present the Cosmic Lyman$\alpha$ Transfer code (COLT), a massively
parallel MonteCarlo radiative transfer code, to simulate Lyman$\alpha$
(Ly$\alpha$) resonant scattering through neutral hydrogen as a probe of the
first galaxies. We explore the interaction of centrally produced Ly$\alpha$
radiation with the host galactic environment. Ly$\alpha$ photons emitted from
the luminous starburst region escape with characteristic features in the line
profile depending on the density distribution, ionization structure, and bulk
velocity fields. For example, anisotropic ionization exhibits a tall peak close
to line centre with a skewed tail that drops off gradually. Idealized models of
first galaxies explore the effect of mass, anisotropic H II regions, and
radiation pressure driven winds on Ly$\alpha$ observables. We employ mesh
refinement to resolve critical structures. We also postprocess an ab initio
cosmological simulation and examine images captured at various escape distances
within the 1 Mpc$^3$ comoving volume. Finally, we discuss the emergent spectra
and surface brightness profiles of these objects in the context of high$z$
observations. The first galaxies will likely be observed through the red
damping wing of the Ly$\alpha$ line. Observations will be biased toward
galaxies with an intrinsic red peak located far from line centre that reside in
extensive H II super bubbles, which allows Hubble flow to sufficiently redshift
photons away from line centre and facilitate transmission through the
intergalactic medium (IGM). Even with gravitational lensing to boost the
luminosity this preliminary work indicates that Ly$\alpha$ emission from
stellar clusters within haloes of $M_{\rm vir}<10^9~{\rm M}_\odot$ is generally
too faint to be detected by the James Webb Space Telescope (JWST).

We study a kinetically constrained Ising process (KCIP) associated with a
graph G and density parameter p; this process is an interacting particle system
with state space $\{0,1\}^{G}$. The stationary distribution of the KCIP Markov
chain is the Binomial($G, p$) distribution on the number of particles,
conditioned on having at least one particle. The `constraint' in the name of
the process refers to the rule that a vertex cannot change its state unless it
has at least one neighbour in state `1'. The KCIP has been proposed by
statistical physicists as a model for the glass transition, and more recently
as a simple algorithm for data storage in computer networks. In this note, we
study the mixing time of this process on the torus $G = \mathbb{Z}_{L}^{d}$, $d
\geq 3$, in the lowdensity regime $p = \frac{c}{n}$ for arbitrary $0 < c < 1$;
this regime is the subject of a conjecture of Aldous and is natural in the
context of computer networks. Our results provide a counterexample to Aldous'
conjecture, suggest a natural modifcation of the conjecture, and show that this
modifcation is correct up to logarithmic factors. The methods developed in this
paper also provide a strategy for tackling Aldous' conjecture for other graphs.

Adaptive Markov chains are an important class of Monte Carlo methods for
sampling from probability distributions. The time evolution of adaptive
algorithms depends on past samples, and thus these algorithms are
nonMarkovian. Although there has been previous work establishing conditions
for their ergodicity, not much is known theoretically about their finite sample
properties. In this paper, using a notion of discrete Ricci curvature for
Markov kernels introduced by Ollivier, we establish concentration inequalities
and finite sample bounds for a class of adaptive Markov chains. After
establishing some general results, we give quantitative bounds for
`multilevel' adaptive algorithms such as the equienergy sampler. We also
provide the first rigorous proofs that the finite sample properties of an
equienergy sampler are superior to those of related parallel tempering and
MetropolisHastings samplers after a learning period comparable to their mixing
times.

We determine the mixing time of a simple Gibbs sampler on the unit simplex,
confirming a conjecture of Aldous. The upper bound is based on a twostep
coupling, where the first step is a simple contraction argument and the second
step is a nonMarkovian coupling. We also present a MCMCbased perfect sampling
algorithm based on our proof which can be applied with Gibbs samplers that are
harder to analyze.

Let $X_{t}$ and $Y_{t}$ be two Markov chains, on state spaces $\Omega \subset
\hat{\Omega}$. In this paper, we discuss how to prove bounds on the spectrum of
$X_{t}$ based on bounds on the spectrum of $Y_{t}$. This generalizes work of
Diaconis, SaloffCoste, Yuen and others on comparison of chains in the case
$\Omega = \hat{\Omega}$. The main tool is the extension of functions from the
smaller space to the larger, which allows comparison of the entire spectrum of
the two chains. The theory is used to give quick analyses of several chains
without symmetry. The main application is to a `random transposition' walk on
derangements.

We use a nonMarkovian coupling and small modifications of techniques from
the theory of finite Markov chains to analyze some Markov chains on continuous
state spaces. The first is a Gibbs sampler on narrow contingency tables, the
second a gen eralization of a sampler introduced by Randall and Winkler.