
A surprising result of FitzGerald and Horn (1977) shows that $A^{\circ
\alpha} := (a_{ij}^\alpha)$ is positive semidefinite (p.s.d.) for every
entrywise nonnegative $n \times n$ p.s.d. matrix $A = (a_{ij})$ if and only if
$\alpha$ is a positive integer or $\alpha \geq n2$. Given a graph $G$, we
consider the refined problem of characterizing the set $\mathcal{H}_G$ of
entrywise powers preserving positivity for matrices with a zero pattern encoded
by $G$. Using algebraic and combinatorial methods, we study how the geometry of
$G$ influences the set $\mathcal{H}_G$. Our treatment provides new and exciting
connections between combinatorics and analysis, and leads us to introduce and
compute a new graph invariant called the critical exponent.

Simple random walks are a basic staple of the foundation of probability
theory and form the building block of many useful and complex stochastic
processes. In this paper we study a natural generalization of the random walk
to a process in which the allowed step sizes take values in the set
$\{\pm1,\pm2,\ldots,\pm k\}$, a process we call a random leap. The need to
analyze such models arises naturally in modernday data science and socalled
"big data" applications. We provide closedform expressions for quantities
associated with first passage times and absorption events of random leaps.
These expressions are formulated in terms of the roots of the characteristic
polynomial of a certain recurrence relation associated with the transition
probabilities. Our analysis shows that the expressions for absorption
probabilities for the classical simple random walk are a special case of a
universal result that is very elegant. We also consider an important variant of
a random leap: the reflecting random leap. We demonstrate that the reflecting
random leap exhibits more interesting behavior in regard to the existence of a
stationary distribution and properties thereof. Questions relating to
recurrence/transience are also addressed, as well as an application of the
random leap.

Entrywise functions preserving the cone of positive semidefinite matrices
have been studied by many authors, most notably by Schoenberg [Duke Math. J. 9,
1942] and Rudin [Duke Math. J. 26, 1959]. Following their work, it is
wellknown that entrywise functions preserving Loewner positivity in all
dimensions are precisely the absolutely monotonic functions. However, there are
strong theoretical and practical motivations to study functions preserving
positivity in a fixed dimension $n$. Such characterizations for a fixed value
of $n$ are difficult to obtain, and in fact are only known in the $2 \times 2$
case. In this paper, using a novel and intuitive approach, we study entrywise
functions preserving positivity on distinguished submanifolds inside the cone
obtained by imposing rank constraints. These rank constraints are prevalent in
applications, and provide a natural way to relax the elusive original problem
of preserving positivity in a fixed dimension. In our main result, we
characterize entrywise functions mapping $n \times n$ positive semidefinite
matrices of rank at most $l$ into positive semidefinite matrices of rank at
most $k$ for $1 \leq l \leq n$ and $1 \leq k < n$. We also demonstrate how an
important necessary condition for preserving positivity by Horn and Loewner
[Trans. Amer. Math. Soc. 136, 1969] can be significantly generalized by adding
rank constraints. Finally, our techniques allow us to obtain an elementary
proof of the classical characterization of functions preserving positivity in
all dimensions obtained by Schoenberg and Rudin.

Bayesian shrinkage methods have generated a lot of recent interest as tools
for highdimensional regression and model selection. These methods naturally
facilitate tractable uncertainty quantification and incorporation of prior
information. This benefit has led to extensive use of the Bayesian shrinkage
methods across diverse applications. A common feature of these models is that
the corresponding priors on the regression coefficients can be expressed as
scale mixture of normals. While the threestep Gibbs sampler used to sample
from the often intractable associated posterior density has been shown to be
geometrically ergodic for several of these models, it has been demonstrated
recently that convergence of this sampler can still be quite slow in modern
highdimensional settings despite this apparent theoretical safeguard. We
propose a new method to draw from the same posterior via a tractable twostep
blocked Gibbs sampler. We demonstrate that our proposed twostep blocked
sampler exhibits vastly superior convergence behavior compared to the original
threestep sampler in highdimensional regimes on both real and simulated data.
We also provide a detailed theoretical underpinning to the new method in the
context of the Bayesian lasso. First, we prove that the proposed twostep
sampler is geometrically ergodic, and derive explicit upper bounds for the
(geometric) rate of convergence. Furthermore, we demonstrate theoretically that
while the original Bayesian lasso chain is not HilbertSchmidt, the proposed
chain is trace class (and hence HilbertSchmidt). The trace class property
implies that the corresponding Markov operator is compact, and its (countably
many) eigenvalues are summable. It also facilitates a rigorous comparison of
the twostep blocked chain with "sandwich" algorithms which aim to improve
performance of the twostep chain by inserting an inexpensive extra step.

Bayesian shrinkage methods have generated a lot of recent interest as tools
for highdimensional regression and model selection. These methods naturally
facilitate tractable uncertainty quantification and incorporation of prior
information. A common feature of these models, including the Bayesian lasso,
globallocal shrinkage priors, and spikeandslab priors is that the
corresponding priors on the regression coefficients can be expressed as scale
mixture of normals. While the threestep Gibbs sampler used to sample from the
often intractable associated posterior density has been shown to be
geometrically ergodic for several of these models (Khare and Hobert, 2013; Pal
and Khare, 2014), it has been demonstrated recently that convergence of this
sampler can still be quite slow in modern highdimensional settings despite
this apparent theoretical safeguard. We propose a new method to draw from the
same posterior via a tractable twostep blocked Gibbs sampler. We demonstrate
that our proposed twostep blocked sampler exhibits vastly superior convergence
behavior compared to the original three step sampler in highdimensional
regimes on both real and simulated data. We also provide a detailed theoretical
underpinning to the new method in the context of the Bayesian lasso. First, we
derive explicit upper bounds for the (geometric) rate of convergence.
Furthermore, we demonstrate theoretically that while the original Bayesian
lasso chain is not HilbertSchmidt, the proposed chain is trace class (and
hence HilbertSchmidt). The trace class property has useful theoretical and
practical implications. It implies that the corresponding Markov operator is
compact, and its eigenvalues are summable. It also facilitates a rigorous
comparison of the twostep blocked chain with "sandwich" algorithms which aim
to improve performance of the twostep chain by inserting an inexpensive extra
step.

We introduce PseudoNet, a new pseudolikelihoodbased estimator of the inverse
covariance matrix, that has a number of useful statistical and computational
properties. We show, through detailed experiments with synthetic and also
realworld finance as well as wind power data, that PseudoNet outperforms
related methods in terms of estimation error and support recovery, making it
wellsuited for use in a downstream application, where obtaining low estimation
error can be important. We also show, under regularity conditions, that
PseudoNet is consistent. Our proof assumes the existence of accurate estimates
of the diagonal entries of the underlying inverse covariance matrix; we
additionally provide a twostep method to obtain these estimates, even in a
highdimensional setting, going beyond the proofs for related methods. Unlike
other pseudolikelihoodbased methods, we also show that PseudoNet does not
saturate, i.e., in high dimensions, there is no hard limit on the number of
nonzero entries in the PseudoNet estimate. We present a fast algorithm as well
as screening rules that make computing the PseudoNet estimate over a range of
tuning parameters tractable.

We prove a refinement of the inequality by HoffmannJorgensen that is
significant for three reasons. First, our result improves on the
stateoftheart even for realvalued random variables. Second, the result
unifies several versions in the Banach space literature, including those by
Johnson and Schechtman [Ann. Prob. 17], Klass and Nowicki [Ann. Prob. 28], and
Hitczenko and MontgomerySmith [Ann. Prob. 29]. Finally, we show that the
HoffmannJorgensen inequality (including our generalized version) holds not
only in Banach spaces but more generally, in the minimal mathematical framework
required to state the inequality: a metric semigroup $\mathscr{G}$. This
includes normed linear spaces as well as all compact, discrete, or abelian Lie
groups.

We extend the KhinchinKahane inequality to an arbitrary abelian metric group
$\mathscr{G}$. In the special case where $\mathscr{G}$ is normed, we prove a
refinement which is sharp and which extends the sharp version for Banach
spaces. We also provide an alternate proof for normed metric groups as a
consequence of a general "transfer principle". This transfer principle has
immediate applications to stochastic inequalities for $\mathscr{G}$valued
random variables. We also show how to use it to define the expectation of
random variables with values in arbitrary abelian normed metric semigroups.

Covariance estimation for highdimensional datasets is a fundamental problem
in modern day statistics with numerous applications. In these high dimensional
datasets, the number of variables p is typically larger than the sample size n.
A popular way of tackling this challenge is to induce sparsity in the
covariance matrix, its inverse or a relevant transformation. In particular,
methods inducing sparsity in the Cholesky pa rameter of the inverse covariance
matrix can be useful as they are guaranteed to give a positive definite
estimate of the covariance matrix. Also, the estimated sparsity pattern
corresponds to a Directed Acyclic Graph (DAG) model for Gaussian data. In
recent years, two useful penalized likelihood methods for sparse estimation of
this Cholesky parameter (with no restrictions on the sparsity pattern) have
been developed. How ever, these methods either consider a nonconvex
optimization problem which can lead to convergence issues and singular
estimates of the covariance matrix when p > n, or achieve a convex formulation
by placing a strict constraint on the conditional variance parameters. In this
paper, we propose a new penalized likelihood method for sparse estimation of
the inverse covariance Cholesky parameter that aims to overcome some of the
shortcomings of current methods, but retains their respective strengths. We ob
tain a jointly convex formulation for our objective function, which leads to
convergence guarantees, even when p > n. The approach always leads to a
positive definite and symmetric estimator of the covariance matrix. We
establish highdimensional estima tion and graph selection consistency, and
also demonstrate finite sample performance on simulated/real data.

We study probability inequalities leading to tail estimates in a general
semigroup $\mathscr{G}$ with a translationinvariant metric $d_{\mathscr{G}}$.
Using recent work [Ann. Prob., to appear] that extends the HoffmannJorgensen
inequality to all metric semigroups, we obtain tail estimates and approximate
bounds for sums of independent semigroupvalued random variables, their
moments, and decreasing rearrangements. In particular, we obtain the "correct"
universal constants in several cases, extending results in the Banach space
literature by JohnsonSchechtmanZinn [Ann. Prob. 13], Hitczenko [Ann. Prob.
22], and Hitczenko and MontgomerySmith [Ann. Prob. 29]. Our results also hold
more generally, in the minimal mathematical framework required to state them:
metric semigroups $\mathscr{G}$. This includes all compact, discrete, or
abelian Lie groups.

This paper proposes a general adaptive procedure for budgetlimited predictor
design in high dimensions called twostage Sampling, Prediction and Adaptive
Regression via Correlation Screening (SPARCS). SPARCS can be applied to high
dimensional prediction problems in experimental science, medicine, finance, and
engineering, as illustrated by the following. Suppose one wishes to run a
sequence of experiments to learn a sparse multivariate predictor of a dependent
variable $Y$ (disease prognosis for instance) based on a $p$ dimensional set of
independent variables $\mathbf X=[X_1,\ldots, X_p]^T$ (assayed biomarkers).
Assume that the cost of acquiring the full set of variables $\mathbf X$
increases linearly in its dimension. SPARCS breaks the data collection into two
stages in order to achieve an optimal tradeoff between sampling cost and
predictor performance. In the first stage we collect a few ($n$) expensive
samples $\{y_i,\mathbf x_i\}_{i=1}^n$, at the full dimension $p\gg n$ of
$\mathbf X$, winnowing the number of variables down to a smaller dimension $l <
p$ using a type of crosscorrelation or regression coefficient screening. In
the second stage we collect a larger number $(tn)$ of cheaper samples of the
$l$ variables that passed the screening of the first stage. At the second
stage, a low dimensional predictor is constructed by solving the standard
regression problem using all $t$ samples of the selected variables. SPARCS is
an adaptive online algorithm that implements false positive control on the
selected variables, is well suited to small sample sizes, and is scalable to
high dimensions. We establish asymptotic bounds for the Familywise Error Rate
(FWER), specify high dimensional convergence rates for support recovery, and
establish optimal sample allocation rules to the first and second stages.

In this paper, we exploit the theory of dense graph limits to provide a new
framework to study the stability of graph partitioning methods, which we call
structural consistency. Both stability under perturbation as well as asymptotic
consistency (i.e., convergence with probability $1$ as the sample size goes to
infinity under a fixed probability model) follow from our notion of structural
consistency. By formulating structural consistency as a continuity result on
the graphon space, we obtain robust results that are completely independent of
the data generating mechanism. In particular, our results apply in settings
where observations are not independent, thereby significantly generalizing the
common probabilistic approach where data are assumed to be i.i.d.
In order to make precise the notion of structural consistency of graph
partitioning, we begin by extending the theory of graph limits to include
vertex colored graphons. We then define continuous nodelevel statistics and
prove that graph partitioning based on such statistics is consistent. Finally,
we derive the structural consistency of commonly used clustering algorithms in
a general modelfree setting. These include clustering based on local graph
statistics such as homomorphism densities, as well as the popular spectral
clustering using the normalized Laplacian.
We posit that proving the continuity of clustering algorithms in the graph
limit topology can stand on its own as a more robust form of modelfree
consistency. We also believe that the mathematical framework developed in this
paper goes beyond the study of clustering algorithms, and will guide the
development of similar modelfree frameworks to analyze other procedures in the
broader mathematical sciences.

High dimensional covariance estimation and graphical models is a contemporary
topic in statistics and machine learning having widespread applications. An
important line of research in this regard is to shrink the extreme spectrum of
the covariance matrix estimators. A separate line of research in the literature
has considered sparse inverse covariance estimation which in turn gives rise to
graphical models. In practice, however, a sparse covariance or inverse
covariance matrix which is simultaneously wellconditioned and at the same time
computationally tractable is desired. There has been little research at the
confluence of these three topics. In this paper we consider imposing a
condition number constraint to various types of losses used in covariance and
inverse covariance matrix estimation. When the loss function can be decomposed
as a sum of an orthogonally invariant function of the estimate and its inner
product with a function of the sample covariance matrix, we show that a
solution path algorithm can be derived, involving a series of ordinary
differential equations. The path algorithm is attractive because it provides
the entire family of estimates for all possible values of the condition number
bound, at the same computational cost of a single estimate with a fixed upper
bound. An important finding is that the proximal operator for the condition
number constraint, which turns out to be very useful in regularizing loss
functions that are not orthogonally invariant and may yield
nonpositivedefinite estimates, can be efficiently computed by this path
algorithm. As a concrete illustration of its practical importance, we develop
an operatorsplitting algorithm that imposes a guarantee of wellconditioning
as well as positive definiteness to recently proposed convex pseudolikelihood
based graphical model selection methods.

Functions preserving Loewner positivity when applied entrywise to positive
semidefinite matrices have been widely studied in the literature. Following the
work of Schoenberg [Duke Math. J. 9], Rudin [Duke Math. J. 26], and others, it
is wellknown that functions preserving positivity for matrices of all
dimensions are absolutely monotonic (i.e., analytic with nonnegative Taylor
coefficients). In this paper, we study functions preserving positivity when
applied entrywise to sparse matrices, with zeros encoded by a graph $G$ or a
family of graphs $G_n$. Our results generalize Schoenberg and Rudin's results
to a modern setting, where functions are often applied entrywise to sparse
matrices in order to improve their properties (e.g. better conditioning). The
only such result known in the literature is for the complete graph $K_2$. We
provide the first such characterization result for a large family of
noncomplete graphs. Specifically, we characterize functions preserving Loewner
positivity on matrices with zeros according to a tree. These functions are
multiplicatively midpointconvex and superadditive. Leveraging the underlying
sparsity in matrices thus admits the use of functions which are not necessarily
analytic nor absolutely monotonic. We further show that analytic functions
preserving positivity on matrices with zeros according to trees can contain
arbitrarily long sequences of negative coefficients, thus obviating the need
for absolute monotonicity in a very strong sense. This result leads to the
question of exactly when absolute monotonicity is necessary when preserving
positivity for an arbitrary class of graphs. We then provide a stronger
condition in terms of the numerical range of all symmetric matrices, such that
functions satisfying this condition on matrices with zeros according to any
family of graphs with unbounded degrees are necessarily absolutely monotonic.

The study of entrywise powers of matrices was originated by Loewner in the
pursuit of the Bieberbach conjecture. Since the work of FitzGerald and Horn
(1977), it is known that $A^{\circ \alpha} := (a_{ij}^\alpha)$ is positive
semidefinite for every entrywise nonnegative $n \times n$ positive semidefinite
matrix $A = (a_{ij})$ if and only if $\alpha$ is a positive integer or $\alpha
\geq n2$. This surprising result naturally extends the Schur product theorem,
and demonstrates the existence of a sharp phase transition in preserving
positivity. In this paper, we study when entrywise powers preserve positivity
for matrices with structure of zeros encoded by graphs. To each graph is
associated an invariant called its "critical exponent", beyond which every
power preserves positivity. In our main result, we determine the critical
exponents of all chordal/decomposable graphs, and relate them to the geometry
of the underlying graphs. We then examine the critical exponent of important
families of nonchordal graphs such as cycles and bipartite graphs.
Surprisingly, large families of dense graphs have small critical exponents that
do not depend on the number of vertices of the graphs.

Concurrent time series commonly arise in various applications, including when
monitoring the environment such as in air quality measurement networks, weather
stations, oceanographic buoys, or in paleo form such as lake sediments, tree
rings, ice cores, or coral isotopes, with each monitoring or sampling site
providing one of the time series. The goal in such applications is to extract a
common time trend or signal in the observed data. Other examples where the goal
is to extract a common time trend for multiple time series are in stock price
time series, neurological time series, and quality control time series. For
this purpose we develop properties of MAF [Maximum Autocorrelation Factors]
that linearly combines time series in order to maximize the resulting SNR
[signaltonoiseratio] where there are multiple smooth signals present in the
data. Equivalence is established in a regression setting between MAF and CCA
[Canonical Correlation Analysis] even though MAF does not require specific
signal knowledge as opposed to CCA. We proceed to derive the theoretical
properties of MAF and quantify the SNR advantages of MAF in comparison with PCA
[Principal Components Analysis], a commonly used method for linearly combining
time series, and compare their statistical sample properties. MAF and PCA are
then applied to real and simulated data sets to illustrate MAFs efficacy.

Markov chain Monte Carlo (MCMC) lies at the core of modern Bayesian
methodology, much of which would be impossible without it. Thus, the
convergence properties of MCMCs have received significant attention, and in
particular, proving (geometric) ergodicity is of critical interest. Trust in
the ability of MCMCs to sample from modernday highdimensional posteriors,
however, has been limited by a widespread perception that these chains
typically experience serious convergence problems. In this paper, we first
demonstrate that contemporary methods for obtaining convergence rates have
serious limitations when the dimension grows. We then propose a framework for
rigorously establishing the convergence behavior of commonly used
highdimensional MCMCs. In particular, we demonstrate theoretically the precise
nature and severity of the convergence problems of popular MCMCs when
implemented in high dimensions, including phase transitions in the convergence
rates in various $n$ and $p$ regimes, and a universality result across an
entire spectrum of models. We also show that convergence problems effectively
eliminate the apparent safeguard of geometric ergodicity. We then demonstrate
theoretical principles by which MCMCs can be constructed and analyzed to yield
bounded geometric convergence rates even as the dimension $p$ grows without
bound. Additionally, we propose a diagnostic tool for establishing convergence.

In this paper we develop a rigorous foundation for the study of integration
and measures on the space $\mathscr{G}(V)$ of all graphs defined on a countable
labelled vertex set $V$. We first study several interrelated $\sigma$algebras
and a large family of probability measures on graph space. We then focus on a
"dyadic" Hamming distance function $\left\ \cdot \right\_{\psi,2}$, which was
very useful in the study of differentiation on $\mathscr{G}(V)$. The function
$\left\ \cdot \right\_{\psi,2}$ is shown to be a Haar measurepreserving
bijection from the subset of infinite graphs to the circle (with the
Haar/Lebesgue measure), thereby naturally identifying the two spaces. As a
consequence, we establish a "change of variables" formula that enables the
transfer of the RiemannLebesgue theory on $\mathbb{R}$ to graph space
$\mathscr{G}(V)$. This also complements previous work in which a theory of
NewtonLeibnitz differentiation was transferred from the real line to
$\mathscr{G}(V)$ for countable $V$. Finally, we identify the Pontryagin dual of
$\mathscr{G}(V)$, and characterize the positive definite functions on
$\mathscr{G}(V)$.

Understanding centennial scale climate variability requires data sets that
are accurate, long, continuous and of broad spatial coverage. Since
instrumental measurements are generally only available after 1850, temperature
fields must be reconstructed using paleoclimate archives, known as proxies.
Various climate field reconstructions (CFR) methods have been proposed to
relate past temperature to such proxy networks. In this work, we propose a new
CFR method, called GraphEM, based on Gaussian Markov random fields embedded
within an EM algorithm. Gaussian Markov random fields provide a natural and
flexible framework for modeling highdimensional spatial fields. At the same
time, they provide the parameter reduction necessary for obtaining precise and
wellconditioned estimates of the covariance structure, even in the
samplestarved setting common in paleoclimate applications. In this paper, we
propose and compare the performance of different methods to estimate the
graphical structure of climate fields, and demonstrate how the GraphEM
algorithm can be used to reconstruct past climate variations. The performance
of GraphEM is compared to the widely used CFR method RegEM with regularization
via truncated total least squares, using synthetic data. Our results show that
GraphEM can yield significant improvements, with uniform gains over space, and
far better risk properties. We demonstrate that the spatial structure of
temperature fields can be well estimated by graphs where each neighbor is only
connected to a few geographically close neighbors, and that the increase in
performance is directly related to recovering the underlying sparsity in the
covariance of the spatial field. Our work demonstrates how significant
improvements can be made in climate reconstruction methods by better modeling
the covariance structure of the climate field.

When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and ecoinformatics
the dataset is often variablerich but samplestarved: a regime where the
number $n$ of acquired samples (statistical replicates) is far fewer than the
number $p$ of observed variables (genes, neurons, voxels, or chemical
constituents). Much of recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity
however has received relatively less attention, especially in the setting when
the sample size $n$ is fixed, and the dimension $p$ grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche but only the latter regime applies to exascale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that are of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.

Bayesian inference for graphical models has received much attention in the
literature in recent years. It is well known that when the graph G is
decomposable, Bayesian inference is significantly more tractable than in the
general nondecomposable setting. Penalized likelihood inference on the other
hand has made tremendous gains in the past few years in terms of scalability
and tractability. Bayesian inference, however, has not had the same level of
success, though a scalable Bayesian approach has its respective strengths,
especially in terms of quantifying uncertainty. To address this gap, we propose
a scalable and flexible novel Bayesian approach for estimation and model
selection in Gaussian undirected graphical models. We first develop a class of
generalized GWishart distributions with multiple shape parameters for an
arbitrary underlying graph. This class contains the GWishart distribution as a
special case. We then introduce the class of Generalized Bartlett (GB) graphs,
and derive an efficient Gibbs sampling algorithm to obtain posterior draws from
generalized GWishart distributions corresponding to a GB graph. The class of
Generalized Bartlett graphs contains the class of decomposable graphs as a
special case, but is substantially larger than the class of decomposable
graphs. We proceed to derive theoretical properties of the proposed Gibbs
sampler. We then demonstrate that the proposed Gibbs sampler is scalable to
significantly higher dimensional problems as compared to using an acceptreject
or a MetropolisHasting algorithm. Finally, we show the efficacy of the
proposed approach on simulated and real data.

In this paper, we consider Gaussian models Markov with respect to an
arbitrary DAG. We first construct a family of conjugate priors for the Cholesky
parametrization of the covariance matrix of such models. This family has as
many shape parameters as the DAG has vertices, and naturally extends the work
of Geiger and Heckerman [8]. From these distributions, we derive prior
distributions for the covariance and precision parameters of the Gaussian DAG
Markov models. Our works thus extends the work of Dawid and Lauritzen [5] and
Letac and Massam [16] for Gaussian models Markov with respect to a decomposable
graph to arbitrary DAGs. For this reason, we call our distributions DAGWishart
distributions. An advantage of these distributions is that they possess strong
hyper Markov properties and thus allow for explicit estimation of the
covariance and precision parameters, regardless of the dimension of the
problem. They also allow us to develop methodology for model selection and
covariance estimation in the space of DAGMarkov models. We demonstrate via
several numerical examples that the proposed method scales well to
highdimensions.

Recently, the theory of dense graph limits has received attention from
multiple disciplines including graph theory, computer science, statistical
physics, probability, statistics, and group theory. In this paper we initiate
the study of the general structure of differentiable graphon parameters $F$. We
derive consistency conditions among the higher G\^ateaux derivatives of $F$
when restricted to the subspace of edge weighted graphs $\mathcal{W}_{\bf p}$.
Surprisingly, these constraints are rigid enough to imply that the multilinear
functionals $\Lambda: \mathcal{W}_{\bf p}^n \to \mathbb{R}$ satisfying the
constraints are determined by a finite set of constants indexed by isomorphism
classes of multigraphs with $n$ edges and no isolated vertices. Using this
structure theory, we explain the central role that homomorphism densities play
in the analysis of graphons, by way of a new combinatorial interpretation of
their derivatives. In particular, homomorphism densities serve as the monomials
in a polynomial algebra that can be used to approximate differential graphon
parameters as Taylor polynomials. These ideas are summarized by our main
theorem, which asserts that homomorphism densities $t(H,)$ where $H$ has at
most $N$ edges form a basis for the space of smooth graphon parameters whose
$(N+1)$st derivatives vanish. As a consequence of this theory, we also extend
and derive new proofs of linear independence of multigraph homomorphism
densities, and characterize homomorphism densities. In addition, we develop a
theory of series expansions, including Taylor's theorem for graph parameters
and a uniqueness principle for series. We use this theory to analyze questions
raised by Lov\'asz, including studying infinite quantum algebras and the
connection between right and lefthomomorphism densities.

Representing the conditional independences present in a multivariate random
vector via graphs has found widespread use in applications, and such
representations are popularly known as graphical models or Markov random
fields. These models have many useful properties, but their fundamental
attractive feature is their ability to reflect conditional independences
between blocks of variables through graph separation, a consequence of the
equivalence of the pairwise, local and global Markov properties demonstrated by
Pearl and Paz (1985). Modern day applications often necessitate working with
either an infinite collection of variables (such as in a spatialtemporal
field) or approximating a large highdimensional finite stochastic system with
an infinitedimensional system. However, it is unclear whether the conditional
independences present in an infinitedimensional random vector or stochastic
process can still be represented by separation criteria in an infinite graph.
In light of the advantages of using graphs as tools to represent stochastic
relationships, we undertake in this paper a general study of infinite graphical
models. First, we demonstrate that naive extensions of the assumptions required
for the finite case results do not yield equivalence of the Markov properties
in the infinitedimensional setting, thus calling for a more indepth analysis.
To this end, we proceed to derive general conditions which do allow
representing the conditional independence in an infinitedimensional random
system by means of graphs, and our results render the result of Pearl and Paz
as a special case of a more general phenomenon. We conclude by demonstrating
the applicability of our theory through concrete examples of
infinitedimensional graphical models.

We consider the general problem of minimizing an objective function which is
the sum of a convex function (not strictly convex) and absolute values of a
subset of variables (or equivalently the l1norm of the variables). This
problem appears exten sively in modern statistical applications associated
with highdimensional data or "big data", and corresponds to optimizing
l1regularized likelihoods in the context of model selection. In such
applications, cyclic coordinatewise minimization (CCM), where the objective
function is sequentially minimized with respect to each individual coordi
nate, is often employed as it offers a computationally cheap and effective
optimization method. Consequently, it is crucial to obtain theoretical
guarantees of convergence for the sequence of iterates produced by the cyclic
coordinatewise minimization in this setting. Moreover, as the objective
corresponds to at l1regularized likelihoods of many variables, it is important
to obtain convergence of the iterates themselves, and not just the function
values. Previous results in the literature only establish either, (i) that
every limit point of the sequence of iterates is a stationary point of the
objective function, or (ii) establish convergence under special assumptions, or
(iii) establish con vergence for a different minimization approach (which uses
quadratic approximation based gradient descent followed by an inexact line
search), (iv) establish convergence of only the function values of the sequence
of iterates produced by random coordinatewise minimization (a variant of CCM).
In this paper, a rigorous general proof of convergence for the cyclic
coordinatewise minimization algorithm is provided. We demonstrate the
usefulness of our general results in contemporary applications.