
A methodology is developed for data analysis based on empirically constructed
geodesic metric spaces. For a probability distribution, the length along a path
between two points can be defined as the amount of probability mass accumulated
along the path. The geodesic, then, is the shortest such path and defines a
geodesic metric. Such metrics are transformed in a number of ways to produce
parametrised families of geodesic metric spaces, empirical versions of which
allow computation of intrinsic means and associated measures of dispersion.
These reveal geometric properties of the data that are difficult to see from
the raw Euclidean distances. Examples of application
include clustering and classification. For certain parameter ranges, the spaces
become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal
spanning tree of a graph based on the data becomes CAT(0). In another, a
so-called "metric cone" construction allows extension to CAT($k$) spaces. It is
shown how to empirically tune the parameters of the metrics, making it possible
to apply them to a number of real cases.
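
The minimal spanning tree case can be illustrated with a small sketch. It is not the authors' construction: it assumes the graph is the complete Euclidean graph on the data, uses Prim's algorithm, and restricts the search for an intrinsic mean to the data points themselves, whereas a true intrinsic mean on a tree may lie in the interior of an edge. All function names are hypothetical.

```python
import math

def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph (an assumed choice of graph).
    Returns an adjacency dict: vertex -> list of (neighbour, edge length)."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the current tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = math.dist(points[u], points[parent[u]])
            adj[u].append((parent[u], w))
            adj[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return adj

def tree_distances(adj, source):
    """Geodesic (path-length) distances from source along the tree."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

def vertex_intrinsic_mean(points):
    """Index of the data point minimising the sum of squared tree distances.
    Restricting to vertices is a simplification made for this sketch."""
    adj = minimal_spanning_tree(points)
    def cost(i):
        return sum(d * d for d in tree_distances(adj, i).values())
    return min(range(len(points)), key=cost)
```

For points arranged in a "U" shape, the tree distance between the two tips is the length of the path around the U, which exceeds their Euclidean distance; this is the kind of geometric feature the raw distances hide.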

We propose an optimal experimental design for a curvilinear regression model
that minimizes the bandwidth of simultaneous confidence bands. Simultaneous
confidence bands for curvilinear regression are constructed by evaluating the
volume of a tube about a curve that is defined as a trajectory of a regression
basis vector (Naiman, 1986). The proposed criterion is constructed based on the
volume of a tube, and the corresponding optimal design that minimizes the
volume of the tube is referred to as the tube-volume optimal (TV-optimal) design.
For Fourier and weighted polynomial regressions, the problem is formalized as
one of minimization over the cone of Hankel positive definite matrices, and the
criterion to minimize is expressed as an elliptic integral. We show that the
M\"obius group keeps our problem invariant, and hence, minimization can be
conducted over cross-sections of orbits. We demonstrate that for the weighted
polynomial regression and the Fourier regression with three bases, the
tube-volume optimal design forms an orbit of the M\"obius group containing
D-optimal designs as representative elements.

Data can be collected in scientific studies via a controlled experiment or
passive observation. Big data is often collected in a passive way, e.g. from
social media. Understanding the difference between active and passive
observation is critical to the analysis. For example, in studies of causation,
great efforts are made to guard against hidden confounders or feedback, which
can destroy the identification of causation by corrupting or omitting
counterfactuals (controls). Various solutions to these problems are discussed,
including randomization.

We apply the methods of algebraic reliability to the study of percolation on
trees. To a complete $k$-ary tree $T_{k,n}$ of depth $n$ we assign a monomial
ideal $I_{k,n}$ on $\sum_{i=1}^n k^i$ variables and $k^n$ minimal monomial
generators. We give explicit recursive formulae for the Betti numbers of
$I_{k,n}$ and their Hilbert series, which allow us to study explicitly
percolation on $T_{k,n}$. We study bounds on this percolation and its
asymptotic behavior using these commutative algebra techniques.
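
As a point of reference for the counts above, the following sketch computes the probability that the root of $T_{k,n}$ is connected to at least one leaf. It does not use the Betti-number formulae of the paper; it assumes that each of the $\sum_{i=1}^n k^i$ variables corresponds to an edge, that each of the $k^n$ generators corresponds to a root-to-leaf path, and that edges are open independently with a common probability p. The function names are hypothetical.

```python
from itertools import product

def tree_counts(k, n):
    """Number of edges and of root-to-leaf paths in the complete k-ary tree of depth n."""
    edges = sum(k ** i for i in range(1, n + 1))
    leaves = k ** n
    return edges, leaves

def percolation_probability(k, n, p):
    """P(root connected to some leaf), via the recursion
    P_0 = 1,  P_m = 1 - (1 - p * P_{m-1})^k  (independent edges assumed)."""
    prob = 1.0
    for _ in range(n):
        prob = 1.0 - (1.0 - p * prob) ** k
    return prob

def brute_force_depth2(k, p):
    """Exhaustive check for depth 2: sum the probabilities of all edge states
    in which some root-to-leaf path is fully open."""
    n_edges = k + k * k
    total = 0.0
    for state in product((False, True), repeat=n_edges):
        # edge c (0 <= c < k) joins the root to child c;
        # edge k + j joins leaf j to its parent, child j // k
        connected = any(state[j // k] and state[k + j] for j in range(k * k))
        if connected:
            weight = 1.0
            for is_open in state:
                weight *= p if is_open else 1.0 - p
            total += weight
    return total
```

The recursion is exact only under the independence assumption; the algebraic approach of the paper gives bounds and identities that this sketch does not reproduce.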

Confidence nets, that is, collections of confidence intervals that fill out
the parameter space and whose exact parameter coverage can be computed, are
familiar in nonparametric statistics. Here, the distributional assumptions are
based on invariance under the action of a finite reflection group. Exact
confidence nets are exhibited for a single parameter, based on the root system
of the group. The main result is a formula for the generating function of the
coverage interval probabilities. The proof makes use of the theory of
"buildings" and the Chevalley factorization theorem for the length distribution
on Cayley graphs of finite reflection groups.
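
The factorization of the length distribution can be checked numerically in the simplest case. The sketch below assumes the symmetric group $S_n$ (a finite reflection group), where the length of a permutation is its number of inversions and the length generating function factorizes as $\prod_{d=2}^{n}(1+q+\cdots+q^{d-1})$. It illustrates only this factorization, not the coverage probabilities of the paper; function names are hypothetical.

```python
from itertools import permutations

def inversions(perm):
    """Number of inversions, which equals the Coxeter length in the symmetric group."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def length_distribution(n):
    """Coefficient list of sum over w in S_n of q^{length(w)}, by enumeration."""
    counts = [0] * (n * (n - 1) // 2 + 1)
    for perm in permutations(range(n)):
        counts[inversions(perm)] += 1
    return counts

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def factorized_length_polynomial(n):
    """Product of (1 + q + ... + q^{d-1}) for d = 2, ..., n."""
    poly = [1]
    for d in range(2, n + 1):
        poly = poly_mul(poly, [1] * d)
    return poly
```

Comparing `length_distribution(n)` with `factorized_length_polynomial(n)` for small n confirms that the two coefficient lists agree.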

The present paper studies multiple failure and signature analysis of coherent
systems using the theory of monomial ideals. While system reliability has been
studied using Hilbert series of monomial ideals, this is not enough to
understand, in a deeper sense, the structural features of the ideal that reflect
the behavior of the system under multiple simultaneous failures and its signature.
Therefore, we introduce the lcm-filtration of a monomial ideal, and we study
the Hilbert series and resolutions of the corresponding ideals. Given a
monomial ideal, we explicitly compute the resolutions for all ideals in the
associated lcm-filtration, and we apply this to study coherent systems. Some
computational results are shown in examples to demonstrate the usefulness of
this approach and the computational issues that arise. We also study the
failure distribution from a statistical point of view by means of the algebraic
tools described.

A strong link between information geometry and algebraic statistics is made
by investigating statistical manifolds which are algebraic varieties. In
particular, it is shown how first- and second-order efficient estimators can be
constructed, such as bias-corrected maximum likelihood and more general
estimators, for which the estimating equations are purely algebraic. In
addition it is shown how Gr\"obner basis technology, which is at the heart of
algebraic statistics, can be used to reduce the degrees of the terms in the
estimating equations. This points the way to the feasible use of special
methods for solving polynomial equations, such as homotopy continuation
methods, to find the estimators. Simple examples are given showing both
equations and computations. The proof of Theorem 2 was corrected in the latest
version. Some minor errors were also corrected.

The extension of majorization (also called the rearrangement ordering) to
groups more general than the symmetric (permutation) group is referred to as
$G$-majorization. There are strong results in the case that $G$ is a reflection
group and this paper builds on this theory in the direction of subgroups,
normal subgroups, quotient groups and extensions. The implications for
fundamental cones and order-preserving functions are studied. The main example
considered is the hyperoctahedral group, which, acting on a vector in
$\mathbb{R}^n$, permutes and changes the signs of components.
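
A minimal sketch of the two orderings may help fix ideas. The first function checks classical majorization through partial sums of the decreasing rearrangements. The second assumes a standard characterisation that is not stated in the abstract: x lies in the convex hull of the hyperoctahedral orbit of y exactly when the absolute values of x are weakly majorized by those of y. Function names and the tolerance parameter are hypothetical.

```python
def majorized_by(x, y, tol=1e-12):
    """Classical majorization x < y (symmetric group): partial sums of the
    decreasing rearrangement of x never exceed those of y, and totals agree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px += a
        py += b
        if px > py + tol:
            return False
    return abs(px - py) <= tol

def hyperoctahedral_majorized_by(x, y, tol=1e-12):
    """G-majorization for the hyperoctahedral group, using the ASSUMED criterion
    that |x| is weakly majorized by |y| (partial sums only, no equality of totals)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted((abs(a) for a in x), reverse=True)
    ys = sorted((abs(b) for b in y), reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px += a
        py += b
        if px > py + tol:
            return False
    return True
```

Because sign changes are allowed, two vectors in the same orbit, such as (-1, 0) and (0, 1), majorize each other in the hyperoctahedral sense although neither majorizes the other in the classical sense.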

The algebraic method provides useful techniques to identify models in designs
and to understand aliasing of polynomial models. The present note surveys the
topic of Gr\"obner bases in experimental design and then describes the notion
of confounding and the algebraic fan of a design. The ideas are illustrated
with a variety of design examples ranging from Latin squares to screening
designs.
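
Aliasing can be seen directly on a small design without any Gr\"obner basis computation. The sketch below assumes the half fraction of the $2^3$ factorial design with levels $\pm 1$ and defining relation $x_1 x_2 x_3 = 1$; two monomials are aliased on a design when they take the same value at every design point. The helper names are hypothetical, and monomials are written as exponent vectors.

```python
from itertools import product

def evaluate(monomial, point):
    """Value of the monomial with the given exponent vector at a design point."""
    value = 1
    for exponent, coordinate in zip(monomial, point):
        value *= coordinate ** exponent
    return value

def aliased(m1, m2, design):
    """True when the two monomials agree at every point of the design."""
    return all(evaluate(m1, pt) == evaluate(m2, pt) for pt in design)

# Full 2^3 factorial with levels -1 and +1
full_factorial = list(product((-1, 1), repeat=3))

# Half fraction selected by the (assumed) defining relation x1 * x2 * x3 = 1
half_fraction = [pt for pt in full_factorial if pt[0] * pt[1] * pt[2] == 1]
```

On the half fraction the interaction $x_1 x_2$, with exponent vector (1, 1, 0), cannot be distinguished from the main effect $x_3$, with exponent vector (0, 0, 1); on the full factorial the two are distinguishable.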

There is a duality theory connecting certain stochastic orderings between
cumulative distribution functions F_1, F_2 and stochastic orderings between
their inverses F_1^{-1}, F_2^{-1}. This underlies some theories of utility in
the case of the cdf and deprivation indices in the case of the inverse. Under
certain conditions there is an equivalence between the two theories. An example
is the equivalence between second order stochastic dominance and the Lorenz
ordering. This duality is generalised to include the case where there is
"distortion" of the cdf of the form v(F) and also of the inverse. A
comprehensive duality theorem is presented in a form which includes the
distortions and links the duality to the parallel theories of risk and
deprivation indices. It is shown that some well-known examples are special
cases of the results, including some from the Yaari social welfare theory and
the theory of majorization.
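
The example of second order stochastic dominance and the Lorenz ordering can be written out as follows. This is the standard textbook form, not the general theorem of the paper; it assumes finite means, and the direction of the inequalities follows one common convention (larger is better), which may differ from the convention used in the paper.

```latex
% Second order stochastic dominance, stated through the cdf:
\[
  F_1 \succeq_{\mathrm{SSD}} F_2
  \iff
  \int_{-\infty}^{x} F_1(t)\,dt \;\le\; \int_{-\infty}^{x} F_2(t)\,dt
  \quad \text{for all } x .
\]
% The dual statement, through the inverse cdf (generalised Lorenz curves):
\[
  F_1 \succeq_{\mathrm{SSD}} F_2
  \iff
  \int_{0}^{p} F_1^{-1}(u)\,du \;\ge\; \int_{0}^{p} F_2^{-1}(u)\,du
  \quad \text{for all } p \in [0,1] .
\]
% With a common mean \mu > 0, dividing by \mu gives the Lorenz ordering:
\[
  L_i(p) = \frac{1}{\mu}\int_{0}^{p} F_i^{-1}(u)\,du ,
  \qquad
  L_1(p) \ge L_2(p) \quad \text{for all } p \in [0,1] .
\]
```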

For a joint probability density function f(x) of a random vector X the mixed
partial derivatives of log f(x) can be interpreted as limiting cumulants in an
infinitesimally small open neighborhood around x. Moreover, setting them to
zero everywhere gives independence and conditional independence conditions. The
latter conditions can be mapped, using an algebraic differential duality, into
monomial ideal conditions. This provides an isomorphism between hierarchical
models and monomial ideals. It is thus shown that certain monomial ideals are
associated with particular classes of hierarchical models.
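
The simplest instances of the derivative conditions can be worked out by hand. The sketch below assumes that f is strictly positive and smooth on a product domain; the pairing of a mixed derivative with a monomial in the last line is only an informal reading of the duality mentioned above.

```latex
% Two variables: a vanishing mixed derivative gives independence.
\[
  \frac{\partial^2}{\partial x_1 \partial x_2} \log f(x_1,x_2) = 0
  \ \text{ for all } x
  \iff
  \log f(x_1,x_2) = a(x_1) + b(x_2)
  \iff
  f(x_1,x_2) = e^{a(x_1)}\, e^{b(x_2)} ,
\]
% that is, X_1 and X_2 are independent.
% Three variables: the same condition gives conditional independence.
\[
  \frac{\partial^2}{\partial x_1 \partial x_2} \log f(x_1,x_2,x_3) = 0
  \ \text{ for all } x
  \iff
  \log f(x_1,x_2,x_3) = a(x_1,x_3) + b(x_2,x_3) ,
\]
% that is, X_1 and X_2 are conditionally independent given X_3.
% Informal correspondence with a monomial:
\[
  \frac{\partial^2}{\partial x_1 \partial x_2} \;\longleftrightarrow\; x_1 x_2 .
\]
```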

In areas such as kernel smoothing and nonparametric regression there is
emphasis on smooth interpolation and smooth statistical models. Splines are
known to have optimal smoothness properties in one and higher dimensions. It is
shown, with special attention to polynomial models, that smooth interpolators
can be constructed by first extending the monomial basis and then minimising a
measure of smoothness with respect to the free parameters in the extended
basis. Algebraic methods help in choosing the extended basis, which can
also be found as a saturated basis for an extended experimental design with
dummy design points. One can get arbitrarily close to optimal smoothing for any
dimension and over any region, giving a simple alternative to models of spline
type. The relationship to splines is shown in one and two dimensions. A case
study is given which includes benchmarking against kriging methods.

The asymptotic behaviour of a family of gradient algorithms (including the
methods of steepest descent and minimum residues) for the optimisation of
bounded quadratic operators in R^d and Hilbert spaces is analyzed. The results
obtained generalize those of Akaike (1959) in several directions. First, all
algorithms in the family are shown to have the same asymptotic behaviour
(convergence to a two-point attractor), which implies in particular that they
have similar asymptotic convergence rates. Second, the analysis also covers the
Hilbert space case. A detailed analysis of the stability property of the
attractor is provided.
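
The two-point behaviour can be explored with a short sketch of steepest descent with exact line search. It assumes a quadratic $f(x) = \frac{1}{2}\sum_i \lambda_i x_i^2$ written in the eigenbasis of the operator, tracks the gradient rather than the iterate, and renormalises the gradient at each step to avoid loss of precision. This is an illustrative setup with hypothetical function names, not the general family of algorithms analysed in the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("zero gradient: the iteration has already converged")
    return [a / norm for a in v]

def next_direction(g, eigenvalues):
    """One steepest-descent step expressed on the gradient g.
    With A = diag(eigenvalues) and step alpha = (g.g)/(g.Ag), the new gradient
    is g - alpha * A g; it is renormalised to unit length."""
    Ag = [lam * gi for lam, gi in zip(eigenvalues, g)]
    alpha = dot(g, g) / dot(g, Ag)
    return normalise([gi - alpha * agi for gi, agi in zip(g, Ag)])

def gradient_directions(g0, eigenvalues, steps):
    """Normalised gradient directions z_0, z_1, ..., z_steps."""
    directions = [normalise(g0)]
    for _ in range(steps):
        directions.append(next_direction(directions[-1], eigenvalues))
    return directions
```

Successive directions are orthogonal in any dimension, so in two dimensions the sequence alternates exactly between two directions. In higher dimensions one can inspect the absolute value of `dot(z[k], z[k + 2])` as k grows to observe how closely the directions approach a period-two pattern.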

A certain type of integer grid, called here an echelon grid, is an object
found both in coherent systems whose components have a finite or countable
number of levels and in algebraic geometry. If \alpha=(\alpha_1,...,\alpha_d)
is an integer vector representing the state of a system, then the corresponding
algebraic object is a monomial x_1^{\alpha_1}... x_d^{\alpha_d} in the
indeterminates x_1,..., x_d. The idea is to relate a coherent system to
monomial ideals, so that the so-called Scarf complex of the monomial ideal
yields an inclusion-exclusion identity for the probability of failure, which
uses many fewer terms than the classical identity. Moreover in the ``general
position'' case we obtain via the Scarf complex the tube bounds given by Naiman
and Wynn [J. Inequal. Pure Appl. Math. (2001) 2 1-16].
Examples are given for the binary case but the full utility is for general
multistate coherent systems and a comprehensive example is given.
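
For comparison, the classical identity mentioned above can be sketched for a binary system. The code assumes independent components with failure probabilities q[i] and a system described by its minimal cut sets; it sums over all non-empty collections of cuts, so it uses $2^m - 1$ terms for m cuts. It does not implement the Scarf complex, which is what reduces the number of terms. Function names are hypothetical.

```python
import math
from itertools import combinations, product

def failure_probability_ie(min_cuts, q):
    """Classical inclusion-exclusion over minimal cut sets.
    The system fails when every component of at least one cut has failed."""
    total = 0.0
    for r in range(1, len(min_cuts) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for chosen in combinations(min_cuts, r):
            union = set().union(*chosen)
            total += sign * math.prod(q[i] for i in union)
    return total

def failure_probability_brute(min_cuts, q):
    """Exhaustive check: sum the probabilities of all failing component states."""
    n = len(q)
    total = 0.0
    for failed in product((False, True), repeat=n):
        if any(all(failed[i] for i in cut) for cut in min_cuts):
            total += math.prod(q[i] if failed[i] else 1.0 - q[i] for i in range(n))
    return total
```

For a 2-out-of-3 system the minimal cuts are the three pairs of components, and both functions agree on the failure probability.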