• Empirical geodesic graphs and CAT(k) metrics for data analysis(1401.3020)

March 14, 2019 math.ST, stat.TH
A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. For a probability distribution, the length along a path between two points can be defined as the amount of probability mass accumulated along the path. The geodesic, then, is the shortest such path and defines a geodesic metric. Such metrics are transformed in a number of ways to produce parametrised families of geodesic metric spaces, empirical versions of which allow computation of intrinsic means and associated measures of dispersion. These reveal properties of the data, based on geometry, such as those that are difficult to see from the raw Euclidean distances. Examples of application include clustering and classification. For certain parameter ranges, the spaces become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal spanning tree of a graph based on the data becomes CAT(0). In another, a so-called "metric cone" construction allows extension to CAT($k$) spaces. It is shown how to empirically tune the parameters of the metrics, making it possible to apply them to a number of real cases.
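The idea of an empirical geodesic metric with a tunable parameter, and of an intrinsic mean computed from it, can be sketched as follows. This is only an illustration under stated assumptions, not the construction used in the paper: the graph is the complete graph on the data, the edge length is the Euclidean distance raised to a hypothetical power `alpha`, and the intrinsic mean is restricted to the data points themselves.

```python
import math

def geodesic_distances(points, alpha=1.0):
    """All-pairs shortest-path distances on the complete graph over `points`.

    Assumption: an edge has length (Euclidean distance) ** alpha, where
    `alpha` is a hypothetical tuning parameter chosen for this sketch.
    """
    n = len(points)
    d = [[math.dist(p, q) ** alpha for q in points] for p in points]
    # Floyd-Warshall relaxation: allow paths through intermediate point k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def intrinsic_mean_index(d):
    """Index of the data point minimising the sum of squared geodesic distances."""
    costs = [sum(x * x for x in row) for row in d]
    return min(range(len(d)), key=costs.__getitem__)

# Four points on a line, one of them far from the rest.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
d_euclid = geodesic_distances(points, alpha=1.0)
d_squared = geodesic_distances(points, alpha=2.0)
```

With `alpha=1.0` the triangle inequality makes every direct edge a shortest path, so the Euclidean distances are recovered. With `alpha=2.0` long edges are penalised and shortest paths prefer chains of short hops, which is one simple way in which a parametrised family of geodesic metrics can differ from the raw Euclidean distances.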
• Optimal experimental design that minimizes the width of simultaneous confidence bands(1704.03995)

March 30, 2019 math.ST, stat.TH, stat.ME
We propose an optimal experimental design for a curvilinear regression model that minimizes the band-width of simultaneous confidence bands. Simultaneous confidence bands for curvilinear regression are constructed by evaluating the volume of a tube about a curve that is defined as a trajectory of a regression basis vector (Naiman, 1986). The proposed criterion is constructed based on the volume of a tube, and the corresponding optimal design that minimizes the volume of the tube is referred to as the tube-volume optimal (TV-optimal) design. For Fourier and weighted polynomial regressions, the problem is formalized as one of minimization over the cone of Hankel positive definite matrices, and the criterion to minimize is expressed as an elliptic integral. We show that the M\"obius group keeps our problem invariant, and hence, minimization can be conducted over cross-sections of orbits. We demonstrate that for the weighted polynomial regression and the Fourier regression with three bases, the tube-volume optimal design forms an orbit of the M\"obius group containing D-optimal designs as representative elements.
• Passive and Active Observation: Experimental Design Issues in Big Data(1712.06916)

Jan. 11, 2018 stat.ME
Data can be collected in scientific studies via a controlled experiment or passive observation. Big data is often collected in a passive way, e.g. from social media. Understanding the difference between active and passive observation is critical to the analysis. For example, in studies of causation, great efforts are made to guard against hidden confounders or feedback, which can destroy the identification of causation by corrupting or omitting counterfactuals (controls). Various solutions to these problems are discussed, including randomization.
• The algebraic method in tree percolation(1510.04036)

March 31, 2016 math.CO, math.PR, math.AC, math.RA
We apply the methods of algebraic reliability to the study of percolation on trees. To a complete $k$-ary tree $T_{k,n}$ of depth $n$ we assign a monomial ideal $I_{k,n}$ on $\sum_{i=1}^n k^i$ variables with $k^n$ minimal monomial generators. We give explicit recursive formulae for the Betti numbers of $I_{k,n}$ and their Hilbert series, which allow us to study percolation on $T_{k,n}$ explicitly. We study bounds on this percolation and its asymptotic behavior using these commutative algebra techniques.
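The counts in the abstract, and the kind of quantity being studied, can be illustrated with a short sketch. It assumes independent bond percolation in which each edge of $T_{k,n}$ is open with the same probability `p` (an assumption made here for illustration), and it uses the elementary recursion for the root-to-leaf connection probability rather than the Betti number formulae of the paper. The function names are hypothetical.

```python
def num_variables(k, n):
    """Number of edges of the complete k-ary tree of depth n: sum_{i=1}^n k^i."""
    return sum(k ** i for i in range(1, n + 1))

def num_generators(k, n):
    """Number of root-to-leaf paths, one per leaf: k^n."""
    return k ** n

def percolation_probability(k, n, p):
    """Probability that the root is joined to at least one leaf by open edges.

    Assumption: edges are open independently with probability p.
    Recursion: theta_0 = 1, theta_m = 1 - (1 - p * theta_{m-1}) ** k.
    """
    theta = 1.0
    for _ in range(n):
        theta = 1.0 - (1.0 - p * theta) ** k
    return theta
```

For example, `num_variables(2, 3)` counts the 2 + 4 + 8 edges of the binary tree of depth 3, and `percolation_probability(2, 1, 0.5)` is the chance that at least one of two edges is open.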
• "Building" exact confidence nets(1407.8375)

March 9, 2016 math.ST, stat.TH, math.GR
Confidence nets, that is, collections of confidence intervals that fill out the parameter space and whose exact parameter coverage can be computed, are familiar in nonparametric statistics. Here, the distributional assumptions are based on invariance under the action of a finite reflection group. Exact confidence nets are exhibited for a single parameter, based on the root system of the group. The main result is a formula for the generating function of the coverage interval probabilities. The proof makes use of the theory of "buildings" and the Chevalley factorization theorem for the length distribution on Cayley graphs of finite reflection groups.
• Types of signature analysis in reliability based on Hilbert series(1510.04427)

Oct. 15, 2015 math.PR, math.AC
The present paper studies multiple failure and signature analysis of coherent systems using the theory of monomial ideals. While system reliability has been studied using Hilbert series of monomial ideals, this is not enough to understand in a deeper sense the ideal structure features that reflect the behavior of the system under multiple simultaneous failures and signature. Therefore, we introduce the lcm-filtration of a monomial ideal, and we study the Hilbert series and resolutions of the corresponding ideals. Given a monomial ideal, we explicitly compute the resolutions for all ideals in the associated lcm-filtration, and we apply this to study coherent systems. Some computational results are shown in examples to demonstrate the usefulness of this approach and the computational issues that arise. We also study the failure distribution from a statistical point of view by means of the algebraic tools described.
• Computational algebraic methods in efficient estimation(1310.6515)

Jan. 10, 2014 math.ST, stat.TH
A strong link between information geometry and algebraic statistics is made by investigating statistical manifolds which are algebraic varieties. In particular, it is shown how first and second order efficient estimators can be constructed, such as bias corrected Maximum Likelihood and more general estimators, and for which the estimating equations are purely algebraic. In addition it is shown how Gr\"obner basis technology, which is at the heart of algebraic statistics, can be used to reduce the degrees of the terms in the estimating equations. This points the way to the feasible use of special methods for solving polynomial equations, such as homotopy continuation methods, to find the estimators. Simple examples are given showing both equations and computations. *** The proof of Theorem 2 was corrected in the latest version. Some minor errors were also corrected.
• Subgroup Majorization(1303.2707)

Nov. 27, 2013 math.ST, stat.TH, math.GR
The extension of majorization (also called the rearrangement ordering), to more general groups than the symmetric (permutation) group, is referred to as $G$-majorization. There are strong results in the case that $G$ is a reflection group and this paper builds on this theory in the direction of subgroups, normal subgroups, quotient groups and extensions. The implications for fundamental cones and order-preserving functions are studied. The main example considered is the hyperoctahedral group, which, acting on a vector in $\mathbb R^n$, permutes and changes the signs of components.
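The classical ordering and the action of the hyperoctahedral group can be made concrete with a small sketch. The function names are hypothetical. The last function uses a standard characterisation of majorization under the hyperoctahedral group (weak majorization of the absolute values), which is stated here as an assumption and is not taken from the paper.

```python
from itertools import permutations, product

def _partial_sums_dominated(xs, ys, tol):
    """Check that every partial sum of xs is at most that of ys; return both totals."""
    px = py = 0.0
    for a, b in zip(xs, ys):
        px += a
        py += b
        if px > py + tol:
            return False, px, py
    return True, px, py

def majorizes(y, x, tol=1e-12):
    """True if x is majorized by y in the classical (symmetric group) sense."""
    if len(x) != len(y):
        return False
    ok, px, py = _partial_sums_dominated(
        sorted(x, reverse=True), sorted(y, reverse=True), tol)
    return ok and abs(px - py) <= tol  # totals must agree

def hyperoctahedral_orbit(v):
    """All vectors obtained from v by permuting components and changing signs."""
    n = len(v)
    return {tuple(s * c for s, c in zip(signs, perm))
            for perm in permutations(v)
            for signs in product((1, -1), repeat=n)}

def weakly_majorizes_abs(y, x, tol=1e-12):
    """Assumed criterion for the hyperoctahedral case: |x| weakly majorized by |y|."""
    if len(x) != len(y):
        return False
    ok, _, _ = _partial_sums_dominated(
        sorted(map(abs, x), reverse=True), sorted(map(abs, y), reverse=True), tol)
    return ok
```

For a vector with distinct nonzero absolute values, the orbit has $2^n n!$ elements, so `hyperoctahedral_orbit((1, 2, 3))` contains 48 vectors.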
• The algebraic method in experimental design(1207.2968)

July 12, 2012 math.AC, stat.ME
The algebraic method provides useful techniques to identify models in designs and to understand aliasing of polynomial models. The present note surveys the topic of Gr\"obner bases in experimental design and then describes the notion of confounding and the algebraic fan of a design. The ideas are illustrated with a variety of design examples ranging from Latin squares to screening designs.
• (U,V)-Ordering and a Duality Theorem for Risk Aversion and Lorenz-type Orderings(1108.1019)

Aug. 4, 2011 math.PR, math.ST, stat.TH
There is a duality theory connecting certain stochastic orderings between cumulative distribution functions F_1,F_2 and stochastic orderings between their inverses F_1^(-1),F_2^(-1). This underlies some theories of utility in the case of the cdf and deprivation indices in the case of the inverse. Under certain conditions there is an equivalence between the two theories. An example is the equivalence between second order stochastic dominance and the Lorenz ordering. This duality is generalised to include the case where there is "distortion" of the cdf of the form v(F) and also of the inverse. A comprehensive duality theorem is presented in a form which includes the distortions and links the duality to the parallel theories of risk and deprivation indices. It is shown that some well-known examples are special cases of the results, including some from the Yaari social welfare theory and the theory of majorization.
• Differential cumulants, hierarchical models and monomial ideals(1102.2118)

Feb. 10, 2011 math.ST, stat.TH
For a joint probability density function f(x) of a random vector X the mixed partial derivatives of log f(x) can be interpreted as limiting cumulants in an infinitesimally small open neighborhood around x. Moreover, setting them to zero everywhere gives independence and conditional independence conditions. The latter conditions can be mapped, using an algebraic differential duality, into monomial ideal conditions. This provides an isomorphism between hierarchical models and monomial ideals. It is thus shown that certain monomial ideals are associated with particular classes of hierarchical models.
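As a minimal worked illustration of the independence statement (the bivariate setting and the Gaussian example are chosen here for illustration and are not taken from the paper), suppose $f(x_1,x_2)$ is a smooth positive density on the plane:

```latex
\[
\frac{\partial^2}{\partial x_1\,\partial x_2}\log f(x_1,x_2)=0
\ \text{ for all } (x_1,x_2)
\;\Longrightarrow\;
\log f(x_1,x_2)=a(x_1)+b(x_2)
\;\Longrightarrow\;
f(x_1,x_2)=e^{a(x_1)}\,e^{b(x_2)},
\]
so $X_1$ and $X_2$ are independent. For a bivariate normal density with zero
means, unit variances and correlation $\rho$,
\[
\log f(x_1,x_2) = c-\frac{x_1^2-2\rho x_1x_2+x_2^2}{2(1-\rho^2)},
\qquad
\frac{\partial^2 \log f}{\partial x_1\,\partial x_2}=\frac{\rho}{1-\rho^2},
\]
which vanishes everywhere exactly when $\rho=0$.
```

Here $a$, $b$ and the constant $c$ are generic symbols introduced only for this example.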
• Smooth supersaturated models(0809.4654)

Sept. 26, 2008 stat.CO
In areas such as kernel smoothing and non-parametric regression there is emphasis on smooth interpolation and smooth statistical models. Splines are known to have optimal smoothness properties in one and higher dimensions. It is shown, with special attention to polynomial models, that smooth interpolators can be constructed by first extending the monomial basis and then minimising a measure of smoothness with respect to the free parameters in the extended basis. Algebraic methods help in choosing the extended basis, which can also be found as a saturated basis for an extended experimental design with dummy design points. One can get arbitrarily close to optimal smoothing for any dimension and over any region, giving a simple alternative to models of spline type. The relationship to splines is shown in one and two dimensions. A case study is given which includes benchmarking against kriging methods.
• Asymptotic behaviour of a family of gradient algorithms in R^d and Hilbert spaces(0802.4382)

Feb. 29, 2008 math.OC
The asymptotic behaviour of a family of gradient algorithms (including the methods of steepest descent and minimum residues) for the optimisation of bounded quadratic operators in R^d and Hilbert spaces is analyzed. The results obtained generalize those of Akaike (1959) in several directions. First, all algorithms in the family are shown to have the same asymptotic behaviour (convergence to a two-point attractor), which implies in particular that they have similar asymptotic convergence rates. Second, the analysis also covers the Hilbert space case. A detailed analysis of the stability property of the attractor is provided.
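The two-point attractor can be observed numerically in the simplest member of the family. The sketch below covers only steepest descent with exact line search on a quadratic $f(x)=\tfrac12\sum_i \lambda_i x_i^2$ in $\mathbb{R}^3$. The diagonal form, the eigenvalues, and the starting gradient are assumptions chosen for illustration. Because the direction of the next gradient depends only on the direction of the current one, the gradient is renormalised at every step.

```python
import math

def normalised_gradients(eigenvalues, g0, steps):
    """Gradient directions under steepest descent on f(x) = 0.5 * sum(l_i * x_i^2).

    Assumption: the quadratic is diagonal with the given positive eigenvalues,
    and g0 is a generic nonzero starting gradient (not an eigenvector).
    """
    g = list(g0)
    history = []
    for _ in range(steps):
        norm = math.sqrt(sum(c * c for c in g))
        g = [c / norm for c in g]
        history.append(g)
        # Exact line search: step = (g.g) / (g.Ag), and g.g = 1 after normalising.
        step = 1.0 / sum(l * c * c for l, c in zip(eigenvalues, g))
        # x <- x - step * g and g = A x give g <- g - step * A g.
        g = [c * (1.0 - step * l) for l, c in zip(eigenvalues, g)]
    return history

history = normalised_gradients([1.0, 2.0, 5.0], [1.0, 1.0, 1.0], 100)
last, previous, two_back = history[-1], history[-2], history[-3]
```

In this run the component along the middle eigenvalue dies out and the direction repeats every second iteration, so `last` agrees with `two_back` but not with `previous`. This is consistent with convergence to a two-point attractor in the plane spanned by the eigenvectors of the smallest and largest eigenvalues.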
• Monomial ideals and the Scarf complex for coherent systems in reliability theory(math/0406527)

June 25, 2004 math.ST, stat.TH
A certain type of integer grid, called here an echelon grid, is an object found both in coherent systems whose components have a finite or countable number of levels and in algebraic geometry. If \alpha=(\alpha_1,...,\alpha_d) is an integer vector representing the state of a system, then the corresponding algebraic object is a monomial x_1^{\alpha_1}... x_d^{\alpha_d} in the indeterminates x_1,..., x_d. The idea is to relate a coherent system to monomial ideals, so that the so-called Scarf complex of the monomial ideal yields an inclusion-exclusion identity for the probability of failure, which uses many fewer terms than the classical identity. Moreover, in the "general position" case we obtain via the Scarf complex the tube bounds given by Naiman and Wynn [J. Inequal. Pure Appl. Math. (2001) 2 1-16]. Examples are given for the binary case but the full utility is for general multistate coherent systems and a comprehensive example is given.