• Bayesian Sparse Linear Regression with Unknown Symmetric Error(1608.02143)

March 22, 2019 math.ST, stat.TH
We study full Bayesian procedures for sparse linear regression when errors have a symmetric but otherwise unknown distribution. The unknown error distribution is endowed with a symmetrized Dirichlet process mixture of Gaussians. For the prior on regression coefficients, a mixture of point masses at zero and continuous distributions is considered. We study the behavior of the posterior with a diverging number of predictors. Conditions are provided for consistency in the mean Hellinger distance. The compatibility and restricted eigenvalue conditions yield the minimax convergence rate of the regression coefficients in $\ell_1$- and $\ell_2$-norms, respectively. The convergence rate is adaptive to both the unknown sparsity level and the unknown symmetric error density under compatibility conditions. In addition, strong model selection consistency and a semi-parametric Bernstein-von Mises theorem are proven under slightly stronger conditions.
• Averages of Unlabeled Networks: Geometric Characterization and Asymptotic Behavior(1709.02793)

Feb. 7, 2019 math.DG, math.ST, stat.TH
It is becoming increasingly common to see large collections of network data objects -- that is, data sets in which a network is viewed as a fundamental unit of observation. As a result, there is a pressing need to develop network-based analogues of even many of the most basic tools already standard for scalar and vector data. In this paper, our focus is on averages of unlabeled, undirected networks with edge weights. Specifically, we (i) characterize a certain notion of the space of all such networks, (ii) describe key topological and geometric properties of this space relevant to doing probability and statistics thereupon, and (iii) use these properties to establish the asymptotic behavior of a generalized notion of an empirical mean under sampling from a distribution supported on this space. Our results rely on a combination of tools from geometry, probability theory, and statistical shape analysis. In particular, the lack of vertex labeling necessitates working with a quotient space modding out permutations of labels. This results in a nontrivial geometry for the space of unlabeled networks, which in turn is found to have important implications for the types of probabilistic and statistical results that may be obtained and the techniques needed to obtain them.
• Bayesian Test and Selection for Bandwidth of High-dimensional Banded Precision Matrices(1804.08650)

April 23, 2018 math.ST, stat.TH, stat.ME
Assuming a banded structure is a common practice in the estimation of high-dimensional precision matrices. In this case, estimating the bandwidth of the precision matrix is a crucial initial step for subsequent analysis. Although there exist some consistent frequentist tests for the bandwidth parameter, bandwidth selection consistency for precision matrices has not been established in a Bayesian framework. In this paper, we propose a prior distribution tailored to the bandwidth estimation of high-dimensional precision matrices. The banded structure is imposed via the Cholesky factor from the modified Cholesky decomposition. We establish the strong model selection consistency for the bandwidth as well as the consistency of the Bayes factor. The convergence rates for Bayes factors under both the null and alternative hypotheses are derived, and they are of similar order. As a by-product, we also propose an estimation procedure for the Cholesky factors that yields an almost optimal order of convergence rates. A two-sample bandwidth test is also considered, and it turns out that our method is able to consistently detect the equality of bandwidths between two precision matrices. The simulation study confirms that our method in general outperforms or is comparable to the existing frequentist and Bayesian methods.
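As a rough illustration of how a banded Cholesky factor induces a banded precision matrix, the sketch below assumes the usual form of the modified Cholesky decomposition, $\Omega = (I - A)^T D^{-1} (I - A)$, with $A$ strictly lower triangular and $D$ diagonal. The function names `banded_precision` and `bandwidth`, the random entries, and the sizes are illustrative assumptions, not the prior or the estimator proposed in the paper.

```python
import numpy as np

def banded_precision(p, k, seed=0):
    """Build a p x p precision matrix from a Cholesky factor with bandwidth k.

    Assumed form: Omega = (I - A)^T D^{-1} (I - A), with A strictly lower
    triangular and A[i, j] = 0 whenever i - j > k.
    """
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p))
    for i in range(p):
        for j in range(max(0, i - k), i):
            A[i, j] = rng.normal(scale=0.5)  # hypothetical coefficients
    d = rng.uniform(0.5, 2.0, size=p)        # hypothetical positive variances
    T = np.eye(p) - A
    omega = T.T @ np.diag(1.0 / d) @ T
    return omega, A, d

def bandwidth(M, tol=1e-12):
    """Largest |i - j| over entries whose magnitude exceeds tol."""
    idx = np.argwhere(np.abs(M) > tol)
    return int(np.max(np.abs(idx[:, 0] - idx[:, 1]))) if idx.size else 0

omega, A, d = banded_precision(p=8, k=2)
print("bandwidth of Omega:", bandwidth(omega))
```

Because the factor has unit diagonal and the entries of `d` are positive, the resulting matrix is symmetric positive definite, and its bandwidth cannot exceed that of the factor.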
• On Posterior Consistency of Tail Index for Bayesian Kernel Mixture Models(1511.02775)

April 18, 2018 math.ST, stat.TH
Asymptotic theory of tail index estimation has been studied extensively in the frequentist literature on extreme values, but rarely in the Bayesian context. We investigate whether popular Bayesian kernel mixture models are able to support heavy tailed distributions and consistently estimate the tail index. We show that posterior inconsistency in tail index is surprisingly common for both parametric and nonparametric mixture models. We then present a set of sufficient conditions under which posterior consistency in tail index can be achieved, and verify these conditions for Pareto mixture models under general mixing priors.
• Robust and Parallel Bayesian Model Selection(1610.06194)

March 22, 2018 stat.ML
Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another challenge one may encounter is the presence of outliers and contamination that damage the inference quality. The parallel "divide and conquer" model selection strategy divides the observations of the full data set into roughly equal subsets and performs inference and model selection independently on each subset. After local subset inference, this method aggregates the posterior model probabilities or other model/variable selection criteria to obtain a final model by using the notion of geometric median. This approach leads to improved concentration in finding the "correct" model and model parameters and also is provably robust to outliers and data contamination.
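The aggregation step can be sketched as follows, assuming each subset reports a vector of posterior model probabilities over the same candidate models and that the geometric median is taken with respect to the Euclidean norm. The Weiszfeld iteration used here is a standard way to compute a geometric median and is not necessarily the paper's algorithm; the function name and the numbers in `subset_probs` are made up for illustration.

```python
import numpy as np

def geometric_median(points, n_iter=200, eps=1e-10):
    """Weiszfeld iteration for the Euclidean geometric median of row vectors."""
    points = np.asarray(points, dtype=float)
    median = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        dist = np.linalg.norm(points - median, axis=1)
        w = 1.0 / np.maximum(dist, eps)  # guard against division by zero
        new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new - median) < eps:
            return new
        median = new
    return median

# Hypothetical posterior model probabilities from five subsets (rows)
# over three candidate models (columns); the last subset is contaminated.
subset_probs = np.array([
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.85, 0.10, 0.05],
    [0.78, 0.17, 0.05],
    [0.05, 0.05, 0.90],
])

agg = geometric_median(subset_probs)
selected = int(np.argmax(agg))
print("aggregated probabilities:", agg, "selected model index:", selected)
```

Each iterate is a convex combination of the subset vectors, so the aggregate remains a probability vector, and the single contaminated row has only limited influence on it.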
• Differential Geometry for Model Independent Analysis of Images and Other Non-Euclidean Data: Recent Developments(1801.00898)

Jan. 3, 2018 math.ST, stat.TH
This article provides an exposition of recent methodologies for nonparametric analysis of digital observations on images and other non-Euclidean objects. Fréchet means of distributions on metric spaces, such as manifolds and stratified spaces, have played an important role in this endeavor. Apart from theoretical issues of uniqueness of the Fréchet minimizer and the asymptotic distribution of the sample Fréchet mean under uniqueness, applications to image analysis are highlighted. In addition, nonparametric Bayes theory is brought to bear on the problems of density estimation and classification on manifolds.
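For reference, the standard definition of the Fréchet mean can be written as below. The notation ($Q$ for the distribution, $(M, \rho)$ for the metric space, $X_1, \dots, X_n$ for an i.i.d. sample) is assumed here and may differ from the article's.

```latex
\[
  F(p) = \int_M \rho^2(p, x)\, Q(dx), \qquad
  \mu_F \in \operatorname*{arg\,min}_{p \in M} F(p),
\]
\[
  \hat{\mu}_n \in \operatorname*{arg\,min}_{p \in M} \frac{1}{n} \sum_{i=1}^{n} \rho^2(p, X_i).
\]
```

When $M = \mathbb{R}^d$ and $\rho$ is the Euclidean distance, $F$ is minimized by the ordinary mean of $Q$, and $\hat{\mu}_n$ is the sample average.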
• Intrinsic Gaussian processes on complex constrained domains(1801.01061)

Jan. 3, 2018 cs.LG, stat.ML
We propose a class of intrinsic Gaussian processes (in-GPs) for interpolation, regression and classification on manifolds with a primary focus on complex constrained domains or irregularly shaped spaces arising as subsets or submanifolds of $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$ and beyond. For example, in-GPs can accommodate spatial domains arising as complex subsets of Euclidean space. in-GPs respect the potentially complex boundary or interior conditions as well as the intrinsic geometry of the spaces. The key novelty of the proposed approach is to utilise the relationship between heat kernels and the transition density of Brownian motion on manifolds for constructing and approximating valid and computationally feasible covariance kernels. This enables in-GPs to be practically applied in great generality, while existing approaches for smoothing on constrained domains are limited to simple special cases. The broad utility of the in-GP approach is illustrated through simulation studies and data examples.
• Extrinsic Gaussian processes for regression and classification on manifolds(1706.08757)

June 27, 2017 stat.ME
Gaussian processes (GPs) are very widely used for modeling of unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory and algorithms related to GPs, the overwhelming majority of this literature focuses on the case in which the input domain corresponds to a Euclidean space. However, particularly in recent years with the increasing collection of complex data, it is commonly the case that the input domain does not have such a simple form. For example, it is common for the inputs to be restricted to a non-Euclidean manifold, a case which forms the motivation for this article. In particular, we propose a general extrinsic framework for GP modeling on manifolds, which relies on embedding of the manifold into a Euclidean space and then constructing extrinsic kernels for GPs on their images. These extrinsic Gaussian processes (eGPs) are used as prior distributions for unknown functions in Bayesian inferences. Our approach is simple and general, and we show that the eGPs inherit fine theoretical properties from GP models in Euclidean spaces. We consider applications of our models to regression and classification problems with predictors lying in a large class of manifolds, including spheres, planar shape spaces, a space of positive definite matrices, and Grassmannians. Our models can be readily used by practitioners in biological sciences for various regression and classification problems, such as disease diagnosis or detection. Our work is also likely to have impact in spatial statistics when spatial locations are on the sphere or other geometric spaces.
• Scale and curvature effects in principal geodesic analysis(1610.01537)

Oct. 6, 2016 stat.OT
There is growing interest in using the close connection between differential geometry and statistics to model smooth manifold-valued data. In particular, much work has been done recently to generalize principal component analysis (PCA), the method of dimension reduction in linear spaces, to Riemannian manifolds. One such generalization is known as principal geodesic analysis (PGA). This paper, in a novel fashion, obtains Taylor expansions in scaling parameters introduced in the domain of objective functions in PGA. It is shown that this technique not only leads to better closed-form approximations of PGA but also reveals the effects that scale, curvature and the distribution of data have on solutions to PGA and on their differences from first-order tangent space approximations. This approach should be applicable not only to PGA but also to other generalizations of PCA and more generally to other intrinsic statistics on Riemannian manifolds.
• On estimating a mixture on graphons(1606.02401)

June 28, 2016 stat.ME, stat.ML
Community detection, which focuses on clustering nodes or detecting communities in (mostly) a single network, is a problem of considerable practical interest and has received a great deal of attention in the research community. While being able to cluster within a network is important, there is an emerging need to cluster multiple networks. This is largely motivated by the routine collection of network data that are generated from potentially different populations, such as brain networks of subjects from different disease groups or genders, or biological networks generated under different experimental conditions. We propose a simple and general framework for clustering multiple networks based on a mixture model on graphons. Our clustering method employs graphon estimation as a first step and performs spectral clustering on the matrix of distances between estimated graphons. This is illustrated through both simulated and real data sets, and theoretical justification of the algorithm is given in terms of consistency.
• Robust and Scalable Bayes via a Median of Subset Posterior Measures(1403.2660)

June 2, 2016 cs.DC, math.ST, stat.TH, cs.LG
We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed aggregation step, which is based on the evaluation of a median in the space of probability measures equipped with a suitable collection of distances that can be quickly and efficiently evaluated in practice. We present both theoretical and numerical evidence illustrating the improvements achieved by our method.
• Omnibus CLTs for Fréchet means and nonparametric inference on non-Euclidean spaces(1306.5806)

March 28, 2016 math.ST, stat.TH
Two central limit theorems for sample Fréchet means are derived, both significant for nonparametric inference on non-Euclidean spaces. The first one, Theorem 2.2, encompasses and improves upon most earlier CLTs on Fréchet means and broadens the scope of the methodology beyond manifolds to diverse new non-Euclidean data including those on certain stratified spaces which are important in the study of phylogenetic trees. It does not require that the underlying distribution $Q$ have a density, and applies to both intrinsic and extrinsic analysis. The second theorem, Theorem 3.3, focuses on intrinsic means on Riemannian manifolds of dimensions $d>2$ and breaks new ground by providing a broad CLT without any of the earlier restrictive support assumptions. It makes the statistically reasonable assumption of a somewhat smooth density of $Q$. The excluded case of dimension $d=2$ proves to be an enigma, although the first theorem does provide a CLT in this case as well under a support restriction. Theorem 3.3 immediately applies to spheres $S^d$, $d>2$, which are also of considerable importance in applications to axial spaces and to landmark-based image analysis, as these spaces are quotients of spheres under a Lie group $\mathcal G$ of isometries of $S^d$.
• Learning Subspaces of Different Dimension(1404.6841)

Sept. 23, 2015 math.ST, stat.TH, stat.ME
We introduce a Bayesian model for inferring mixtures of subspaces of different dimensions. The key challenge in such a mixture model is specification of prior distributions over subspaces of different dimensions. We address this challenge by embedding subspaces or Grassmann manifolds into a sphere of relatively low dimension and specifying priors on the sphere. We provide an efficient sampling algorithm for the posterior distribution of the model parameters. We illustrate that a simple extension of our mixture of subspaces model can be applied to topic modeling. We also prove posterior consistency for the mixture of subspaces model. The utility of our approach is demonstrated with applications to real and simulated data.
• Extrinsic local regression on manifold-valued data(1508.02201)

Aug. 10, 2015 math.ST, stat.TH
We propose an extrinsic regression framework for modeling data with manifold-valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our approach embeds the manifold where the responses lie into a higher-dimensional Euclidean space, obtains a local regression estimate in that space, and then projects this estimate back onto the image of the manifold. Outside the regression setting, both intrinsic and extrinsic approaches have been proposed for modeling i.i.d. manifold-valued data. However, to our knowledge our work is the first to take an extrinsic approach to the regression problem. The proposed extrinsic regression framework is general, computationally efficient and theoretically appealing. Asymptotic distributions and convergence rates of the extrinsic regression estimates are derived, and a large class of examples is considered, indicating the wide applicability of our approach.
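The embed, smooth, and project steps can be sketched for the simplest case of responses on the unit sphere $S^2$, embedded in $\mathbb{R}^3$ by inclusion. The sketch assumes scalar predictors, a Gaussian kernel, and a Nadaraya-Watson (local constant) estimate in the ambient space; the function name, the bandwidth, and the simulated data are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def extrinsic_local_regression(x_train, y_train, x_query, bandwidth=0.2):
    """Kernel-weighted average in R^3, projected back onto the unit sphere."""
    x_train = np.asarray(x_train, dtype=float)          # shape (n,)
    y_train = np.asarray(y_train, dtype=float)          # shape (n, 3), rows on S^2
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Gaussian kernel weights between each query point and each training predictor.
    diff = (x_query[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * diff**2)
    w /= w.sum(axis=1, keepdims=True)
    ambient = w @ y_train                                # local estimate in R^3
    # Projection onto the sphere: rescale each estimate to unit length.
    return ambient / np.linalg.norm(ambient, axis=1, keepdims=True)

# Simulated data: responses scattered around a great circle.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=200)
truth = np.column_stack([np.cos(x), np.sin(x), np.zeros_like(x)])
noisy = truth + rng.normal(scale=0.1, size=truth.shape)
y = noisy / np.linalg.norm(noisy, axis=1, keepdims=True)

queries = np.array([0.5, 1.0, 1.5])
fitted = extrinsic_local_regression(x, y, queries)
print(fitted)
```

For the sphere, projection is just normalization, which is well defined whenever the ambient estimate is nonzero. Other manifolds need their own embedding and projection map.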
• Data augmentation for models based on rejection sampling(1406.6652)

Aug. 3, 2015 stat.CO
We present a data augmentation scheme to perform Markov chain Monte Carlo inference for models where data generation involves a rejection sampling algorithm. Our idea, which seems to be missing in the literature, is a simple scheme to instantiate the rejected proposals preceding each data point. The resulting joint probability over observed and rejected variables can be much simpler than the marginal distribution over the observed variables, which often involves intractable integrals. We consider three problems, the first being the modeling of flow-cytometry measurements subject to truncation. The second is a Bayesian analysis of the matrix Langevin distribution on the Stiefel manifold, and the third, Bayesian inference for a nonparametric Gaussian process density model. The latter two are instances of problems where Markov chain Monte Carlo inference is doubly-intractable. Our experiments demonstrate superior performance over state-of-the-art sampling algorithms for such problems.
• Bayesian nonparametric inference on the Stiefel manifold(1311.0907)

July 3, 2014 stat.CO
The Stiefel manifold $V_{p,d}$ is the space of all $d \times p$ orthonormal matrices, with the $d-1$ hypersphere and the space of all orthogonal matrices constituting special cases. In modeling data lying on the Stiefel manifold, parametric distributions such as the matrix Langevin distribution are often used; however, model misspecification is a concern and it is desirable to have nonparametric alternatives. Current nonparametric methods are based on Fréchet means. We take a fully generative nonparametric approach, which relies on mixing parametric kernels such as the matrix Langevin. The proposed kernel mixtures can approximate a large class of distributions on the Stiefel manifold, and we develop theory showing posterior consistency. While there exists work developing general posterior consistency results, extending these results to this particular manifold requires substantial new theory. Posterior inference is illustrated on a real-world dataset of near-Earth objects.
• Bayesian Monotone Regression using Gaussian Process Projection(1306.4041)

June 17, 2013 stat.ME
Shape-constrained regression analysis has applications in dose-response modeling, environmental risk assessment, disease screening and many other areas. Incorporating the shape constraints can improve estimation efficiency and avoid implausible results. We propose two novel methods focusing on Bayesian monotone curve and surface estimation using Gaussian process projections. The first projects samples from an unconstrained prior, while the second projects samples from the Gaussian process posterior. Theory is developed on continuity of the projection, posterior consistency and rates of contraction. The second approach is shown to have an empirical Bayes justification and to lead to simple computation with good performance in finite samples. Our projection approach can be applied in other constrained function estimation problems including in multivariate settings.
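The projection idea can be sketched on a grid, assuming the projection is the least-squares ($L_2$) projection onto nondecreasing functions, which on an equally weighted grid is computed by the pool-adjacent-violators algorithm. The squared-exponential kernel, the linear mean, and all names and parameter values below are illustrative assumptions, and the draws come from a Gaussian process prior rather than a fitted posterior.

```python
import numpy as np

def project_monotone(values):
    """Least-squares projection of a sequence onto nondecreasing sequences
    (pool-adjacent-violators algorithm with equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

# Draw unconstrained curves from a Gaussian process on a grid (assumed setup).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
cov = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.2) ** 2)
chol = np.linalg.cholesky(cov + 1e-6 * np.eye(t.size))  # jitter for stability
draws = t + 0.3 * (chol @ rng.standard_normal((t.size, 5))).T  # 5 curves

# Project every draw onto the set of nondecreasing curves.
projected = np.array([project_monotone(d) for d in draws])
```

Each row of `projected` is nondecreasing and is the closest such sequence to the corresponding draw in the Euclidean sense. A curve that is already monotone is left unchanged.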