• ### Prediction risk for the horseshoe regression(1605.04796)

March 5, 2019 math.ST, stat.TH
We show that prediction performance for global-local shrinkage regression can overcome two major difficulties of global shrinkage regression: (i) the amount of relative shrinkage is monotone in the singular values of the design matrix and (ii) the shrinkage is determined by a single tuning parameter. Specifically, we show that the horseshoe regression, with heavy-tailed component-specific local shrinkage parameters, in conjunction with a global parameter providing shrinkage towards zero, alleviates both these difficulties and consequently, results in an improved risk for prediction. Numerical demonstrations of improved prediction over competing approaches in simulations and in a pharmacogenomics data set confirm our theoretical findings.
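Difficulty (i) can be seen in a minimal sketch, assuming the standard ridge regression result that the least-squares coefficient along the i-th singular direction is multiplied by $d_i^2/(d_i^2+\lambda)$. The design matrix, the value of `lam`, and the function name `ridge_shrinkage_factors` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

def ridge_shrinkage_factors(X, lam):
    """Ridge multiplier d_i^2 / (d_i^2 + lam) for each singular direction of X."""
    d = np.linalg.svd(X, compute_uv=False)  # singular values, descending order
    return d, d**2 / (d**2 + lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # hypothetical design matrix
d, factors = ridge_shrinkage_factors(X, lam=10.0)

# Smaller singular values get smaller multipliers (more shrinkage),
# and the single parameter lam controls all directions at once.
for d_i, f_i in zip(d, factors):
    print(f"d = {d_i:6.3f}  ridge multiplier = {f_i:.3f}")
```

With component-specific local scales, each direction would instead receive its own multiplier, so the amount of shrinkage need not be monotone in the singular values.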
• ### Lasso Meets Horseshoe: A Survey(1706.10179)

March 3, 2019 stat.ME
The goal of this paper is to contrast and survey the major advances in two of the most commonly used high-dimensional techniques, namely, the Lasso and horseshoe regularization. Lasso is a gold standard for predictor selection while horseshoe is a state-of-the-art Bayesian estimator for sparse signals. Lasso is fast and scalable and uses convex optimization whilst the horseshoe is non-convex. Our novel perspective focuses on three aspects: (i) theoretical optimality in high dimensional inference for the Gaussian sparse model and beyond, (ii) efficiency and scalability of computation and (iii) methodological development and performance.
• ### The Graphical Horseshoe Estimator for Inverse Covariance Matrices(1707.06661)

Jan. 6, 2019 stat.ME
We develop a new estimator of the inverse covariance matrix for high-dimensional multivariate normal data using the horseshoe prior. The proposed graphical horseshoe estimator has attractive properties compared to other popular estimators, such as the graphical lasso and graphical Smoothly Clipped Absolute Deviation (SCAD). The most prominent benefit is that when the true inverse covariance matrix is sparse, the graphical horseshoe provides estimates with small information divergence from the true sampling distribution. The posterior mean under the graphical horseshoe prior can also be almost unbiased under certain conditions. In addition to these theoretical results, we also provide a full Gibbs sampler for implementing our estimator. MATLAB code is available for download from GitHub at http://github.com/liyf1988/GHS. The graphical horseshoe estimator compares favorably to existing techniques in simulations and in a human gene network data analysis.
• ### Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC(1801.05007)

Jan. 15, 2018 stat.ME, stat.ML
In Divide & Recombine (D&R), big data are divided into subsets, each analytic method is applied to subsets, and the outputs are recombined. This enables deep analysis and practical computational performance. An innovative D&R procedure is proposed to compute likelihood functions of data-model (DM) parameters for big data. The likelihood-model (LM) is a parametric probability density function of the DM parameters. The density parameters are estimated by fitting the density to MCMC draws from each subset DM likelihood function, and then the fitted densities are recombined. The procedure is illustrated using normal and skew-normal LMs for the logistic regression DM.
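The divide, fit, and recombine steps can be sketched on a toy problem. This is an illustration under stated assumptions, not the paper's procedure: the data model is a normal mean with known unit variance (not logistic regression), direct sampling stands in for MCMC, the LM is normal, and recombination is assumed to be the product of the fitted normal densities, so that precisions add.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)  # hypothetical "big" data set
subsets = np.array_split(x, 10)                  # divide

fitted = []
for s in subsets:
    # Stand-in for MCMC: in this toy model the normalized subset likelihood of
    # the mean is exactly N(mean(s), 1/len(s)), so we draw from it directly.
    draws = rng.normal(s.mean(), 1.0 / np.sqrt(len(s)), size=2_000)
    fitted.append((draws.mean(), draws.var(ddof=1)))  # fit a normal LM

# Recombine (assumption): a product of normal densities is again normal,
# with precision equal to the sum of the subset precisions.
means = np.array([m for m, _ in fitted])
variances = np.array([v for _, v in fitted])
precision = np.sum(1.0 / variances)
combined_mean = np.sum(means / variances) / precision
combined_var = 1.0 / precision

print("recombined mean:", combined_mean, " full-data mean:", x.mean())
print("recombined variance:", combined_var, " full-data value:", 1.0 / len(x))
```

In this toy setting the recombined normal should closely track the full-data likelihood of the mean, which is the behavior a D&R procedure aims for.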
• ### Horseshoe Regularization for Feature Subset Selection(1702.07400)

June 22, 2017 stat.CO, stat.ML
Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of time of state-of-the-art non-convex solvers.
• ### Global-Local Mixtures(1604.07487)

Sept. 21, 2016 math.ST, stat.TH
Global-local mixtures are derived from the Cauchy-Schlömilch and Liouville integral transformation identities. We characterize well-known normal-scale mixture distributions including the Laplace or lasso, logit and quantile as well as new global-local mixtures. We also apply our methodology to convolutions that commonly arise in Bayesian inference. Finally, we conclude with a conjecture concerning bridge and uniform correlation mixtures.
• ### Default Bayesian analysis with global-local shrinkage priors(1510.03516)

May 15, 2016 stat.ME
We provide a framework for assessing the default nature of a prior distribution using the property of regular variation, which we study for global-local shrinkage priors. In particular, we demonstrate the horseshoe priors, originally designed to handle sparsity, also possess regular variation and thus are appropriate for default Bayesian analysis. To illustrate our methodology, we solve a problem of non-informative priors due to Efron (1973), who showed standard flat non-informative priors in high-dimensional normal means model can be highly informative for nonlinear parameters of interest. We consider four such problems and show global-local shrinkage priors such as the horseshoe and horseshoe+ perform as Efron (1973) requires in each case. We find the reason for this lies in the ability of the global-local shrinkage priors to separate a low-dimensional signal embedded in high-dimensional noise, even for nonlinear functions.
• ### Inferring network structure in non-normal and mixed discrete-continuous genomic data(1604.00376)

April 1, 2016 stat.ME
Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach.
• ### The Horseshoe+ Estimator of Ultra-Sparse Signals(1502.00560)

June 15, 2015 math.ST, stat.TH
We propose a new prior for ultra-sparse signal detection that we term the "horseshoe+ prior." The horseshoe+ prior is a natural extension of the horseshoe prior that has achieved success in the estimation and detection of sparse signals and has been shown to possess a number of desirable theoretical properties while enjoying computational feasibility in high dimensions. The horseshoe+ prior builds upon these advantages. Our work proves that the horseshoe+ posterior concentrates at a rate faster than that of the horseshoe in the Kullback-Leibler (K-L) sense. We also establish theoretically that the proposed estimator has lower posterior mean squared error in estimating signals compared to the horseshoe and achieves the optimal Bayes risk in testing up to a constant. For global-local scale mixture priors, we develop a new technique for analyzing the marginal sparse prior densities using the class of Meijer-G functions. In simulations, the horseshoe+ estimator demonstrates superior performance in a standard design setting against competing methods, including the horseshoe and Dirichlet-Laplace estimators. We conclude with an illustration on a prostate cancer data set and by pointing out some directions for future research.
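The relationship between the two priors can be sketched by sampling from them. The sketch assumes the usual hierarchical forms: for the horseshoe, $\theta_i \mid \lambda_i, \tau \sim N(0, \lambda_i^2\tau^2)$ with $\lambda_i \sim C^+(0,1)$; for the horseshoe+, an extra half-Cauchy layer $\lambda_i \mid \eta_i \sim C^+(0, \eta_i)$, $\eta_i \sim C^+(0,1)$. The function names and the fixed value of `tau` are hypothetical.

```python
import numpy as np

def half_cauchy(rng, scale, size):
    """Draw from a half-Cauchy C+(0, scale) as |scale * standard Cauchy|."""
    return np.abs(scale * rng.standard_cauchy(size))

def sample_horseshoe(rng, n, tau=1.0):
    lam = half_cauchy(rng, 1.0, n)   # local scales lambda_i ~ C+(0, 1)
    return rng.normal(0.0, lam * tau)

def sample_horseshoe_plus(rng, n, tau=1.0):
    eta = half_cauchy(rng, 1.0, n)   # extra mixing layer eta_i ~ C+(0, 1)
    lam = half_cauchy(rng, eta, n)   # lambda_i | eta_i ~ C+(0, eta_i)
    return rng.normal(0.0, lam * tau)

rng = np.random.default_rng(1)
hs = sample_horseshoe(rng, 100_000)
hsp = sample_horseshoe_plus(rng, 100_000)

# Compare how much mass sits near zero and how far the tails reach.
for name, draws in [("horseshoe", hs), ("horseshoe+", hsp)]:
    q = np.quantile(np.abs(draws), [0.1, 0.5, 0.99])
    print(name, "quantiles of |theta| at 10%, 50%, 99%:", q)
```

Comparing the low and high quantiles of the absolute draws gives a rough view of the mass near zero and the tail weight under each prior.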
• ### Iterated filtering(0902.0347)

Nov. 22, 2012 math.ST, stat.TH
Inference for partially observed Markov process models has been a longstanding methodological challenge with many scientific and engineering applications. Iterated filtering algorithms maximize the likelihood function for partially observed Markov process models by solving a recursive sequence of filtering problems. We present new theoretical results pertaining to the convergence of iterated filtering algorithms implemented via sequential Monte Carlo filters. This theory complements the growing body of empirical evidence that iterated filtering algorithms provide an effective inference strategy for scientific models of nonlinear dynamic systems. The first step in our theory involves studying a new recursive approach for maximizing the likelihood function of a latent variable model, when this likelihood is evaluated via importance sampling. This leads to the consideration of an iterated importance sampling algorithm which serves as a simple special case of iterated filtering, and may have applicability in its own right.
• ### Nonparametric Bayesian Approaches to Non-homogeneous Hidden Markov Models(1205.1839)

May 8, 2012 stat.ME
In this article a flexible Bayesian non-parametric model is proposed for non-homogeneous hidden Markov models. The model is developed through the amalgamation of the ideas of hidden Markov models and predictor dependent stick-breaking processes. Computation is carried out using an auxiliary variable representation of the model which enables us to perform exact MCMC sampling from the posterior. Furthermore, the model is extended to the situation when the predictors can simultaneously influence the transition dynamics of the hidden states as well as the emission distribution. Estimates of few-steps-ahead conditional predictive distributions of the response have been used as performance diagnostics for these models. The proposed methodology is illustrated through simulation experiments as well as analysis of a real data set concerned with the prediction of rainfall induced malaria epidemics.