• Gaussian graphical models are used for determining conditional relationships between variables. This is accomplished by identifying off-diagonal elements in the inverse-covariance matrix that are non-zero. When the ratio of variables (p) to observations (n) approaches one, the maximum likelihood estimator of the covariance matrix becomes unstable and requires shrinkage estimation. Whereas several classical (frequentist) methods have been introduced to address this issue, Bayesian methods remain relatively uncommon in practice and methodological literatures. Here we introduce a Bayesian method for estimating sparse matrices, in which conditional relationships are determined with projection predictive selection. This method uses Kullback-Leibler divergence and cross-validation for variable selection, in addition to the horseshoe prior for regularization. Through simulation and an applied example, we demonstrate that the proposed method often outperforms classical methods, such as the graphical lasso, as well as an alternative Bayesian method with respect to edge identification and frequentist risk. Further, projection predictive selection consistently had the lowest false positive rate, both with simulated and real data. We end by discussing future directions and contributions to the Bayesian literature on the topic of sparsity.
  • In high-dimensional prediction problems, where the number of features may greatly exceed the number of training instances, fully Bayesian approach with a sparsifying prior is known to produce good results but is computationally challenging. To alleviate this computational burden, we propose to use a preprocessing step where we first apply a dimension reduction to the original data to reduce the number of features to something that is computationally conveniently handled by Bayesian methods. To do this, we propose a new dimension reduction technique, called iterative supervised principal components (ISPC), which combines variable screening and dimension reduction and can be considered as an extension to the existing technique of supervised principal components (SPCs). Our empirical evaluations confirm that, although not foolproof, the proposed approach provides very good results on several microarray benchmark datasets with very affordable computation time, and can also be very useful for visualizing high-dimensional data.
  • The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage hyperparameter based on the prior information about the degree of sparsity in the parameter vector. Second, the horseshoe prior has the undesired property that there is no possibility of specifying separately information about sparsity and the amount of regularization for the largest coefficients, which can be problematic with weakly identified parameters, such as the logistic regression coefficients in the case of data separation. This paper proposes solutions to both of these problems. We introduce a concept of effective number of nonzero parameters, show an intuitive way of formulating the prior for the global hyperparameter based on the sparsity assumptions, and argue that the previous default choices are dubious based on their tendency to favor solutions with more unshrunk parameters than we typically expect a priori. Moreover, we introduce a generalization to the horseshoe prior, called the regularized horseshoe, that allows us to specify a minimum level of regularization to the largest values. We show that the new prior can be considered as the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe resembles the spike-and-slab with an infinitely wide slab. Numerical experiments on synthetic and real world data illustrate the benefit of both of these theoretical advances.
  • The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but as shown in this paper, the results can be sensitive to the prior choice for the global shrinkage hyperparameter. We argue that the previous default choices are dubious due to their tendency to favor solutions with more unshrunk coefficients than we typically expect a priori. This can lead to bad results if this parameter is not strongly identified by data. We derive the relationship between the global parameter and the effective number of nonzeros in the coefficient vector, and show an easy and intuitive way of setting up the prior for the global parameter based on our prior beliefs about the number of nonzero coefficients in the model. The results on real world data show that one can benefit greatly -- in terms of improved parameter estimates, prediction accuracy, and reduced computation time -- from transforming even a crude guess for the number of nonzero coefficients into the prior for the global parameter using our framework.
  • We propose a new method for automatically detecting monotonic input-output relationships from data using Gaussian Process (GP) models with virtual derivative observations. Our results on synthetic and real datasets show that the proposed method detects monotonic directions from input spaces with high accuracy. We expect the method to be useful especially for improving explainability of the models and improving the accuracy of regression and classification tasks, especially near the edges of the data or when extrapolating.
  • We propose a new method for simplification of Gaussian process (GP) models by projecting the information contained in the full encompassing model and selecting a reduced number of variables based on their predictive relevance. Our results on synthetic and real world datasets show that the proposed method improves the assessment of variable relevance compared to the automatic relevance determination (ARD) via the length-scale parameters. We expect the method to be useful for improving explainability of the models, reducing the future measurement costs and reducing the computation time for making new predictions.
  • The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation (CV) score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. This can also lead to substantial selection induced bias and optimism in the performance evaluation for the selected model. From a predictive viewpoint, best results are obtained by accounting for model uncertainty by forming the full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information of the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on CV-score. Overall, the projection method appears to outperform also the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.
  • This document is additional material to our previous study comparing several strategies for variable subset selection. Our recommended approach was to fit the full model with all the candidate variables and best possible prior information, and perform the variable selection using the projection predictive framework. Here we give an example of performing such an analysis, using Stan for fitting the model, and R for the variable selection.