
Gaussian graphical models are used for determining conditional relationships
between variables. This is accomplished by identifying off-diagonal elements in
the inverse-covariance matrix that are nonzero. When the ratio of variables
(p) to observations (n) approaches one, the maximum likelihood estimator of the
covariance matrix becomes unstable and requires shrinkage estimation. Whereas
several classical (frequentist) methods have been introduced to address this
issue, Bayesian methods remain relatively uncommon both in practice and in the
methodological literature. Here we introduce a Bayesian method for estimating
sparse matrices, in which conditional relationships are determined with
projection predictive selection. This method uses Kullback-Leibler divergence
and cross-validation for variable selection, in addition to the horseshoe prior
for regularization. Through simulation and an applied example, we demonstrate
that the proposed method often outperforms classical methods, such as the
graphical lasso, as well as an alternative Bayesian method with respect to edge
identification and frequentist risk. Further, projection predictive selection
consistently had the lowest false positive rate, both with simulated and real
data. We end by discussing future directions and contributions to the Bayesian
literature on the topic of sparsity.
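
As a small illustration of how conditional relationships are read off an
inverse-covariance (precision) matrix, the sketch below converts a precision
matrix into partial correlations and lists its nonzero off-diagonal entries.
The matrix `theta`, the function names, and the tolerance `tol` are
assumptions made for illustration only. This is not the projection predictive
procedure itself, which decides on edges through variable selection rather
than by thresholding.

```python
import math

def partial_correlations(theta):
    """Partial correlations from a precision matrix given as a list of lists.

    Uses rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) for i != j.
    """
    p = len(theta)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho

def edges(theta, tol=1e-8):
    """Pairs (i, j), i < j, whose off-diagonal precision entry is nonzero."""
    p = len(theta)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(theta[i][j]) > tol]

# Hypothetical precision matrix for a chain 0 - 1 - 2: variables 0 and 2 are
# conditionally independent given variable 1, so theta[0][2] is zero.
theta = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -1.0],
         [0.0, -1.0, 2.0]]

print(edges(theta))
print(partial_correlations(theta)[0][1])
```

In practice the precision matrix has to be estimated, and when p is close to n
the estimate needs regularization before its zero pattern is meaningful.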

In high-dimensional prediction problems, where the number of features may
greatly exceed the number of training instances, a fully Bayesian approach with a
sparsifying prior is known to produce good results but is computationally
challenging. To alleviate this computational burden, we propose to use a
preprocessing step in which we first apply dimension reduction to the original
data, reducing the number of features to a level that Bayesian methods can
handle at reasonable computational cost. To do this, we propose a new
dimension reduction technique, called iterative supervised principal components
(ISPC), which combines variable screening and dimension reduction and can be
considered as an extension to the existing technique of supervised principal
components (SPCs). Our empirical evaluations confirm that, although not
foolproof, the proposed approach provides very good results on several
microarray benchmark datasets with very affordable computation time, and can
also be very useful for visualizing high-dimensional data.

The horseshoe prior has proven to be a noteworthy alternative for sparse
Bayesian estimation, but has previously suffered from two problems. First,
there has been no systematic way of specifying a prior for the global shrinkage
hyperparameter based on the prior information about the degree of sparsity in
the parameter vector. Second, the horseshoe prior has the undesirable property
that information about sparsity cannot be specified separately from the amount
of regularization for the largest coefficients, which
can be problematic with weakly identified parameters, such as the logistic
regression coefficients in the case of data separation. This paper proposes
solutions to both of these problems. We introduce a concept of effective number
of nonzero parameters, show an intuitive way of formulating the prior for the
global hyperparameter based on the sparsity assumptions, and argue that the
previous default choices are dubious based on their tendency to favor solutions
with more unshrunk parameters than we typically expect a priori. Moreover, we
introduce a generalization to the horseshoe prior, called the regularized
horseshoe, that allows us to specify a minimum level of regularization to the
largest values. We show that the new prior can be considered as the continuous
counterpart of the spike-and-slab prior with a finite slab width, whereas the
original horseshoe resembles the spike-and-slab with an infinitely wide slab.
Numerical experiments on synthetic and real-world data illustrate the benefits
of both of these theoretical advances.
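
The finite-slab behaviour can be sketched numerically. The snippet below
assumes the regularized horseshoe takes the form
beta_j ~ N(0, tau^2 * lambda_tilde_j^2) with
lambda_tilde_j^2 = c^2 * lambda_j^2 / (c^2 + tau^2 * lambda_j^2), where c is
the slab scale. The function names and the numerical values are illustrative
assumptions, not part of the original text.

```python
def regularized_local_scale_sq(lam, tau, c):
    """Squared regularized local scale: c^2 lam^2 / (c^2 + tau^2 lam^2)."""
    return (c ** 2 * lam ** 2) / (c ** 2 + tau ** 2 * lam ** 2)

def prior_variance(lam, tau, c):
    """Conditional prior variance of a coefficient, tau^2 * lambda_tilde^2."""
    return tau ** 2 * regularized_local_scale_sq(lam, tau, c)

tau, c = 0.1, 2.0  # hypothetical global scale and slab scale

# Small local scale: the variance is close to the original horseshoe value
# tau^2 * lam^2, so strongly shrunk coefficients are left essentially unchanged.
small = prior_variance(0.1, tau, c)

# Very large local scale: the variance approaches c^2 from below, so even the
# largest coefficients are regularized by a slab of finite width.
large = prior_variance(1e6, tau, c)

print(small, (tau * 0.1) ** 2)
print(large, c ** 2)
```

Letting c grow without bound recovers the original horseshoe, which matches
the description of an infinitely wide slab.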

The horseshoe prior has proven to be a noteworthy alternative for sparse
Bayesian estimation, but as shown in this paper, the results can be sensitive
to the prior choice for the global shrinkage hyperparameter. We argue that the
previous default choices are dubious due to their tendency to favor solutions
with more unshrunk coefficients than we typically expect a priori. This can
lead to bad results if this parameter is not strongly identified by data. We
derive the relationship between the global parameter and the effective number
of nonzeros in the coefficient vector, and show an easy and intuitive way of
setting up the prior for the global parameter based on our prior beliefs about
the number of nonzero coefficients in the model. The results on real-world data
show that one can benefit greatly (in terms of improved parameter estimates,
prediction accuracy, and reduced computation time) from transforming even a
crude guess for the number of nonzero coefficients into the prior for the
global parameter using our framework.
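
A minimal sketch of turning a prior guess into a scale for the global
parameter, under stated assumptions: a Gaussian linear regression with noise
standard deviation `sigma`, `n` observations, `D` standardized predictors, and
half-Cauchy local scales. Under those assumptions the prior mean of the
effective number of nonzeros is a / (1 + a) * D with a = tau * sqrt(n) / sigma,
and solving for a guess `p0` gives tau0 = p0 / (D - p0) * sigma / sqrt(n). The
function names and the example numbers are hypothetical.

```python
import math

def expected_m_eff(tau, sigma, n, D):
    """Prior mean of the effective number of nonzero coefficients."""
    a = tau * math.sqrt(n) / sigma
    return a / (1.0 + a) * D

def tau0(p0, D, sigma, n):
    """Global scale at which the prior mean of m_eff equals the guess p0."""
    return p0 / (D - p0) * sigma / math.sqrt(n)

# Hypothetical problem: 1000 predictors, 100 observations, about 5 relevant.
D, n, sigma, p0 = 1000, 100, 1.0, 5
scale = tau0(p0, D, sigma, n)

print(scale)
print(expected_m_eff(scale, sigma, n, D))

# For comparison, a unit global scale implies that most coefficients are
# expected to remain unshrunk a priori in this setting.
print(expected_m_eff(1.0, sigma, n, D))
```

The value `scale` would then serve as the scale of the prior placed on the
global parameter, for instance a half-Cauchy prior, rather than as a fixed
value of the parameter.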

We propose a new method for automatically detecting monotonic input-output
relationships from data using Gaussian process (GP) models with virtual
derivative observations. Our results on synthetic and real datasets show that
the proposed method detects monotonic directions from input spaces with high
accuracy. We expect the method to be especially useful for improving the
explainability of models and the accuracy of regression and
classification tasks, particularly near the edges of the data or when
extrapolating.

We propose a new method for simplification of Gaussian process (GP) models by
projecting the information contained in the full encompassing model and
selecting a reduced number of variables based on their predictive relevance.
Our results on synthetic and real world datasets show that the proposed method
improves the assessment of variable relevance compared to the automatic
relevance determination (ARD) via the lengthscale parameters. We expect the
method to be useful for improving the explainability of models, reducing
future measurement costs, and reducing the computation time for making new
predictions.

The goal of this paper is to compare several widely used Bayesian model
selection methods in practical model selection problems, highlight their
differences and give recommendations about the preferred approaches. We focus
on the variable subset selection for regression and classification and perform
several numerical experiments using both simulated and real world data. The
results show that the optimization of a utility estimate such as the
cross-validation (CV) score is liable to find overfitted models due to the
relatively high variance in the utility estimates when the data is scarce. This
can also lead to substantial selection-induced bias and optimism in the
performance evaluation for the selected model. From a predictive viewpoint,
best results are obtained by accounting for model uncertainty by forming the
full encompassing model, such as the Bayesian model averaging solution over the
candidate models. If the encompassing model is too complex, it can be robustly
simplified by the projection method, in which the information of the full model
is projected onto the submodels. This approach is substantially less prone to
overfitting than selection based on the CV score. Overall, the projection method
also appears to outperform the maximum a posteriori model and the selection of
the most probable variables. The study also demonstrates that the model
selection can greatly benefit from using cross-validation outside the searching
process both for guiding the model size selection and assessing the predictive
performance of the finally selected model.
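
The projection idea can be sketched in a deliberately simplified setting. The
code below assumes a Gaussian linear model with fixed noise variance, in which
minimizing the Kullback-Leibler divergence from the full model's predictive
distribution to a submodel's reduces to a least-squares fit of the submodel to
the full model's fitted values. A ridge point estimate stands in for the full
encompassing model. A real analysis would project posterior draws, would also
project the noise variance, and would wrap the search in cross-validation. All
names and the simulated data are hypothetical.

```python
import numpy as np

def project(mu_full, X, cols):
    """Project the full-model fit onto the submodel using columns `cols`.

    Returns the projected weights and the mean squared discrepancy, which is
    proportional to the KL divergence under the fixed-variance assumption.
    """
    Xs = X[:, cols]
    w, *_ = np.linalg.lstsq(Xs, mu_full, rcond=None)
    resid = mu_full - Xs @ w
    return w, float(resid @ resid) / len(mu_full)

def forward_search(mu_full, X):
    """Greedily add the variable that most reduces the projection discrepancy."""
    p = X.shape[1]
    chosen, path = [], []
    while len(chosen) < p:
        remaining = [j for j in range(p) if j not in chosen]
        scores = {j: project(mu_full, X, chosen + [j])[1] for j in remaining}
        best = min(scores, key=scores.get)
        chosen.append(best)
        path.append(scores[best])
    return chosen, path

# Simulated data: only the first two of six variables are relevant.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

# Stand-in for the full encompassing model: a ridge fit on all variables.
alpha = 1.0
w_full = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
mu_full = X @ w_full

order, discrepancy = forward_search(mu_full, X)
print(order)
print(discrepancy)
```

Because each submodel is fitted to the full model's predictions rather than to
the noisy observations, the search has less opportunity to chase noise, which
is one way to read the reduced overfitting reported above.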

This document is additional material to our previous study comparing several
strategies for variable subset selection. Our recommended approach was to fit
the full model with all the candidate variables and best possible prior
information, and perform the variable selection using the projection predictive
framework. Here we give an example of performing such an analysis, using Stan
for fitting the model, and R for the variable selection.