While it's always possible to compute a variational approximation to a
posterior distribution, it can be difficult to discover problems with this
approximation. We propose two diagnostic algorithms to alleviate this problem.
The Pareto-smoothed importance sampling (PSIS) diagnostic gives a
goodness-of-fit measurement for joint distributions, while simultaneously
reducing the error in the estimate. The variational simulation-based
calibration (VSBC) diagnostic assesses the average performance of point
estimates.
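To make the calibration idea concrete, here is a minimal Python sketch of a simulation-based calibration check on a toy conjugate model. It is an illustration under stated assumptions, not the authors' implementation: the model (standard normal prior, unit-variance normal likelihood), the function names, and the two stand-in "approximations" (the exact posterior and a copy with a deliberately shifted mean) are all hypothetical, and no actual variational fit is run. For each replication it draws a parameter from the prior, simulates data, and records the probability that the approximation assigns to values below the true parameter. If the approximation is calibrated on average, these probabilities should be spread symmetrically around 0.5.

```python
import math
import random


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def exact_posterior(y):
    """Posterior mean and sd for theta ~ N(0, 1), y_i | theta ~ N(theta, 1)."""
    n = len(y)
    return sum(y) / (n + 1), math.sqrt(1.0 / (n + 1))


def shifted_approximation(y):
    """Hypothetical stand-in for a biased approximation: mean off by 0.5 sd."""
    mean, sd = exact_posterior(y)
    return mean + 0.5 * sd, sd


def calibration_probabilities(approximate, n_obs=10, n_rep=2000, seed=1):
    """For each simulated data set, return Pr_q(theta < theta_true)."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_rep):
        theta_true = rng.gauss(0.0, 1.0)                       # draw from the prior
        y = [rng.gauss(theta_true, 1.0) for _ in range(n_obs)]  # simulate data
        q_mean, q_sd = approximate(y)                           # Gaussian approximation
        probs.append(normal_cdf(theta_true, q_mean, q_sd))
    return probs


for name, approx in [("exact", exact_posterior), ("shifted", shifted_approximation)]:
    p = calibration_probabilities(approx)
    below = sum(1 for v in p if v < 0.5) / len(p)
    print(f"{name}: mean p = {sum(p) / len(p):.3f}, share below 0.5 = {below:.3f}")
```

The sketch only reports the mean and the share of probabilities below 0.5. A formal test of symmetry could be applied to the same probabilities instead.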

The widely recommended procedure of Bayesian model averaging is flawed in the
M-open setting, in which the true data-generating process is not one of the
candidate models being fit. We take the idea of stacking from the point
estimation literature and generalize it to the combination of predictive
distributions. We extend the utility function to any proper scoring rule, use
Pareto-smoothed importance sampling to efficiently compute the required
leave-one-out posterior distributions, and apply regularization to improve
stability.
We compare stacking of predictive distributions to several alternatives:
stacking of means, Bayesian model averaging (BMA), pseudo-BMA using AIC-type
weighting, and a variant of pseudo-BMA that is stabilized using the Bayesian
bootstrap (BB-pseudo-BMA). Based on simulations and real-data applications, we
recommend stacking of predictive distributions, with BB-pseudo-BMA as an
approximate alternative when computational cost is an issue.
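The weighting schemes compared above can be sketched in a few lines of Python. This is a simplified illustration with several assumptions, not the authors' code: the input is a hypothetical matrix `lpd[i][k]` of leave-one-out log predictive densities for data point i under model k, the scoring rule is the log score, stacking weights are found with a plain EM-style fixed-point update (swapped in here for a general constrained optimizer), no regularization is applied, pseudo-BMA weights are taken proportional to the exponentiated summed log score, and the Bayesian bootstrap variant reweights data points with Dirichlet(1, ..., 1) draws. All function names are made up for this example.

```python
import math
import random


def softmax(values):
    """Normalize exp(values) to sum to one, guarding against overflow."""
    top = max(values)
    expd = [math.exp(v - top) for v in values]
    total = sum(expd)
    return [e / total for e in expd]


def stacking_weights(lpd, n_iter=500):
    """Maximize sum_i log(sum_k w_k * p_ik) over the simplex by EM updates."""
    n, n_models = len(lpd), len(lpd[0])
    # Rescale each row by its maximum; the factor cancels in the update below.
    dens = [[math.exp(v - max(row)) for v in row] for row in lpd]
    w = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        new_w = [0.0] * n_models
        for row in dens:
            mix = sum(wk * pk for wk, pk in zip(w, row))
            for k in range(n_models):
                new_w[k] += w[k] * row[k] / (mix * n)
        w = new_w
    return w


def pseudo_bma_weights(lpd):
    """Weights proportional to exp(summed leave-one-out log score)."""
    n_models = len(lpd[0])
    return softmax([sum(row[k] for row in lpd) for k in range(n_models)])


def bb_pseudo_bma_weights(lpd, n_boot=1000, seed=0):
    """Average pseudo-BMA weights over Bayesian bootstrap replicates."""
    rng = random.Random(seed)
    n, n_models = len(lpd), len(lpd[0])
    avg = [0.0] * n_models
    for _ in range(n_boot):
        g = [rng.expovariate(1.0) for _ in range(n)]   # normalized: Dirichlet(1,...,1)
        total = sum(g)
        score = [n * sum(g[i] / total * lpd[i][k] for i in range(n))
                 for k in range(n_models)]
        for k, wk in enumerate(softmax(score)):
            avg[k] += wk / n_boot
    return avg


# Toy input: model A predicts 30 points well, model B predicts the other 10 well.
toy_lpd = ([[math.log(0.9), math.log(0.1)]] * 30
           + [[math.log(0.1), math.log(0.9)]] * 10)
print("stacking      ", stacking_weights(toy_lpd))
print("pseudo-BMA    ", pseudo_bma_weights(toy_lpd))
print("BB-pseudo-BMA ", bb_pseudo_bma_weights(toy_lpd))
```

On this toy input the two families behave differently: stacking keeps a substantial weight on the model that predicts only a minority of points well, while the pseudo-BMA variants put nearly all weight on the model with the better total score.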

Importance weighting is a general way to adjust Monte Carlo integration to
account for draws from the wrong distribution, but the resulting estimate can
be noisy when the importance ratios have a heavy right tail. This routinely
occurs when there are aspects of the target distribution that are not well
captured by the approximating distribution, in which case more stable estimates
can be obtained by modifying extreme importance ratios. We present a new method
for stabilizing importance weights using a generalized Pareto distribution fit
to the upper tail of the distribution of the simulated importance ratios. The
method, which empirically performs better than existing methods for stabilizing
importance sampling estimates, includes stabilized effective sample size
estimates, Monte Carlo error estimates and convergence diagnostics.
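The smoothing step described above can be sketched in Python as follows. The abstract does not spell out the algorithm, so several details here are assumptions rather than statements of the authors' method: the tail size M = min(S/5, 3*sqrt(S)), the quantile positions (z - 1/2)/M, truncation at the largest raw ratio, and a grid-based empirical Bayes fit of the generalized Pareto distribution with a weak prior on the shape. The function names are hypothetical, and ties among the ratios are not handled.

```python
import math


def fit_generalized_pareto(exceedances):
    """Estimate shape k and scale sigma of a generalized Pareto distribution
    by averaging a profile likelihood over a grid (assumed estimator)."""
    x = sorted(exceedances)
    n = len(x)
    m = 30 + int(math.sqrt(n))
    x_star = x[int(n / 4 + 0.5) - 1]                      # first quartile
    thetas = [1.0 / x[-1] + (1.0 - math.sqrt(m / (j - 0.5))) / (3.0 * x_star)
              for j in range(1, m + 1)]

    def shape_given(theta):
        return sum(math.log1p(-theta * xi) for xi in x) / n

    def profile_loglik(theta):
        k = shape_given(theta)
        return n * (math.log(-theta / k) - k - 1.0)

    loglik = [profile_loglik(t) for t in thetas]
    top = max(loglik)
    w = [math.exp(v - top) for v in loglik]
    theta_hat = sum(t * wi for t, wi in zip(thetas, w)) / sum(w)
    k = shape_given(theta_hat)
    sigma = -k / theta_hat
    k = (n * k + 5.0) / (n + 10.0)                        # weak prior centered at 0.5
    return k, sigma


def smooth_importance_ratios(ratios):
    """Replace the M largest ratios by generalized Pareto quantiles.
    Returns the smoothed ratios and the estimated tail shape k."""
    s_total = len(ratios)
    m = int(min(0.2 * s_total, 3.0 * math.sqrt(s_total)))
    order = sorted(range(s_total), key=lambda s: ratios[s])
    cutoff = ratios[order[s_total - m - 1]]               # largest non-tail ratio
    tail = order[s_total - m:]
    k, sigma = fit_generalized_pareto([ratios[s] - cutoff for s in tail])
    largest = ratios[order[-1]]
    smoothed = list(ratios)
    for z, s in enumerate(tail, start=1):
        p = (z - 0.5) / m
        if abs(k) < 1e-12:
            q = -sigma * math.log1p(-p)
        else:
            q = sigma / k * ((1.0 - p) ** (-k) - 1.0)
        smoothed[s] = min(cutoff + q, largest)            # truncate at raw maximum
    return smoothed, k


def effective_sample_size(weights):
    """Standard (sum w)^2 / sum w^2 effective sample size."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)
```

A typical use would pass the raw ratios p(x)/q(x) for draws x from the approximating distribution q, then use the smoothed ratios as weights in a self-normalized estimate. The returned shape k acts as the diagnostic: larger values indicate a heavier right tail, and values above roughly 0.7 are commonly treated as a warning that the estimate may be unreliable.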