  • With the rapid growth of modern technology, many large-scale biomedical studies have been, and continue to be, conducted to collect massive datasets containing large volumes of multi-modality imaging, genetic, neurocognitive, and clinical information from increasingly large cohorts. Simultaneously extracting and integrating rich and diverse heterogeneous information in neuroimaging and/or genomics from these big datasets could transform our understanding of how genetic variants affect brain structure and function, cognitive function, and brain-related disease risk across the lifespan. Such understanding is critical for the diagnosis, prevention, and treatment of numerous complex brain-related disorders (e.g., schizophrenia and Alzheimer's disease). However, the joint analysis of high-dimensional imaging phenotypes and high-dimensional genetic data, called big data squared (BD$^2$), presents major computational and theoretical challenges for existing analytical methods. Besides the high dimensionality of BD$^2$, neuroimaging measures often exhibit strong spatial smoothness and dependence, and genetic markers may have a natural dependence structure arising from linkage disequilibrium. We review some recent developments in statistical techniques for the joint analysis of BD$^2$, including massive univariate and voxel-wise approaches, reduced rank regression, mixture models, and group sparse multi-task regression. By doing so, we hope that this review may encourage others in the statistical community to enter this new and exciting field of research.
  • We develop a Bayesian highest-density interval (HDI) for use in within-subject designs. This credible interval is based on a standard noninformative prior and a modified posterior distribution that conditions on both the data and point estimates of the subject-specific random effects. Conditioning on the estimated random effects removes between-subject variance and produces intervals that are the Bayesian analogue of the within-subject confidence interval proposed by Loftus and Masson (1994). We show that the latter interval can also be derived as a Bayesian within-subject HDI under a certain improper prior. We argue that the proposed new interval is superior to the original within-subject confidence interval on the grounds that (a) it is based on a more sensible prior, (b) it has a clear and intuitively appealing interpretation, and (c) its length is always smaller. A generalization of the new interval that can be applied to heteroscedastic data is also derived, and we show that the resulting interval is numerically equivalent to the normalization method discussed in Franz and Loftus (2012); however, our work provides a Bayesian formulation of the normalization method and, in doing so, identifies the associated prior distribution.
  • Statistical modeling of fMRI data is challenging because the data are both spatially and temporally correlated. Spatially, measurements are taken at thousands of contiguous regions, called voxels; temporally, measurements are taken at hundreds of time points at each voxel. Recent advances in Bayesian hierarchical modeling have addressed the challenges of spatiotemporal structure in fMRI data with models that incorporate both spatial and temporal priors for signal and noise. While there has been extensive research on modeling the fMRI signal (i.e., the convolution of the experimental design with the chosen functional form of the hemodynamic response function) and its spatial variability, less attention has been paid to realistic modeling of the temporal dependence that typically exists in the fMRI noise, for which a low-order autoregressive (AR) process is usually adopted. Furthermore, the AR order is held constant across voxels (e.g., AR(1) at every voxel). Motivated by an event-related fMRI experiment, we propose a novel hierarchical Bayesian model that automatically selects the autoregressive orders of the noise process, allowing them to vary spatially over the brain. Simulation studies show that our model improves accuracy, and we apply it to our motivating example.
  • The log-Gaussian Cox process is a commonly used model for the analysis of spatial point patterns. Fitting this model is difficult because of its doubly stochastic property, i.e., it is a hierarchical combination of a Poisson process at the first level and a Gaussian process at the second level. Different methods have been proposed to estimate such a process, including traditional likelihood-based approaches as well as Bayesian methods. We focus here on Bayesian methods and on several approaches that have been considered for model fitting within this framework, including Hamiltonian Monte Carlo, the integrated nested Laplace approximation (INLA), and variational Bayes. We compare these approaches with respect to statistical and computational efficiency through several simulation studies and through applications to both ecological data and neuroimaging data.
  • Motivation: Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. Wang et al. (Bioinformatics, 2012) developed an approach for the analysis of imaging genomic studies using penalized multi-task regression with regularization based on a novel group $l_{2,1}$-norm penalty that encourages structured sparsity at both the gene level and the SNP level. While incorporating a number of useful features, the proposed method furnishes only a point estimate of the regression coefficients; techniques for conducting statistical inference are not provided. A new Bayesian method is proposed here to overcome this limitation. Results: We develop a Bayesian hierarchical modeling formulation whose posterior mode corresponds to the estimator proposed by Wang et al. (Bioinformatics, 2012), together with an approach that allows for full posterior inference, including the construction of interval estimates for the regression parameters. We show that the proposed hierarchical model can be expressed as a three-level Gaussian scale mixture, and this representation facilitates the use of a Gibbs sampling algorithm for posterior simulation. Simulation studies demonstrate that the interval estimates obtained using our approach achieve adequate coverage probabilities and outperform those obtained from the nonparametric bootstrap. Our proposed methodology is applied to the analysis of neuroimaging and genetic data collected as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI), and this analysis of the ADNI cohort clearly demonstrates the added value of incorporating interval estimation, beyond point estimation alone, when relating SNPs to brain imaging endophenotypes.
  • We investigate the choice of tuning parameters for a Bayesian multi-level group lasso model developed for the joint analysis of neuroimaging and genetic data. The regression model we consider relates multivariate phenotypes consisting of brain summary measures (volumetric and cortical thickness values) to single nucleotide polymorphism (SNP) data and imposes penalization at two nested levels, the first corresponding to genes and the second to SNPs. Associated with each level of the penalty is a tuning parameter that corresponds to a hyperparameter in the hierarchical Bayesian formulation. Following previous work on Bayesian lassos, we consider estimating the tuning parameters either through hierarchical Bayes, based on hyperpriors and Gibbs sampling, or through empirical Bayes, based on maximizing the marginal likelihood using a Monte Carlo EM algorithm. For the specific model under consideration, we find that these approaches can lead to severe overshrinkage of the regression parameter estimates in the high-dimensional setting or when the genetic effects are weak. We demonstrate these problems through simulation examples and study an approximation to the marginal likelihood that sheds light on their cause. We then suggest an alternative approach based on the widely applicable information criterion (WAIC), an asymptotic approximation to leave-one-out cross-validation that can be computed conveniently within an MCMC framework.
  • Brain decoding involves determining a subject's cognitive state, or an associated stimulus, from functional neuroimaging data measuring brain activity. In this setting, the cognitive state is typically characterized by an element of a finite set, and the neuroimaging data comprise voluminous spatiotemporal measurements of some aspect of the neural signal. The associated statistical problem is one of classification from high-dimensional data. We explore the use of functional principal component analysis, mutual information networks, and persistent homology both for exploratory analysis of the data and for constructing features that characterize the neural signal for brain decoding. We review each approach from this perspective and incorporate the features into a classifier based on symmetric multinomial logistic regression with elastic net regularization. The approaches are illustrated in an application where the task is to infer, from brain activity measured with magnetoencephalography (MEG), the type of video stimulus shown to a subject.
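The doubly stochastic structure described in the log-Gaussian Cox process abstract can be made concrete with a small simulation. This is a minimal sketch, not any of the fitting methods compared in that work: it assumes a discretized version of the process on a regular grid over the unit square, an exponential covariance for the Gaussian process, illustrative default parameter values, and the hypothetical function name `simulate_lgcp_counts`. A Gaussian field is drawn first and exponentiated to give the intensity; cell counts are then drawn as independent Poisson variables given that intensity.

```python
import numpy as np


def simulate_lgcp_counts(n_side=20, mean=6.0, variance=1.0, length_scale=0.2, seed=0):
    """Simulate gridded counts from a discretized log-Gaussian Cox process.

    Sketch only: the unit square is split into n_side x n_side cells, the latent
    Gaussian process uses an (assumed) exponential covariance, and each cell's
    count is Poisson with mean exp(field) * cell area.
    """
    rng = np.random.default_rng(seed)
    cell_area = 1.0 / n_side**2

    # Cell centres of a regular grid on [0, 1]^2.
    centres = (np.arange(n_side) + 0.5) / n_side
    xx, yy = np.meshgrid(centres, centres, indexing="ij")
    coords = np.column_stack([xx.ravel(), yy.ravel()])

    # Second level: Gaussian process evaluated at the cell centres.
    dists = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    cov = variance * np.exp(-dists / length_scale)
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(len(coords)))  # small jitter for stability
    field = mean + chol @ rng.standard_normal(len(coords))
    intensity = np.exp(field)

    # First level: conditionally independent Poisson counts given the intensity.
    counts = rng.poisson(intensity * cell_area)
    return counts.reshape(n_side, n_side), intensity.reshape(n_side, n_side)
```

Calling `simulate_lgcp_counts(n_side=20, seed=1)` returns a 20 x 20 grid of counts together with the latent intensity. Fitting methods such as Hamiltonian Monte Carlo, INLA, or variational Bayes work in the reverse direction, inferring the latent field from the observed counts.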
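The two-level (gene and SNP) penalty referred to in the abstracts on group sparse multi-task regression can be illustrated with a short function. The form used here is an assumption stated for illustration, in the style of the group $l_{2,1}$-norm penalty attributed to Wang et al. (Bioinformatics, 2012). The coefficient matrix `W` has one row per SNP and one column per imaging phenotype. The gene level adds the Frobenius norm of each gene's block of rows, and the SNP level adds the Euclidean norm of each row. The function name `two_level_group_penalty` and the integer gene-label encoding are hypothetical.

```python
import numpy as np


def two_level_group_penalty(W, gene_of_snp, gamma_gene, gamma_snp):
    """Gene-level plus SNP-level group penalty on a SNP x phenotype matrix W.

    Assumed form (illustrative sketch):
        gamma_gene * sum_g ||W[rows of gene g]||_F + gamma_snp * sum_l ||W[l, :]||_2
    """
    W = np.asarray(W, dtype=float)
    gene_of_snp = np.asarray(gene_of_snp)

    # Gene level: Frobenius norm of each gene's block of SNP rows, summed over genes.
    gene_term = sum(
        np.linalg.norm(W[gene_of_snp == g], ord="fro") for g in np.unique(gene_of_snp)
    )

    # SNP level: Euclidean norm of each row (one SNP across all phenotypes), summed.
    snp_term = np.linalg.norm(W, axis=1).sum()

    return gamma_gene * gene_term + gamma_snp * snp_term
```

For example, with `W = [[3, 4], [0, 0], [1, 0]]`, the first two SNPs in one gene and the third in another, both the gene-level and SNP-level sums equal 6, so `two_level_group_penalty(W, [0, 0, 1], 1.0, 0.5)` gives 9.0. Zeroing a whole gene block removes its gene-level term, while zeroing a single row removes only that SNP's term, which is how the penalty encourages sparsity at both levels.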
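The WAIC criterion suggested above for tuning-parameter selection can be computed directly from MCMC output. The sketch below uses the commonly cited definition of WAIC: the log pointwise predictive density minus a variance-based effective number of parameters, reported on the deviance scale. It assumes the sampler's pointwise log-likelihoods are stored as a draws x observations array, and the function name `waic` is hypothetical. It is not necessarily the exact computation used in that work.

```python
import numpy as np


def waic(log_lik):
    """WAIC from an S x n array of pointwise log-likelihoods (assumed layout).

    Row s holds log p(y_i | theta_s) for posterior draw s and observation i.
    Requires at least two draws for the variance term.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]

    # Log pointwise predictive density: log of the posterior-mean likelihood,
    # computed stably with a log-sum-exp over draws.
    col_max = log_lik.max(axis=0)
    lppd_i = col_max + np.log(np.exp(log_lik - col_max).sum(axis=0)) - np.log(n_draws)

    # Effective number of parameters: posterior variance of each pointwise log-likelihood.
    p_waic_i = log_lik.var(axis=0, ddof=1)

    elpd_waic = np.sum(lppd_i - p_waic_i)
    return {"elpd_waic": elpd_waic, "p_waic": p_waic_i.sum(), "waic": -2.0 * elpd_waic}
```

In a tuning-parameter search, one would run the sampler once for each candidate pair of gene-level and SNP-level tuning parameters and keep the pair with the smallest WAIC (equivalently, the largest `elpd_waic`).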