• Brain function is organized in coordinated modes of spatio-temporal activity (functional networks) exhibiting an intrinsic baseline structure with variations under different experimental conditions. Existing approaches for uncovering such network structures typically do not explicitly model shared and differential patterns across networks, thus potentially reducing the detection power. We develop an integrative approach for jointly modeling multiple brain networks across experimental conditions. The proposed Bayesian Joint Network Learning approach places flexible priors on the edge probabilities, involving a common intrinsic baseline structure and differential effects specific to individual networks. Conditional on these edge probabilities, connection strengths are modeled under a Bayesian spike and slab prior on the off-diagonal elements of the inverse covariance matrix. The model is fit under a posterior computation scheme based on Markov chain Monte Carlo. Numerical simulations illustrate that the proposed joint modeling approach has increased power to detect true differential edges while providing adequate control of false positives and achieving greater accuracy in the estimation of edge strengths compared to existing methods. An application of the method to fMRI Stroop task data provides unique insights into brain network alterations between cognitive conditions that existing graphical modeling techniques failed to reveal.
  • Recently, there has been increased interest in fusing multimodal imaging to better understand brain organization. Specifically, accounting for knowledge of anatomical pathways connecting brain regions should lead to desirable outcomes such as increased accuracy in functional brain network estimates and greater reproducibility of topological features across scanning sessions. Despite the clear merits, major challenges persist in integrative analyses, including an incomplete understanding of the structure-function relationship and inaccuracies in mapping anatomical structures due to deficiencies in existing imaging technology. Clearly, advanced network modeling tools are needed to appropriately incorporate anatomical structure in constructing brain functional networks. We propose a hierarchical Bayesian Gaussian graphical modeling approach that estimates the functional networks via sparse precision matrices whose degree of edge-specific shrinkage is informed by anatomical structure and an independent baseline component. The approach flexibly identifies functional connections supported by structural connectivity knowledge. This enables robust brain network estimation even in the presence of mis-specified anatomical knowledge, while accommodating heterogeneity in the structure-function relationship. We implement the approach via an efficient optimization algorithm yielding maximum a posteriori estimates. Extensive numerical studies reveal the clear advantages of our approach over competing methods in accurately estimating brain functional connectivity, even when the anatomical knowledge is mis-specified. An application of the approach to the Philadelphia Neurodevelopmental Cohort (PNC) study reveals gender-based connectivity differences across multiple age groups, and higher reproducibility in the estimation of network metrics compared to alternative methods.
  • Variable selection for structured covariates lying on an underlying known graph is a problem motivated by practical applications, and has been a topic of increasing interest. However, most of the existing methods may not be scalable to high dimensional settings involving tens of thousands of variables lying on known pathways, as is the case in genomics studies. We propose an adaptive Bayesian shrinkage approach which incorporates prior network information by smoothing the shrinkage parameters for connected variables in the graph, so that the corresponding coefficients have a similar degree of shrinkage. We fit our model via a computationally efficient expectation maximization algorithm which is scalable to high dimensional settings (p~100,000). Theoretical properties for fixed as well as increasing dimensions are established, even when the number of variables increases faster than the sample size. We demonstrate the advantages of our approach in terms of variable selection, prediction, and computational scalability via a simulation study, and apply the method to a cancer genomics study.
  • Significant advances in biotechnology have allowed for simultaneous measurement of molecular data points across multiple genomic and transcriptomic levels from a single tumor/cancer sample. This has motivated systematic approaches to integrate multi-dimensional structured datasets, since cancer development and progression are driven by numerous co-ordinated molecular alterations and the interactions between them. We propose a novel two-step Bayesian approach that combines a variable selection framework with integrative structure learning between multiple sources of data. The structure learning in the first step is accomplished through novel joint graphical models for heterogeneous (mixed scale) data allowing for flexible incorporation of prior knowledge. This structure learning subsequently informs the variable selection in the second step to identify groups of molecular features within and across platforms associated with outcomes of cancer progression. The variable selection strategy adjusts for collinearity and multiplicity, and also has theoretical justifications. We evaluate our methods through simulations and apply them to motivating genomic (DNA copy number and methylation) and transcriptomic (mRNA expression) data for assessing important markers associated with glioblastoma progression.
  • There has been intense development of Bayesian graphical model estimation approaches over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel approach suitable for high dimensional settings, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of continuous shrinkage priors on the precision matrix elements, which induces shrinkage under an equivalence with Cholesky-based regularization while enabling conjugate updates of entire precision matrices. Subsequently, we propose a post-fitting graphical model estimation step which proceeds using penalized joint credible regions to perform neighborhood selection sequentially for each node. The posterior computation proceeds using a straightforward sampler composed entirely of Gibbs updates, and the approach is scalable to high dimensions. The proposed approach is shown to be asymptotically consistent in estimating the graph structure for fixed $p$ when the truth is a Gaussian graphical model. Simulations show that our approach compares favorably with Bayesian competitors both in terms of graphical model estimation and computational efficiency. We apply our methods to high dimensional gene expression and microRNA datasets in cancer genomics.
  • Although discrete mixture modeling has formed the backbone of the literature on Bayesian density estimation, there are some well-known disadvantages. We propose an alternative class of priors based on random nonlinear functions of a uniform latent variable with an additive residual. The induced prior for the density is shown to have desirable properties including ease of centering on an initial guess for the density, large support, posterior consistency and straightforward computation via Gibbs sampling. Some advantages over discrete mixtures, such as Dirichlet process mixtures of Gaussian kernels, are discussed and illustrated via simulations and an epidemiology application.
  • There is a rich literature proposing Bayesian variable selection methods for parametric models and establishing their asymptotic properties, with a particular focus on the normal linear regression model and an increasing emphasis on settings in which the number of candidate predictors ($p$) diverges with sample size ($n$). Our focus is on generalizing methods and asymptotic theory established for mixtures of $g$-priors to semiparametric linear regression models having unknown residual densities. Using a Dirichlet process location mixture for the residual density, we propose a semiparametric $g$-prior which incorporates an unknown matrix of cluster allocation indicators. For this class of priors, posterior computation can proceed via a straightforward stochastic search variable selection algorithm. In addition, Bayes factor and variable selection consistency are shown to hold in various cases, including proper and improper priors on $g$ and $p>n$, with the models under comparison restricted to have model dimensions diverging at a rate less than $n$.
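
For readers unfamiliar with the parametric starting point of the last abstract, the sketch below computes the standard closed-form Bayes factor under Zellner's $g$-prior with a fixed $g$ in the normal linear regression model. It is only an illustration of that baseline, not the semiparametric $g$-prior proposed in the paper: it uses no Dirichlet process mixture and no mixture over $g$. The function name `g_prior_log_bf`, the simulated data, and the choice $g = n$ are assumptions made for this example.

```python
import numpy as np

def g_prior_log_bf(y, X, g):
    """Log Bayes factor of the model with predictors X versus the
    intercept-only model, under Zellner's g-prior with fixed g
    (normal linear regression with Gaussian errors)."""
    n, p = X.shape
    # Center the response and predictors so the intercept is handled implicitly.
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    # Ordinary least squares fit and coefficient of determination R^2.
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    # Closed form: (1+g)^((n-1-p)/2) / (1 + g(1-R^2))^((n-1)/2), on the log scale.
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

# Simulated data (hypothetical): only the first predictor affects the response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

log_bf_signal = g_prior_log_bf(y, X[:, [0]], g=n)  # model with the true predictor
log_bf_noise = g_prior_log_bf(y, X[:, [1]], g=n)   # model with an irrelevant predictor
print(log_bf_signal, log_bf_noise)
```

In this setup the model containing the true predictor should receive a much larger log Bayes factor than the model containing an irrelevant one. A stochastic search variable selection algorithm would use such quantities to move between candidate models.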