• We present a continuous model for structural brain connectivity based on the Poisson point process. The model treats each streamline curve in a tractography as an observed event in connectome space, here a product space of cortical white matter boundaries. We approximate the model parameter via kernel density estimation. To deal with the heavy computational burden, we develop a fast parameter estimation method by pre-computing associated Legendre products of the data, leveraging properties of the spherical heat kernel. We show how our approach can be used to assess the quality of cortical parcellations with respect to connectivity. We further present empirical results that suggest the discrete connectomes derived from our model have substantially higher test-retest reliability than those from standard methods.
  • This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are further used as features for classification. The proposed approach is demonstrated to be efficient for detection of Alzheimer's disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification.
  • Big data initiatives such as the Enhancing NeuroImaging Genetics through Meta-Analysis consortium (ENIGMA) combine data collected by independent studies worldwide to achieve more accurate estimates of effect sizes and more reliable and reproducible outcomes. Such efforts require harmonized analysis protocols to consistently extract phenotypes. Even so, challenges include wide variability of fMRI protocols and scanner platforms; this leads to site-to-site variance in quality, resolution and temporal signal-to-noise ratio (tSNR). An effective harmonization should provide optimal measures for data of different qualities. We developed a multi-site rsfMRI analysis pipeline to allow research groups around the world to process rsfMRI scans in a harmonized way, to extract consistent and quantitative measurements of connectivity and to perform coordinated statistical tests. We used the single-modality ENIGMA rsfMRI pipeline, based on model-free Marchenko-Pastur PCA-based denoising, to verify and replicate findings of significant heritability of measures from resting state networks. We analyzed two independent cohorts, GOBS (Genetics of Brain Structure) and HCP (the Human Connectome Project), which collected data using conventional and connectomics-oriented fMRI protocols, respectively. We used seed-based connectivity and dual-regression approaches to show that rsfMRI signal is consistently heritable across twenty major functional network measures. Heritability values of 20-40% were observed across both cohorts.
  • As very large studies of complex neuroimaging phenotypes become more common, human quality assessment of MRI-derived data remains one of the last major bottlenecks. Few attempts have so far been made to address this issue with machine learning. In this work, we optimize predictive models of quality for meshes representing deep brain structure shapes. We use standard vertex-wise and global shape features computed homologously across 19 cohorts and over 7500 human-rated subjects, training kernelized Support Vector Machine and Gradient Boosted Decision Trees classifiers to detect meshes of failing quality. Our models generalize across datasets and diseases, reducing human workload by 30-70%, or equivalently hundreds of human rater hours for datasets of comparable size, with recall rates approaching inter-rater reliability.
  • Understanding the modularity of fMRI-derived brain networks or connectomes can inform the study of brain function organization. However, fMRI connectomes additionally involve negative edges, which are not rigorously accounted for by existing approaches to modularity that either ignore or arbitrarily weight these connections. Furthermore, most Q maximization-based modularity algorithms yield variable results with suboptimal reproducibility. Here we present an alternative, reproducible approach that exploits how frequently the BOLD-signal correlation between two nodes is negative. We validated this novel probability-based modularity approach on two independent publicly available resting-state connectome datasets (the Human Connectome Project and the 1000 Functional Connectomes) and demonstrated that negative correlations alone are sufficient for understanding resting-state modularity. In fact, this approach a) permits a dual formulation, leading to equivalent solutions regardless of whether one considers positive or negative edges; and b) is theoretically linked to the Ising model defined on the connectome, thus yielding a modularity result that maximizes data likelihood. We additionally were able to detect sex differences in modularity that the most widely utilized methods did not. Results confirmed the superiority of our approach in that: a) correlations with the highest probability of being negative are consistently placed between modules, and b) due to the equivalent dual forms, no arbitrary weighting factor is required to balance the influence between negative and positive correlations, unlike existing Q maximization-based modularity approaches. As datasets like HCP become widely available for analysis by the neuroscience community at large, appropriate computational tools to understand the neurobiological information of negative edges in fMRI connectomes are increasingly important.
  • Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. However, due to data privacy regulations or limited accessibility to large datasets across the world, it is challenging to efficiently integrate distributed information. Here we propose a novel classification framework through multi-site weighted LASSO: each site performs an iterative weighted LASSO for feature selection separately. Within each iteration, the classification result and the selected features are collected to update the weighting parameters for each feature. This new weight is used to guide the LASSO process at the next iteration. Only the features that help to improve the classification accuracy are preserved. In tests on data from five sites (299 patients with major depressive disorder (MDD) and 258 normal controls), our method boosted classification accuracy for MDD by 4.9% on average. This result shows the potential of the proposed new strategy as an effective and practical collaborative platform for machine learning on large-scale distributed imaging and biobank data.
  • Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer's disease (AD). Collaborative imaging genetics studies across different research institutions show the effectiveness of detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD. Selecting relevant features is crucial in predicting the response variable. In this study, we propose a novel Distributed Feature Selection Framework (DFSF) to conduct large-scale imaging genetics studies across multiple institutions. To speed up the learning process, we propose a family of distributed group Lasso screening rules to identify irrelevant features and remove them from the optimization. Then we select the relevant group features by performing the group Lasso feature selection process along a sequence of parameter values. Finally, we employ stability selection to rank the top risk SNPs that might help detect the early stage of AD. To the best of our knowledge, this is the first distributed feature selection model that integrates group Lasso feature selection with the detection of genetic risk factors across multiple research institutions. Empirical studies are conducted on 809 subjects with 5.9 million SNPs which are distributed across several individual institutions, demonstrating the efficiency and effectiveness of the proposed method.
  • One of the primary objectives of human brain mapping is the division of the cortical surface into functionally distinct regions, i.e. parcellation. While it is generally agreed that at macro-scale different regions of the cortex have different functions, the exact number and configuration of these regions is not known. Methods for the discovery of these regions are thus important, particularly as the volume of available information grows. Towards this end, we present a parcellation method based on a Bayesian non-parametric mixture model of cortical connectivity.
  • In the present work we demonstrate the use of a parcellation-free connectivity model based on Poisson point processes. This model produces for each subject a continuous bivariate intensity function that represents, for every possible pair of points, the relative rate at which we observe tracts terminating at those points. We fit this model to explore degree sequence equivalents for spatial continuum graphs, and to investigate the local differences between estimated intensity functions for two different tractography methods. This is a companion paper to Moyer et al. (2016), where the model was originally defined.
  • Genome-wide association studies (GWAS) offer new opportunities to identify genetic risk factors for Alzheimer's disease (AD). Recently, collaborative efforts across different institutions have emerged that enhance the power of many existing techniques on individual institution data. However, a major barrier to collaborative GWAS is that many institutions need to preserve individual data privacy. To address this challenge, we propose a novel distributed framework, termed Local Query Model (LQM), to detect risk SNPs for AD across multiple research institutions. To accelerate the learning process, we propose a Distributed Enhanced Dual Polytope Projection (D-EDPP) screening rule to identify irrelevant features and remove them from the optimization. To the best of our knowledge, this is the first successful run of the computationally intensive model selection procedure to learn a consistent model across different institutions without compromising their privacy while ranking the SNPs that may collectively affect AD. Empirical studies are conducted on 809 subjects with 5.9 million SNP features which are distributed across three individual institutions. D-EDPP achieved a 66-fold speed-up by effectively identifying irrelevant features.
  • There is growing interest in understanding how the structural interconnections among brain regions change with the occurrence of neurological diseases. Diffusion-weighted MRI has allowed researchers to non-invasively estimate a network of structural cortical connections made by white matter tracts, but current statistical methods for relating such networks to the presence or absence of a disease cannot exploit this rich network information. Standard practice considers each edge independently or summarizes the network with a few simple features. We enable dramatic gains in biological insight via a novel unifying methodology for inference on brain network variations associated with the occurrence of neurological diseases. The key to this approach is to define a probabilistic generative mechanism directly on the space of network configurations via dependent mixtures of low-rank factorizations, which efficiently exploit network information and allow the probability mass function for the brain network-valued random variable to vary flexibly between the group of patients characterized by a specific neurological disease and the group of age-matched cognitively healthy individuals.
  • We present a new method for the detection of gene pathways associated with a multivariate quantitative trait, and use it to identify causal pathways associated with an imaging endophenotype characteristic of longitudinal structural change in the brains of patients with Alzheimer's disease (AD). Our method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso penalised regression to jointly model the effects of genome-wide single nucleotide polymorphisms (SNPs), grouped into functional pathways using prior knowledge of gene-gene interactions. Pathways are ranked in order of importance using a resampling strategy that exploits finite sample variability. Our application study uses whole genome scans and MR images from 464 subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise imaging signatures characteristic of AD are obtained by analysing 3D patterns of structural change at 6, 12 and 24 months relative to baseline. High-ranking, AD endophenotype-associated pathways in our study include those describing chemokine, JAK-STAT and insulin signalling pathways, and tight junction interactions. All of these have been previously implicated in AD biology. In a secondary analysis, we investigate SNPs and genes that may be driving pathway selection, and identify a number of previously validated AD genes including CR1, APOE and TOMM40.
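
As a rough illustration of the probability-based modularity abstract above, the quantity it builds on (how frequently the BOLD-signal correlation between two nodes is negative across subjects) could be estimated along the lines of the sketch below. Everything specific here is an assumption rather than something stated in the abstract: the function name `negative_edge_frequency`, the input layout (one symmetric correlation matrix per subject, as nested lists), and the toy values are hypothetical, and the modularity optimization itself is not shown.

```python
from typing import List

Matrix = List[List[float]]


def negative_edge_frequency(corr_matrices: List[Matrix]) -> Matrix:
    """For each node pair (i, j), return the fraction of subjects whose
    correlation between nodes i and j is strictly negative.

    Hypothetical helper: the input layout is an assumption, not the
    authors' data format.
    """
    if not corr_matrices:
        raise ValueError("need at least one correlation matrix")
    n = len(corr_matrices[0])
    counts = [[0.0] * n for _ in range(n)]
    for corr in corr_matrices:
        if len(corr) != n or any(len(row) != n for row in corr):
            raise ValueError("all matrices must be square with the same size")
        for i in range(n):
            for j in range(n):
                # The diagonal (self-correlation) is ignored.
                if i != j and corr[i][j] < 0:
                    counts[i][j] += 1.0
    n_subjects = len(corr_matrices)
    return [[c / n_subjects for c in row] for row in counts]


# Toy data (made up): two subjects, three nodes.
subjects = [
    [[1.0, 0.4, -0.2], [0.4, 1.0, -0.1], [-0.2, -0.1, 1.0]],
    [[1.0, 0.3, -0.5], [0.3, 1.0, 0.2], [-0.5, 0.2, 1.0]],
]
freq = negative_edge_frequency(subjects)
print(freq[0][2])  # 1.0: negative in both subjects
print(freq[1][2])  # 0.5: negative in one of two subjects
```

In this toy example, the pair (0, 2) is negative in every subject, so under the idea described in the abstract it would be a natural candidate for an edge placed between modules rather than within one.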