• High-throughput screening of compounds (chemicals) is an essential part of drug discovery [7], involving thousands to millions of compounds, with the purpose of identifying candidate hits. Most statistical tools, including the industry-standard B-score method, work on individual compound plates and neither exploit cross-plate correlation nor share statistical strength among plates. We present a new statistical framework for high-throughput screening of compounds based on Bayesian nonparametric modeling. The proposed approach identifies candidate hits from multiple plates simultaneously, sharing statistical strength among plates and providing more robust estimates of compound activity. It can flexibly accommodate arbitrary distributions of compound activities and is applicable to any plate geometry. The algorithm provides a principled statistical approach to hit identification and false discovery rate control. Experiments demonstrate significant improvements in hit identification sensitivity and specificity over the B-score method, which is highly sensitive to threshold choice. The framework is implemented as an efficient R extension package, BHTSpack, and is suitable for large-scale data sets.
  • In vaccine development, the temporal profiles of the relative abundance of immune cell subtypes (T-cells) are key to understanding vaccine efficacy. Complex and expensive experimental studies generate very sparse time series data on this immune response. Fitting multi-parameter dynamic models of the immune response, which is central to evaluating the mechanisms underlying vaccine efficacy, is challenged by this data sparsity. The research reported here addresses this challenge. For HIV/SIV vaccine studies in macaques, we: (a) introduce novel dynamic models of the progression of cellular populations over time, with relevant time-delayed components reflecting the vaccine response; (b) define an effective Bayesian model fitting strategy that couples Markov chain Monte Carlo (MCMC) with Approximate Bayesian Computation (ABC), building on the complementary strengths of the two approaches, neither of which is effective alone; (c) explore the information content of the sparse time series for each model parameter, linking to experimental design and model simplification for future experiments; and (d) develop, apply, and compare the analysis on samples from a recent HIV/SIV experiment, yielding novel insights and conclusions about the progressive response to the vaccine and how it varies across subjects.
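
To make the ABC ingredient of the second summary concrete, the following is a minimal sketch of plain ABC rejection sampling on a toy time-delayed response curve observed at only five time points. Everything in it is an illustrative assumption: the model form, the function names (`simulate`, `distance`, `abc_rejection`), the uniform priors, the tolerance, and the synthetic data are hypothetical and are not taken from the study. It also omits the MCMC coupling described above, so it should be read as an illustration of the likelihood-free idea only, not as the authors' fitting strategy.

```python
import math
import random


def simulate(times, rate, delay, baseline=1.0, gain=2.0):
    """Toy delayed response (assumed form): flat at `baseline` until
    `delay`, then a saturating rise towards `baseline + gain`."""
    values = []
    for t in times:
        if t <= delay:
            values.append(baseline)
        else:
            values.append(baseline + gain * (1.0 - math.exp(-rate * (t - delay))))
    return values


def distance(a, b):
    """Root-mean-square discrepancy between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def abc_rejection(observed, times, n_draws=20000, eps=0.15, seed=0):
    """Plain ABC rejection: draw (rate, delay) from uniform priors, simulate,
    and keep the draws whose simulated series lies within `eps` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.05, 2.0)    # hypothetical prior on the rise rate
        delay = rng.uniform(0.0, 10.0)   # hypothetical prior on the time delay
        if distance(simulate(times, rate, delay), observed) < eps:
            accepted.append((rate, delay))
    return accepted


# Sparse synthetic "observations": five time points with small Gaussian noise.
times = [0, 3, 7, 14, 28]
noise = random.Random(1)
observed = [y + noise.gauss(0.0, 0.05)
            for y in simulate(times, rate=0.5, delay=4.0)]

posterior = abc_rejection(observed, times)
delays = [d for _, d in posterior]
print(f"accepted {len(posterior)} draws; "
      f"delay range {min(delays):.2f} to {max(delays):.2f}")
```

With so few observations, the accepted delays typically span a wide interval rather than concentrating near the value used to generate the data, which is a small-scale illustration of the information-content question raised in item (c).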