• Studying the neurological, genetic and evolutionary basis of human vocal communication mechanisms using animal vocalization models is an important field of neuroscience. The data sets typically comprise structured sequences of syllables or `songs' produced by animals from different genotypes under different social contexts. We develop a novel Bayesian semiparametric framework for inference in such data sets. Our approach is built on a novel class of mixed effects Markov transition models for the songs that accommodates exogenous influences of genotype and context as well as animal-specific heterogeneity. We design efficient Markov chain Monte Carlo algorithms for posterior computation. Crucial advantages of the proposed approach include its ability to provide insights into key scientific queries related to global and local influences of the exogenous predictors on the transition dynamics via automated tests of hypotheses. The methodology is illustrated using simulation experiments and the aforementioned motivating application in neuroscience.
  • We consider the problem of multivariate density deconvolution when the interest lies in estimating the distribution of a vector-valued random variable but precise measurements of the variable of interest are not available, observations being contaminated with additive measurement errors. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches when the measurement error density is not known but replicated proxies are available for each unobserved value of the random vector. Additionally, we allow the variability of the measurement errors to depend on the associated unobserved value of the vector of interest through unknown relationships which also automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels and exchangeable priors are exploited in many novel ways to meet the modeling and computational challenges. Theoretical results that show the flexibility of the proposed methods are provided. We illustrate the efficiency of the proposed methods in recovering the true density of interest through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24 hour recalls.
  • We consider the problem of flexible modeling of higher order Markov chains when an upper bound on the order of the chain is known but the true order and nature of the serial dependence are unknown. We propose Bayesian nonparametric methodology based on conditional tensor factorizations, which can characterize any transition probability with a specified maximal order. The methodology selects the important lags and captures higher order interactions among the lags, while also facilitating calculation of Bayes factors for a variety of hypotheses of interest. We design efficient Markov chain Monte Carlo algorithms for posterior computation, allowing for uncertainty in the set of important lags to be included and in the nature and order of the serial dependence. The methods are illustrated using simulation experiments and real world applications.
  • We consider the problem of estimating high-dimensional covariance matrices of a particular structure, which is a summation of low rank and sparse matrices. This covariance structure has a wide range of applications including factor analysis and random effects models. We propose a Bayesian method of estimating the covariance matrices by representing the covariance model in the form of a factor model with unknown number of latent factors. We introduce binary indicators for factor selection and rank estimation for the low rank component combined with a Bayesian lasso method for the sparse component estimation. Simulation studies show that our method can recover the rank as well as the sparsity of the two components respectively. We further extend our method to a graphical factor model where the graphical model of the residuals as well as selecting the number of factors is of interest. We employ a hyper-inverse Wishart prior for modeling decomposable graphs of the residuals, and a Bayesian graphical lasso selection method for unrestricted graphs. We show through simulations that the extended models can recover both the number of latent factors and the graphical model of the residuals successfully when the sample size is sufficient relative to the dimension.
  • Bayesian density deconvolution using nonparametric prior distributions is a useful alternative to the frequentist kernel based deconvolution estimators due to its potentially wide range of applicability, straightforward uncertainty quantification and generalizability to more sophisticated models. This article is the first substantive effort to theoretically quantify the behavior of the posterior in this recent line of research. In particular, assuming a known supersmooth error density, a Dirichlet process mixture of Normals on the true density leads to a posterior convergence rate same as the minimax rate $(\log n)^{-\eta/\beta}$ adaptively over the smoothness $\eta$ of an appropriate H\"{o}lder space of densities, where $\beta$ is the degree of smoothness of the error distribution. Our main contribution is achieving adaptive minimax rates with respect to the $L_p$ norm for $2 \leq p \leq \infty$ under mild regularity conditions on the true density. En route, we develop tight concentration bounds for a class of kernel based deconvolution estimators which might be of independent interest.
  • In this article a flexible Bayesian non-parametric model is proposed for non-homogeneous hidden Markov models. The model is developed through the amalgamation of the ideas of hidden Markov models and predictor dependent stick-breaking processes. Computation is carried out using auxiliary variable representation of the model which enable us to perform exact MCMC sampling from the posterior. Furthermore, the model is extended to the situation when the predictors can simultaneously in influence the transition dynamics of the hidden states as well as the emission distribution. Estimates of few steps ahead conditional predictive distributions of the response have been used as performance diagnostics for these models. The proposed methodology is illustrated through simulation experiments as well as analysis of a real data set concerned with the prediction of rainfall induced malaria epidemics.