
Bayesian sparse factor models have proven useful for characterizing
dependence in multivariate data, but scaling computation to large numbers of
samples and dimensions is problematic. We propose expandable factor analysis
for scalable inference in factor models when the number of factors is unknown.
The method relies on a continuous shrinkage prior for efficient maximum a
posteriori estimation of a low-rank and sparse loadings matrix. The structure
of the prior leads to an estimation algorithm that accommodates uncertainty in
the number of factors. We propose an information criterion to select the
hyperparameters of the prior. Expandable factor analysis achieves lower false
discovery rates and higher true positive rates than its competitors across diverse
simulations. We apply the proposed approach to a gene expression study of aging
in mice, illustrating superior results relative to four competing methods.

Domain adaptation addresses the problem created when training data is
generated by a so-called source distribution, but test data is generated by a
significantly different target distribution. In this work, we present
approximate label matching (ALM), a new unsupervised domain adaptation
technique that creates and leverages a rough labeling on the test samples, then
uses these noisy labels to learn a transformation that aligns the source and
target samples. We show that the transformation estimated by ALM has favorable
properties compared to transformations estimated by other methods, which do not
use any kind of target labeling. Our model is regularized by requiring that a
classifier trained to discriminate source from transformed target samples
cannot distinguish between the two. We experiment with ALM on simulated and
real data, and show that it outperforms techniques commonly used in the field.

We present a general framework, the coupled compound Poisson factorization
(CCPF), to capture the missing-data mechanism in extremely sparse data sets by
coupling a hierarchical Poisson factorization with an arbitrary data-generating
model. We derive a stochastic variational inference algorithm for the resulting
model and, as examples of our framework, implement three different
data-generating models (a mixture model, linear regression, and factor
analysis) to robustly model non-random missing data in the context of
clustering, prediction, and matrix factorization. In all three cases, we test
our framework against models that ignore the missing-data mechanism on
large-scale studies with non-random missing data, and we show that explicitly
modeling the missing-data mechanism substantially improves the quality of the
results, as measured using data log likelihood on a held-out test set.

Model-based collaborative filtering analyzes user-item interactions to infer
latent factors that represent user preferences and item characteristics in
order to predict future interactions. Most collaborative filtering algorithms
assume that these latent factors are static, although it has been shown that
user preferences and item perceptions drift over time. In this paper, we
propose a conjugate and numerically stable dynamic matrix factorization (DCPF)
based on compound Poisson matrix factorization that models the smoothly
drifting latent factors using Gamma-Markov chains. We propose a numerically
stable Gamma chain construction, and then present a stochastic variational
inference approach to estimate the parameters of our model. We apply our model
to time-stamped ratings data sets (Netflix, Yelp, and Last.fm), where DCPF
achieves higher predictive accuracy than state-of-the-art static and dynamic
factorization models.
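
As a rough illustration of a smoothly drifting positive latent factor, the sketch below simulates one simple Gamma-Markov chain in Python. It assumes each state is Gamma distributed with mean equal to the previous state; this parameterization, the function name `gamma_chain`, and all numeric values are illustrative assumptions and are not the chain construction proposed for DCPF.

```python
import numpy as np

def gamma_chain(theta0, shape, steps, rng):
    """Simulate a positive chain with E[theta_t | theta_{t-1}] = theta_{t-1}.

    Assumed parameterization (hypothetical, for illustration only):
    theta_t | theta_{t-1} ~ Gamma(shape, scale = theta_{t-1} / shape).
    """
    theta = np.empty(steps + 1)
    theta[0] = theta0
    for t in range(1, steps + 1):
        # Mean is shape * scale = theta[t - 1]; variance is theta[t - 1]**2 / shape.
        theta[t] = rng.gamma(shape=shape, scale=theta[t - 1] / shape)
    return theta

rng = np.random.default_rng(1)
path = gamma_chain(theta0=1.0, shape=200.0, steps=50, rng=rng)
```

Under this assumed form, the per-step coefficient of variation is 1/sqrt(shape), so larger `shape` values give smoother drift.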

Nonnegative matrix factorization models based on a hierarchical
Gamma-Poisson structure capture user and item behavior effectively in extremely
sparse data sets, making them the ideal choice for collaborative filtering
applications. Hierarchical Poisson factorization (HPF) in particular has proved
successful for scalable recommendation systems with extreme sparsity. HPF,
however, suffers from a tight coupling of sparsity model (absence of a rating)
and response model (the value of the rating), which limits the expressiveness
of the latter. Here, we introduce hierarchical compound Poisson factorization
(HCPF) that has the favorable Gamma-Poisson structure and scalability of HPF to
high-dimensional, extremely sparse matrices. More importantly, HCPF decouples
the sparsity model from the response model, allowing us to choose the most
suitable distribution for the response. HCPF can capture binary, nonnegative
discrete, nonnegative continuous, and zero-inflated continuous responses. We
compare HCPF with HPF on nine discrete and three continuous data sets and
conclude that HCPF captures the relationship between sparsity and response
better than HPF.
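
To make the decoupling of sparsity and response concrete, here is a minimal Python sketch of a generic compound Poisson draw: a Poisson count decides whether an entry is observed at all, and the response is the sum of that many draws from a separately chosen element distribution. The function name `sample_compound_poisson`, the gamma element distribution, and the rate value are assumptions made for illustration; the exact parameterization used by HCPF may differ.

```python
import numpy as np

def sample_compound_poisson(rate, element_sampler, rng):
    """Return (count, response) for one matrix entry.

    count    ~ Poisson(rate)           -- sparsity model (0 means no rating)
    response = sum of `count` elements -- response model
    """
    n = rng.poisson(rate)
    if n == 0:
        return 0, 0.0
    return n, float(element_sampler(rng, n).sum())

# Hypothetical element distribution for a nonnegative continuous response.
def gamma_elements(rng, n):
    return rng.gamma(shape=2.0, scale=1.0, size=n)

rng = np.random.default_rng(0)
draws = [sample_compound_poisson(0.1, gamma_elements, rng) for _ in range(1000)]
sparsity = np.mean([n == 0 for n, _ in draws])  # fraction of empty entries
```

Swapping `gamma_elements` for a different sampler changes the response type without touching the Poisson sparsity component, which is the kind of flexibility the abstract describes.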

We develop a generalized method of moments (GMM) approach for fast parameter
estimation in a new class of Dirichlet latent variable models with mixed data
types. Parameter estimation via GMM has been demonstrated to have computational
and statistical advantages over alternative methods, such as expectation
maximization, variational inference, and Markov chain Monte Carlo. The key
computational advantage of our method (MELD) is that parameter estimation
does not require instantiation of the latent variables. Moreover, a
representational advantage of the GMM approach is that the behavior of the
model is agnostic to distributional assumptions of the observations. We derive
population moment conditions after marginalizing out the sample-specific
Dirichlet latent variables. The moment conditions depend only on component mean
parameters. We illustrate the utility of our approach on simulated data,
comparing results from MELD to alternative methods, and we show the promise of
our approach through the application of MELD to several data sets.
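
For intuition on why such moment conditions can involve only component means, consider a first-order case under assumed notation (the symbols $h$, $\alpha$, and $\phi_{jk}$ are introduced here and are not defined in the text above). Suppose the sample-specific proportions satisfy $h \sim \mathrm{Dirichlet}(\alpha)$ with $\alpha_0 = \sum_k \alpha_k$, and the conditional mean of the $j$-th observed variable is a mixture of component means $\phi_{jk}$. Marginalizing over $h$ then gives:

```latex
\mathbb{E}[x_j \mid h] = \sum_{k=1}^{K} h_k \, \phi_{jk}
\quad\Longrightarrow\quad
\mathbb{E}[x_j] = \sum_{k=1}^{K} \mathbb{E}[h_k] \, \phi_{jk}
               = \sum_{k=1}^{K} \frac{\alpha_k}{\alpha_0} \, \phi_{jk}.
```

The latent variable $h$ no longer appears, and nothing beyond the conditional mean of $x_j$ was assumed about its distribution. This is only a sketch of the idea; any higher-order conditions used by the method are not reproduced here.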

Identifying latent structure in large data matrices is essential for
exploring biological processes. Here, we consider recovering gene coexpression
networks from gene expression data, where each network encodes relationships
between genes that are locally coregulated by shared biological mechanisms. To
do this, we develop a Bayesian statistical model for biclustering to infer
subsets of coregulated genes whose covariation may be observed in only a
subset of the samples. Our biclustering method, BicMix, has desirable
properties, including allowing overcomplete representations of the data,
computational tractability, and jointly modeling unknown confounders and
biological signals. Compared with related biclustering methods, BicMix recovers
latent structure with higher precision across diverse simulation scenarios.
Further, we develop a method to recover gene coexpression networks from the
estimated sparse biclustering matrices. We apply BicMix to breast cancer gene
expression data and recover a gene coexpression network that is differential
across ER+ and ER- samples.

Substantial research on structured sparsity has contributed to analysis of
many different applications. However, there have been few Bayesian procedures
among this work. Here, we develop a Bayesian model for structured sparsity that
uses a Gaussian process (GP) to share parameters of the sparsity-inducing prior
in proportion to feature similarity as defined by an arbitrary positive
definite kernel. For linear regression, this sparsity-inducing prior on
regression coefficients is a relaxation of the canonical spike-and-slab prior
that flattens the mixture model into a scale mixture of normals. This prior
retains the explicit posterior probability on inclusion parameters (now with
GP probit prior distributions) but enables tractable computation via
elliptical slice sampling for the latent Gaussian field. We motivate
development of this prior using the genomic application of association mapping,
or identifying genetic variants associated with a continuous trait. Our
Bayesian structured sparsity model produced sparse results with substantially
improved sensitivity and precision relative to comparable methods. Through
simulations, we show that three properties are key to this improvement: i)
modeling structure in the covariates, ii) significance testing using the
posterior probabilities of inclusion, and iii) model averaging. We present
results from applying this model to a large genomic dataset to demonstrate
computational tractability.
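
The idea of sharing inclusion probabilities across similar features can be sketched in Python as follows. The sketch draws a latent Gaussian field over features, maps it through a probit link to get prior inclusion probabilities, and then samples coefficients from a hard spike-and-slab. Note that this is the canonical spike-and-slab starting point, not the relaxed scale mixture of normals described above; the squared-exponential kernel, the one-dimensional feature positions, and all names and numeric values are assumptions made for illustration.

```python
import math
import numpy as np

def rbf_kernel(positions, length_scale):
    """Squared-exponential kernel on 1-D feature positions (an assumed kernel)."""
    diff = positions[:, None] - positions[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def probit(z):
    """Standard normal CDF applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

rng = np.random.default_rng(0)
positions = np.linspace(0.0, 5.0, 20)   # hypothetical feature locations
p = len(positions)

K = rbf_kernel(positions, length_scale=1.0)
# A small diagonal jitter keeps the Cholesky factorization numerically stable.
chol = np.linalg.cholesky(K + 1e-6 * np.eye(p))
g = chol @ rng.standard_normal(p)       # latent field, approximately N(0, K)

inclusion_prob = probit(g)              # similar features get similar values
included = rng.random(p) < inclusion_prob
beta = np.where(included, rng.standard_normal(p), 0.0)  # spike at 0, N(0, 1) slab
```

Because nearby positions have highly correlated values of `g`, neighboring features tend to be included or excluded together, which is the structured-sparsity behavior the kernel is meant to encode.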