
Bulk gene expression experiments relied on aggregations of thousands of cells
to measure the average expression in an organism. Advances in microfluidic and
droplet sequencing now permit expression profiling in single cells. This study
of cell-to-cell variation reveals that individual cells lack detectable
expression of transcripts that appear abundant on a population level, giving
rise to zero-inflated expression patterns. To infer gene co-regulatory networks
from such data, we propose a multivariate Hurdle model. It comprises a
mixture of singular Gaussian distributions. We employ neighborhood selection
with the pseudolikelihood and a group lasso penalty to select and fit
undirected graphical models that capture conditional independences between
genes. The proposed method is more sensitive than existing approaches in
simulations, even under departures from our Hurdle model. The method is applied
to data for T follicular helper cells, and a high-dimensional profile of mouse
dendritic cells. It infers network structure not revealed by other methods or
in bulk data sets. An R implementation is available at
https://github.com/amcdavid/HurdleNormal .

We introduce a general framework for undirected graphical models. It
generalizes Gaussian graphical models to a wide range of continuous, discrete,
and combinations of different types of data. The models in the framework,
called exponential trace models, are amenable to estimation based on maximum
likelihood. We introduce a sampling-based approximation algorithm for computing
the maximum likelihood estimator, and we apply this pipeline to learn
simultaneous neural activities from spike data.

In high-dimensional and/or nonparametric regression problems, regularization
(or penalization) is used to control model complexity and induce desired
structure. Each penalty has a weight parameter that indicates how strongly the
structure corresponding to that penalty should be enforced. Typically the
parameters are chosen to minimize the error on a separate validation set using
a simple grid search or a gradient-free optimization method. It is more
efficient to tune parameters if the gradient can be determined, but this is
often difficult for problems with non-smooth penalty functions. Here we show
that for many penalized regression problems, the validation loss is actually
smooth almost everywhere with respect to the penalty parameters. We can
therefore apply a modified gradient descent algorithm to tune parameters.
Through simulation studies on example regression problems, we find that
increasing the number of penalty parameters and tuning them using our method
can decrease the generalization error.
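
To make the idea of gradient-based tuning concrete, the following is a minimal sketch for the simplest smooth case, ridge regression with a single penalty parameter. It is not the modified gradient descent algorithm described above: the ridge penalty is swapped in because its solution and its derivative in the penalty parameter have closed forms. The function names, the log-scale parameterization, and the step-halving rule are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize 0.5*||y - X b||^2 + 0.5*lam*||b||^2 in closed form."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y), A

def validation_loss_and_grad(X_tr, y_tr, X_val, y_val, lam):
    """Validation loss 0.5*||y_val - X_val b(lam)||^2 and its derivative in lam."""
    beta, A = ridge_fit(X_tr, y_tr, lam)
    resid = y_val - X_val @ beta
    loss = 0.5 * resid @ resid
    # Differentiating A b = X'y with respect to lam gives db/dlam = -A^{-1} b.
    dbeta = -np.linalg.solve(A, beta)
    # Chain rule: dL/dlam = (dL/db)' (db/dlam), with dL/db = -X_val' resid.
    grad = -(X_val.T @ resid) @ dbeta
    return loss, grad

def tune_lambda(X_tr, y_tr, X_val, y_val, lam0=1.0, step=1.0, n_steps=50):
    """Gradient descent on log(lam); a step is kept only if the loss decreases."""
    log_lam = np.log(lam0)
    loss, grad = validation_loss_and_grad(X_tr, y_tr, X_val, y_val, lam0)
    for _ in range(n_steps):
        # d/d(log lam) = lam * d/d(lam); clip to keep exp() finite.
        cand = np.clip(log_lam - step * grad * np.exp(log_lam), -20.0, 20.0)
        c_loss, c_grad = validation_loss_and_grad(
            X_tr, y_tr, X_val, y_val, np.exp(cand))
        if c_loss < loss:
            log_lam, loss, grad = cand, c_loss, c_grad
        else:
            step *= 0.5  # halve the step after a rejected move
    return np.exp(log_lam), loss
```

With several penalty parameters the same chain-rule argument applies coordinate by coordinate; for non-smooth penalties the derivative exists only almost everywhere, which is the case the abstract addresses.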

In the past few years, new technologies in the field of neuroscience have
made it possible to simultaneously image activity in large populations of
neurons at cellular resolution in behaving animals. In mid-2016, a huge
repository of this so-called "calcium imaging" data was made
publicly available. The availability of this large-scale data resource opens
the door to a host of scientific questions, for which new statistical methods
must be developed.
In this paper, we consider the first step in the analysis of calcium imaging
data: namely, identifying the neurons in a calcium imaging video. We propose a
dictionary learning approach for this task. First, we perform image
segmentation to develop a dictionary containing a huge number of candidate
neurons. Next, we refine the dictionary using clustering. Finally, we apply the
dictionary in order to select neurons and estimate their corresponding activity
over time, using a sparse group lasso optimization problem. We apply our
proposal to three calcium imaging data sets.
Our proposed approach is implemented in the R package scalpel, which is
available on CRAN.

Confidence interval procedures used in low dimensional settings are often
inappropriate for high dimensional applications. When a large number of
parameters are estimated, marginal confidence intervals associated with the
most significant estimates have very low coverage rates: They are too small and
centered at biased estimates. The problem of forming confidence intervals in
high dimensional settings has previously been studied through the lens of
selection adjustment. In this framework, the goal is to control the proportion
of non-covering intervals formed for selected parameters.
In this paper we approach the problem by considering the relationship between
rank and coverage probability. Marginal confidence intervals have very low
coverage rates for significant parameters and high rates for parameters with
more boring estimates. Many selection adjusted intervals display the same
pattern. This connection motivates us to propose a new coverage criterion for
confidence intervals in multiple testing/covering problems: the rank
conditional coverage (RCC). This is the expected coverage rate of an interval
given the significance ranking for the associated estimator. We propose
interval construction via bootstrapping, which produces small intervals that have
a rank conditional coverage close to the nominal level. These methods are
implemented in the R package rcc.
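
The relationship between rank and coverage can be illustrated with a small simulation. The sketch below is not the rcc package and not the bootstrap construction proposed above; it only estimates the coverage of ordinary marginal 95% intervals as a function of significance rank. The normal model for the parameters and estimates, and all function and argument names, are assumptions chosen for illustration.

```python
import numpy as np

def coverage_by_rank(p=100, n_rep=2000, z_crit=1.96, seed=0):
    """Monte Carlo coverage of marginal intervals est +/- z_crit, by rank of |est|."""
    rng = np.random.default_rng(seed)
    covered = np.zeros(p)
    for _ in range(n_rep):
        theta = rng.normal(0.0, 1.0, size=p)        # true parameters (assumed model)
        est = theta + rng.normal(0.0, 1.0, size=p)  # unit-variance estimates
        order = np.argsort(-np.abs(est))            # position 0 = most significant
        hit = np.abs(est - theta) <= z_crit         # does the interval cover theta?
        covered += hit[order]
    return covered / n_rep

if __name__ == "__main__":
    cov = coverage_by_rank()
    print("top rank:", cov[0], "middle rank:", cov[50], "average:", cov.mean())
```

Averaged over all ranks the intervals attain roughly their nominal level, while the coverage at the top rank is far lower; this is the pattern that rank conditional coverage is designed to expose.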

We consider the problem of nonparametric regression with a potentially large
number of covariates. We propose a convex, penalized estimation framework that
is particularly well-suited for high-dimensional sparse additive models. The
proposed approach combines appealing features of finite basis representation
and smoothing penalties for nonparametric estimation. In particular, in the
case of additive models, a finite basis representation provides a parsimonious
representation for fitted functions but is not adaptive when component
functions possess different levels of complexity. On the other hand, a smoothing
spline type penalty on the component functions is adaptive but does not offer a
parsimonious representation of the estimated function. The proposed approach
simultaneously achieves parsimony and adaptivity in a computationally efficient
framework. We demonstrate these properties through empirical studies on both
real and simulated datasets. We show that our estimator converges at the
minimax rate for functions within a hierarchical class. We further establish
minimax rates for a large class of sparse additive models. The proposed method
is implemented using an efficient algorithm that scales similarly to the Lasso
with the number of covariates and sample size.

Genomic phenotypes, such as DNA methylation and chromatin accessibility, can
be used to characterize the transcriptional and regulatory activity of DNA
within a cell. Recent technological advances have made it possible to measure
such phenotypes very densely. This density often results in spatial structure,
in the sense that measurements at nearby sites are very similar.
In this paper, we consider the task of comparing genomic phenotypes across
experimental conditions, cell types, or disease subgroups. We propose a new
method, Joint Adaptive Differential Estimation (JADE), which leverages the
spatial structure inherent to genomic phenotypes. JADE simultaneously estimates
smooth underlying group average genomic phenotype profiles, and detects regions
in which the average profile differs between groups. We evaluate JADE's
performance in several biologically plausible simulation settings. We also
consider an application to the detection of regions with differential
methylation between mature skeletal muscle cells, myotubes and myoblasts.

We consider the task of fitting a regression model involving interactions
among a potentially large set of covariates, in which we wish to enforce strong
heredity. We propose FAMILY, a very general framework for this task. Our
proposal is a generalization of several existing methods, such as VANISH
[Radchenko and James, 2010], hierNet [Bien et al., 2013], the all-pairs lasso,
and the lasso using only main effects. It can be formulated as the solution to
a convex optimization problem, which we solve using an efficient alternating
direction method of multipliers (ADMM) algorithm. This algorithm has
guaranteed convergence to the global optimum, can be easily specialized to any
convex penalty function of interest, and allows for a straightforward extension
to the setting of generalized linear models. We derive an unbiased estimator of
the degrees of freedom of FAMILY, and explore its performance in a simulation
study and on an HIV sequence data set.

We consider the testing of all pairwise interactions in a two-class problem
with many features. We devise a hierarchical testing framework that considers
an interaction only when one or more of its constituent features has a nonzero
main effect. The test is based on a convex optimization framework that
seamlessly considers main effects and interactions together. We show, both in
simulation and on a genomic data set from the SAPPHIRe study, a potential gain
in power and interpretability over a standard (non-hierarchical) interaction
test.

We consider large-scale studies in which it is of interest to test a very
large number of hypotheses, and then to estimate the effect sizes corresponding
to the rejected hypotheses. For instance, this setting arises in the analysis
of gene expression or DNA sequencing data. However, naive estimates of the
effect sizes suffer from selection bias, i.e., some of the largest naive
estimates are large due to chance alone. Many authors have proposed methods to
reduce the effects of selection bias under the assumption that the naive
estimates of the effect sizes are independent. Unfortunately, when the effect
size estimates are dependent, these existing techniques can have very poor
performance, and in practice there will often be dependence. We propose an
estimator that adjusts for selection bias under a recently proposed frequentist
framework, without the independence assumption. We study some properties of the
proposed estimator, and illustrate that it outperforms past proposals in a
simulation study and on two gene expression data sets.

We consider the problem of predicting an outcome variable using $p$
covariates that are measured on $n$ independent observations, in the setting in
which flexible and interpretable fits are desirable. We propose the fused lasso
additive model (FLAM), in which each additive function is estimated to be
piecewise constant with a small number of adaptively chosen knots. FLAM is the
solution to a convex optimization problem, for which a simple algorithm with
guaranteed convergence to the global optimum is provided. FLAM is shown to be
consistent in high dimensions, and an unbiased estimator of its degrees of
freedom is proposed. We evaluate the performance of FLAM in a simulation study
and on two data sets.

The proposal of Reshef et al. (2011) is an interesting new approach for
discovering nonlinear dependencies among pairs of measurements in exploratory
data mining. However, it has a potentially serious drawback. The authors laud
the fact that MIC has no preference for some alternatives over others, but as
the authors know, there is no free lunch in Statistics: tests which strive to
have high power against all alternatives can have low power in many important
situations. To investigate this, we ran simulations to compare the power of MIC
to that of standard Pearson correlation and distance correlation (dcor). We
simulated pairs of variables with different relationships (most of which were
considered by Reshef et al.), but with varying levels of noise added. To
determine proper cutoffs for testing the independence hypothesis, we simulated
independent data with the appropriate marginals. As one can see from the
Figure, MIC has lower power than dcor, in every case except the somewhat
pathological high-frequency sine wave. MIC is sometimes less powerful than
Pearson correlation as well, the linear case being particularly worrisome.
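
The power comparison described above can be sketched for two of the three statistics. The code below implements the sample distance correlation directly from its definition and estimates power against a cutoff obtained from independent data with the same marginals (here produced by permuting y). MIC is omitted because no implementation is assumed to be available. The noisy parabola, the sample sizes, and all function names are illustrative assumptions, not the settings used for the Figure.

```python
import numpy as np

def _centered_distances(v):
    """Pairwise |v_i - v_j| with row, column and grand means removed."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) for two 1-D arrays."""
    A, B = _centered_distances(x), _centered_distances(y)
    dcov2 = (A * B).mean()
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if dvar2 == 0 else float(np.sqrt(max(dcov2, 0.0) / dvar2))

def pearson(x, y):
    """Absolute Pearson correlation, so that large values indicate dependence."""
    return abs(np.corrcoef(x, y)[0, 1])

def power(stat, make_y, n=100, n_sim=200, alpha=0.05, seed=0):
    """Fraction of dependent samples whose statistic exceeds the null cutoff."""
    rng = np.random.default_rng(seed)
    null, alt = [], []
    for _ in range(n_sim):
        x = rng.uniform(-1.0, 1.0, n)
        y = make_y(x, rng)
        alt.append(stat(x, y))
        # Permuting y breaks the dependence but keeps both marginals.
        null.append(stat(x, rng.permutation(y)))
    cutoff = np.quantile(null, 1.0 - alpha)
    return float(np.mean(np.array(alt) > cutoff))

def noisy_parabola(x, rng, sd=0.1):
    return x ** 2 + rng.normal(0.0, sd, x.size)

if __name__ == "__main__":
    print("dcor power:   ", power(distance_correlation, noisy_parabola))
    print("Pearson power:", power(pearson, noisy_parabola))
```

Increasing `sd` in `noisy_parabola` traces out a power curve of the kind the comparison relies on; other relationships can be plugged in through `make_y`.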

In this paper we propose a blockwise descent algorithm for group-penalized
multiresponse regression. Using a quasi-Newton framework we extend this to
group-penalized multinomial regression. We give a publicly available
implementation for these in R, and compare the speed of this algorithm to a
competing algorithm. We show that our implementation is an order of
magnitude faster than its competitor, and can solve gene-expression-sized
problems in real time.
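
As a rough illustration of blockwise descent for the multiresponse case, the sketch below cycles over predictors and updates each row of the coefficient matrix (one group per predictor, spanning all responses) by group soft-thresholding. It assumes the objective (1/2n)||Y - XB||_F^2 + lam * sum_j ||B_j||_2 and is not the R implementation mentioned above; the function names and the fixed iteration count are assumptions.

```python
import numpy as np

def group_soft_threshold(z, lam):
    """Shrink the vector z toward zero by lam in Euclidean norm."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z

def blockwise_descent(X, Y, lam, n_iter=200):
    """Cyclic block updates for (1/2n)||Y - XB||_F^2 + lam * sum_j ||B_j||_2."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    R = Y - X @ B                       # current residual matrix
    col_sq = (X ** 2).sum(axis=0) / n   # x_j'x_j / n for each predictor
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of x_j with the partial residual that excludes row j.
            z = X[:, j] @ R / n + col_sq[j] * B[j]
            new_row = group_soft_threshold(z, lam) / col_sq[j]
            # Keep the residual in sync with the updated row.
            R += np.outer(X[:, j], B[j] - new_row)
            B[j] = new_row
    return B
```

Each block update is the exact minimizer over one row with the others held fixed, so the objective never increases. For any `lam` at or above the largest value of ||x_j'Y||/n the solution is identically zero, which gives a natural starting point for a path of penalty values.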

With recent advances in high throughput technology, researchers often find
themselves running a large number of hypothesis tests (thousands+) and
estimating a large number of effect sizes. Generally there is particular interest
in those effects estimated to be most extreme. Unfortunately naive estimates of
these effect sizes (even after potentially accounting for multiplicity in a
testing procedure) can be severely biased. In this manuscript we explore this
bias from a frequentist perspective: we give a formal definition, and show that
an oracle estimator using this bias dominates the naive maximum likelihood
estimate. We give a resampling estimator to approximate this oracle, and show
that it works well on simulated data. We also connect this to ideas in
empirical Bayes.
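
The bias in question is easy to see by simulation. The sketch below estimates, for each rank, the average difference between the naive estimate holding that rank and the true effect of the same parameter. It is neither the oracle estimator nor the resampling estimator described above, only a Monte Carlo illustration under an assumed unit-variance normal noise model; the function name and arguments are hypothetical.

```python
import numpy as np

def ranked_bias(theta, n_rep=5000, seed=0):
    """Average (estimate - truth) for the parameter at each rank of the estimates."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    bias = np.zeros(theta.size)
    for _ in range(n_rep):
        est = theta + rng.normal(size=theta.size)  # naive estimates
        order = np.argsort(-est)                   # largest estimate first
        bias += est[order] - theta[order]
    return bias / n_rep

if __name__ == "__main__":
    b = ranked_bias(np.zeros(50))
    print("bias at largest estimate: ", b[0])
    print("bias at smallest estimate:", b[-1])
```

When all true effects are equal, the largest estimate is biased upward and the smallest downward by the same amount in expectation, even though each individual estimate is unbiased before ranking.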

To date, testing interactions in high dimensions has been a challenging task.
Existing methods often have issues with sensitivity to modeling assumptions and
heavily asymptotic nominal p-values. To help alleviate these issues, we propose
a permutation-based method for testing marginal interactions with a binary
response. Our method searches for pairwise correlations which differ between
classes. In this manuscript, we compare our method on real and simulated data
to the standard approach of running many pairwise logistic models. On simulated
data our method finds more significant interactions at a lower false discovery
rate (especially in the presence of main effects). On real genomic data,
although there is no gold standard, our method finds apparent signal and tells
a believable story, while logistic regression does not. We also give asymptotic
consistency results under not too restrictive assumptions.
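
The general idea of comparing a pairwise correlation between classes by permutation can be sketched as follows. This is only an illustration for a single pair of features: the statistic, the label-permutation scheme, and the function names are assumptions, and the sketch makes no attempt to handle main effects or multiple pairs, both of which the method above addresses.

```python
import numpy as np

def corr_difference(x1, x2, labels):
    """Absolute difference between the class-1 and class-0 correlations."""
    r1 = np.corrcoef(x1[labels == 1], x2[labels == 1])[0, 1]
    r0 = np.corrcoef(x1[labels == 0], x2[labels == 0])[0, 1]
    return abs(r1 - r0)

def permutation_pvalue(x1, x2, labels, n_perm=500, seed=0):
    """Permutation p-value for the null of equal correlation in both classes."""
    rng = np.random.default_rng(seed)
    observed = corr_difference(x1, x2, labels)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the labels keeps the feature pairs intact.
        if corr_difference(x1, x2, rng.permutation(labels)) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

In a genome-scale analysis this computation would be repeated over all feature pairs, with the resulting p-values passed to a false discovery rate procedure.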

Linear and quadratic discriminant analysis (LDA/QDA) are common tools for
classification problems. For these methods we assume observations are normally
distributed within group. We estimate a mean and covariance matrix for each
group and classify using Bayes' theorem. With LDA, we estimate a single, pooled
covariance matrix, while for QDA we estimate a separate covariance matrix for
each group. Rarely do we believe in a homogeneous covariance structure between
groups, but often there is insufficient data to separately estimate covariance
matrices. We propose L1 PDA, a regularized model which adaptively pools
elements of the precision matrices. Adaptively pooling these matrices decreases
the variance of our estimates (as in LDA), without overly biasing them. In this
paper, we propose and discuss this method, give an efficient algorithm to fit
it for moderate-sized problems, and show its efficacy on real and simulated
datasets.

We consider rules for discarding predictors in lasso regression and related
problems, for computational efficiency. El Ghaoui et al. (2010) propose "SAFE"
rules that guarantee that a coefficient will be zero in the solution, based on
the inner products of each predictor with the outcome. In this paper we propose
strong rules that are not foolproof but rarely fail in practice. These can be
complemented with simple checks of the Karush-Kuhn-Tucker (KKT) conditions to
provide safe rules that offer substantial speed and space savings in a variety
of statistical convex optimization problems.
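
A minimal sketch of screening followed by a KKT check is given below for the lasso objective 0.5*||y - Xb||^2 + lam*||b||_1. It uses the basic form of the strong rule, which discards predictor j when |x_j'y| < 2*lam - lam_max with lam_max = max_j |x_j'y|, then refits on the retained predictors and restores any discarded predictor that violates the KKT conditions. The plain coordinate descent solver, the tolerance, and all function names are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Plain coordinate descent for 0.5*||y - Xb||^2 + lam*||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = y.astype(float).copy()          # residual y - Xb
    sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            z = X[:, j] @ r + sq[j] * b[j]
            new = soft_threshold(z, lam) / sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b

def lasso_with_strong_rule(X, y, lam, tol=1e-8):
    """Screen with the basic strong rule, fit, then repair KKT violations."""
    c = np.abs(X.T @ y)
    lam_max = c.max()
    keep = c >= 2.0 * lam - lam_max     # predictors that survive screening
    while True:
        b = np.zeros(X.shape[1])
        if keep.any():
            b[keep] = lasso_cd(X[:, keep], y, lam)
        # KKT: a zero coefficient requires |x_j'(y - Xb)| <= lam.
        grad = np.abs(X.T @ (y - X @ b))
        violators = (~keep) & (grad > lam * (1.0 + tol))
        if not violators.any():
            return b, keep
        keep |= violators               # restore violators and refit
```

Because the screened fit only touches the retained columns, the saving grows with the number of discarded predictors; the KKT loop is what turns the heuristic rule into an exact procedure.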