
A multivariate quantile regression model with a factor structure is proposed
to mine data with many responses of interest. The factor structure is allowed
to vary with the quantile levels, which makes our framework more flexible than
the classical factor models. The model is estimated with the nuclear norm
regularization in order to accommodate the high dimensionality of data, but the
incurred optimization problem can only be efficiently solved in an approximate
manner by offtheshelf optimization methods. Such a scenario is often seen
when the empirical risk is nonsmooth or the numerical procedure involves
expensive subroutines such as singular value decomposition. To ensure that the
approximate estimator accurately estimates the model, sufficient conditions on
the optimization error and nonasymptotic error bounds are established to
characterize the risk of the proposed estimator. A numerical procedure that
provably achieves small approximate error is proposed. The merits of our model
and the proposed numerical procedures are demonstrated through Monte Carlo
experiments and an application to finance involving a large pool of asset
returns.

In this article, we develop methods for estimating a low rank tensor from
noisy observations on a subset of its entries to achieve both statistical and
computational efficiencies. There have been a lot of recent interests in this
problem of noisy tensor completion. Much of the attention has been focused on
the fundamental computational challenges often associated with problems
involving higher order tensors, yet very little is known about their
statistical performance. To fill in this void, in this article, we characterize
the fundamental statistical limits of noisy tensor completion by establishing
minimax optimal rates of convergence for estimating a $k$th order low rank
tensor under the general $\ell_p$ ($1\le p\le 2$) norm which suggest
significant room for improvement over the existing approaches. Furthermore, we
propose a polynomialtime computable estimating procedure based upon power
iteration and a secondorder spectral initialization that achieves the optimal
rates of convergence. Our method is fairly easy to implement and numerical
experiments are presented to further demonstrate the practical merits of our
estimator.

Image demosaicing  one of the most important early stages in digital camera
pipelines  addressed the problem of reconstructing a fullresolution image
from socalled colorfilterarrays. Despite tremendous progress made in the
pase decade, a fundamental issue that remains to be addressed is how to assure
the visual quality of reconstructed images especially in the presence of noise
corruption. Inspired by recent advances in generative adversarial networks
(GAN), we present a novel deep learning approach toward joint demosaicing and
denoising (JDD) with perceptual optimization in order to ensure the visual
quality of reconstructed images. The key contributions of this work include: 1)
we have developed a GANbased approach toward image demosacing in which a
discriminator network with both perceptual and adversarial loss functions are
used for quality assurance; 2) we propose to optimize the perceptual quality of
reconstructed images by the proposed GAN in an endtoend manner. Such
endtoend optimization of GAN is particularly effective for jointly exploiting
the gain brought by each modular component (e.g., residue learning in the
generative network and perceptual loss in the discriminator network). Our
extensive experimental results have shown convincingly improved performance
over existing stateoftheart methods in terms of both subjective and
objective quality metrics with a comparable computational cost.

In this paper, we study the effect of dependence on detecting sparse signals.
In particular, we focus on global testing against sparse alternatives for the
means of binary outcomes following an Ising model, and establish how the
interplay between the strength and sparsity of a signal determines its
detectability under various notions of dependence. The profound impact of
dependence is best illustrated under the CurieWeiss model where we observe the
effect of a "thermodynamic" phase transition. In particular, the critical state
exhibits a subtle "blessing of dependence" phenomenon in that one can detect
much weaker signals at criticality than otherwise. Furthermore, we develop a
testing procedure that is broadly applicable to account for dependence and show
that it is asymptotically minimax optimal under fairly general regularity
conditions.

Colocalization is a powerful tool to study the interactions between
fluorescently labeled molecules in biological fluorescence microscopy. However,
existing techniques for colocalization analysis have not undergone continued
development especially in regards to robust statistical support. In this paper,
we examine two of the most popular quantification techniques for colocalization
and argue that they could be improved upon using ideas from nonparametric
statistics and scan statistics. In particular, we propose a new colocalization
metric that is robust, easily implementable, and optimal in a rigorous
statistical testing framework. Application to several benchmark datasets, as
well as biological examples, further demonstrates the usefulness of the
proposed technique.

The reproducing kernel Hilbert space (RKHS) embedding of distributions offers
a general and flexible framework for testing problems in arbitrary domains and
has attracted considerable amount of attention in recent years. To gain
insights into their operating characteristics, we study here the statistical
performance of such approaches within a minimax framework. Focusing on the case
of goodnessoffit tests, our analyses show that a vanilla version of the
kernelembedding based test could be suboptimal, and suggest a simple remedy by
moderating the embedding. We prove that the moderated approach provides optimal
tests for a wide range of deviations from the null and can also be made
adaptive over a large collection of interpolation spaces. Numerical experiments
are presented to further demonstrate the merits of our approach.

Many statistical procedures, including goodnessoffit tests and methods for
independent component analysis, rely critically on the estimation of the
entropy of a distribution. In this paper, we seek entropy estimators that are
efficient and achieve the local asymptotic minimax lower bound with respect to
squared error loss. To this end, we study weighted averages of the estimators
originally proposed by Kozachenko and Leonenko (1987), based on the $k$nearest
neighbour distances of a sample of $n$ independent and identically distributed
random vectors in $\mathbb{R}^d$. A careful choice of weights enables us to
obtain an efficient estimator in arbitrary dimensions, given sufficient
smoothness, while the original unweighted estimator is typically only efficient
when $d \leq 3$. In addition to the new estimator proposed and theoretical
understanding provided, our results facilitate the construction of
asymptotically valid confidence intervals for the entropy of asymptotically
minimal width.

In this paper we present a general convex optimization approach for solving
highdimensional multiple response tensor regression problems under
lowdimensional structural assumptions. We consider using convex and weakly
decomposable regularizers assuming that the underlying tensor lies in an
unknown lowdimensional subspace. Within our framework, we derive general risk
bounds of the resulting estimate under fairly general dependence structure
among covariates. Our framework leads to upper bounds in terms of two very
simple quantities, the \emph{Gaussian width} of a convex set in tensor space
and the \emph{intrinsic dimension} of the lowdimensional tensor subspace. To
the best of our knowledge, this is the first general framework that applies to
multiple response problems. These general bounds provide useful upper bounds on
rates of convergence for a number of fundamental statistical models of interest
including multiresponse regression, vector autoregressive models, lowrank
tensor models and pairwise interaction models. Moreover, in many of these
settings we prove that the resulting estimates are minimax optimal. We also
provide a numerical study that both validates our theoretical guarantees and
demonstrates the breadth of our framework.

Spatiotemporal gene expression data of the human brain offer insights on the
spa tial and temporal patterns of gene regulation during brain development.
Most existing methods for analyzing these data consider spatial and temporal
profiles separately with the implicit assumption that different brain regions
develop in similar trajectories, and that the spatial patterns of gene
expression remain similar at different time points. Al though these analyses
may help delineate gene regulation either spatially or temporally, they are not
able to characterize heterogeneity in temporal dynamics across different brain
regions, or the evolution of spatial patterns of gene regulation over time. In
this article, we develop a statistical method based on low rank tensor
decomposition to more effectively analyze spatiotemporal gene expression data.
We generalize the clas sical principal component analysis (PCA) which is
applicable only to data matrices, to tensor PCA that can simultaneously capture
spatial and temporal effects. We also propose an efficient algorithm that
combines tensor unfolding and power iteration to estimate the tensor principal
components, and provide guarantees on their statistical performances. Numerical
experiments are presented to further demonstrate the mer its of the proposed
method. An application of our method to a spatiotemporal brain expression data
provides insights on gene regulation patterns in the brain.

In this paper, we investigate the sample size requirement for exact recovery
of a high order tensor of low rank from a subset of its entries. We show that a
gradient descent algorithm with initial value obtained from a spectral method
can, in particular, reconstruct a ${d\times d\times d}$ tensor of multilinear
ranks $(r,r,r)$ with high probability from as few as
$O(r^{7/2}d^{3/2}\log^{7/2}d+r^7d\log^6d)$ entries. In the case when the ranks
$r=O(1)$, our sample size requirement matches those for nuclear norm
minimization (Yuan and Zhang, 2016a), or alternating least squares assuming
orthogonal decomposability (Jain and Oh, 2014). Unlike these earlier
approaches, however, our method is efficient to compute, easy to implement, and
does not impose extra structures on the tensor. Numerical results are presented
to further demonstrate the merits of the proposed approach.

In this paper, we consider the problem of learning highdimensional tensor
regression problems with lowrank structure. One of the core challenges
associated with learning highdimensional models is computation since the
underlying optimization problems are often nonconvex. While convex relaxations
could lead to polynomialtime algorithms they are often slow in practice. On
the other hand, limited theoretical guarantees exist for nonconvex methods. In
this paper we provide a general framework that provides theoretical guarantees
for learning highdimensional tensor regression models under different lowrank
structural assumptions using the projected gradient descent algorithm applied
to a potentially nonconvex constraint set $\Theta$ in terms of its
\emph{localized Gaussian width}. We juxtapose our theoretical results for
nonconvex projected gradient descent algorithms with previous results on
regularized convex approaches. The two main differences between the convex and
nonconvex approach are: (i) from a computational perspective whether the
nonconvex projection operator is computable and whether the projection has
desirable contraction properties and (ii) from a statistical upper bound
perspective, the nonconvex approach has a superior rate for a number of
examples. We provide three concrete examples of lowdimensional structure which
address these issues and explain the pros and cons for the nonconvex and
convex approaches. We supplement our theoretical results with simulations which
show that, under several common settings of generalized low rank tensor
regression, the projected gradient descent approach is superior both in terms
of statistical error and runtime provided the stepsizes of the projected
descent algorithm are suitably chosen.

Motivated by gene set enrichment analysis, we investigate the problem of
combined hypothesis testing on a graph. We introduce a general framework to
effectively use the structural information of the underlying graph when testing
multivariate means. A new testing procedure is proposed within this framework.
We show that the test is optimal in that it can consistently detect departure
from the collective null at a rate that no other test could improve, for almost
all graphs. We also provide general performance bounds for the proposed test
under any specific graph, and illustrate their utility through several common
types of graphs. Numerical experiments are presented to further demonstrate the
merits of our approach.

In this paper, we investigate the sample size requirement for a general class
of nuclear norm minimization methods for higher order tensor completion. We
introduce a class of tensor norms by allowing for different levels of
coherence, which allows us to leverage the incoherence of a tensor. In
particular, we show that a $k$th order tensor of rank $r$ and dimension
$d\times\cdots\times d$ can be recovered perfectly from as few as
$O((r^{(k1)/2}d^{3/2}+r^{k1}d)(\log(d))^2)$ uniformly sampled entries through
an appropriate incoherent nuclear norm minimization. Our results demonstrate
some key differences between completing a matrix and a higher order tensor:
They not only point to potential room for improvement over the usual nuclear
norm minimization but also highlight the importance of explicitly accounting
for incoherence, when dealing with higher order tensors.

Motivated by the problem of colocalization analysis in fluorescence
microscopic imaging, we study in this paper structured detection of correlated
regions between two random processes observed on a common domain. We argue that
although intuitive, direct use of the maximum loglikelihood statistic suffers
from potential bias and substantially reduced power, and introduce a simple
sizebased normalization to overcome this problem. We show that scanning with
the proposed sizecorrected likelihood ratio statistics leads to optimal
correlation detection over a large collection of structured correlation
detection problems.

Quantum state tomography aims to determine the state of a quantum system as
represented by a density matrix. It is a fundamental task in modern scientific
studies involving quantum systems. In this paper, we study estimation of
highdimensional density matrices based on Pauli measurements. In particular,
under appropriate notion of sparsity, we establish the minimax optimal rates of
convergence for estimation of the density matrix under both the spectral and
Frobenius norm losses; and show how these rates can be achieved by a common
thresholding approach. Numerical performance of the proposed estimator is also
investigated.

We establish minimax optimal rates of convergence for estimation in a high
dimensional additive model assuming that it is approximately sparse. Our
results reveal an interesting phase transition behavior universal to this class
of high dimensional problems. In the {\it sparse regime} when the components
are sufficiently smooth or the dimensionality is sufficiently large, the
optimal rates are identical to those for high dimensional linear regression,
and therefore there is no additional cost to entertain a nonparametric model.
Otherwise, in the socalled {\it smooth regime}, the rates coincide with the
optimal rates for estimating a univariate function, and therefore they are
immune to the "curse of dimensionality".

Although recovering an Euclidean distance matrix from noisy observations is a
common problem in practice, how well this could be done remains largely
unknown. To fill in this void, we study a simple distance matrix estimate based
upon the socalled regularized kernel estimate. We show that such an estimate
can be characterized as simply applying a constant amount of shrinkage to all
observed pairwise distances. This fact allows us to establish risk bounds for
the estimate implying that the true distances can be estimated consistently in
an average sense as the number of objects increases. In addition, such a
characterization suggests an efficient algorithm to compute the distance matrix
estimator, as an alternative to the usual second order cone programming known
not to scale well for large problems. Numerical experiments and an application
in visualizing the diversity of Vpu protein sequences from a recent HIV1 study
further demonstrate the practical merits of the proposed method.

Motivated by a range of applications in engineering and genomics, we consider
in this paper detection of very short signal segments in three settings:
signals with known shape, arbitrary signals, and smooth signals. Optimal rates
of detection are established for the three cases and rateoptimal detectors are
constructed. The detectors are easily implementable and are based on scanning
with linear and quadratic statistics. Our analysis reveals both similarities
and differences in the strategy and fundamental difficulty of detection among
these three settings.

Discussion of "A significance test for the lasso" by Richard Lockhart,
Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161].

Many problems can be formulated as recovering a lowrank tensor. Although an
increasingly common task, tensor recovery remains a challenging problem because
of the delicacy associated with the decomposition of higher order tensors. To
overcome these difficulties, existing approaches often proceed by unfolding
tensors into matrices and then apply techniques for matrix completion. We show
here that such matricization fails to exploit the tensor structure and may lead
to suboptimal procedure. More specifically, we investigate a convex
optimization approach to tensor completion by directly minimizing a tensor
nuclear norm and prove that this leads to an improved sample size requirement.
To establish our results, we develop a series of algebraic and probabilistic
techniques such as characterization of subdifferetial for tensor nuclear norm
and concentration inequalities for tensor martingales, which may be of
independent interests and could be useful in other tensor related problems.

Clustering methods have led to a number of important discoveries in
bioinformatics and beyond. A major challenge in their use is determining which
clusters represent important underlying structure, as opposed to spurious
sampling artifacts. This challenge is especially serious, and very few methods
are available when the data are very high in dimension. Statistical
Significance of Clustering (SigClust) is a recently developed cluster
evaluation tool for high dimensional low sample size data. An important
component of the SigClust approach is the very definition of a single cluster
as a subset of data sampled from a multivariate Gaussian distribution. The
implementation of SigClust requires the estimation of the eigenvalues of the
covariance matrix for the null multivariate Gaussian distribution. We show that
the original eigenvalue estimation can lead to a test that suffers from severe
inflation of typeI error, in the important case where there are huge single
spikes in the eigenvalues. This paper addresses this critical challenge using a
novel likelihood based soft thresholding approach to estimate these eigenvalues
which leads to a much improved SigClust. These major improvements in SigClust
performance are shown by both theoretical work and an extensive simulation
study. Applications to some cancer genomic data further demonstrate the
usefulness of these improvements.

The problem of multiple kernel learning based on penalized empirical risk
minimization is discussed. The complexity penalty is determined jointly by the
empirical $L_2$ norms and the reproducing kernel Hilbert space (RKHS) norms
induced by the kernels with a datadriven choice of regularization parameters.
The main focus is on the case when the total number of kernels is large, but
only a relatively small number of them is needed to represent the target
function, so that the problem is sparse. The goal is to establish oracle
inequalities for the excess risk of the resulting prediction rule showing that
the method is adaptive both to the unknown design distribution and to the
sparsity of the problem.

We study in this paper a smoothness regularization method for functional
linear regression and provide a unified treatment for both the prediction and
estimation problems. By developing a tool on simultaneous diagonalization of
two positive definite kernels, we obtain shaper results on the minimax rates of
convergence and show that smoothness regularized estimators achieve the optimal
rates of convergence for both prediction and estimation under conditions weaker
than those for the functional principal components based methods developed in
the literature. Despite the generality of the method of regularization, we show
that the procedure is easily implementable. Numerical results are obtained to
illustrate the merits of the method and to demonstrate the theoretical
developments.

Discussion of "Latent variable graphical model selection via convex
optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky
[arXiv:1008.1290].

Estimation of large covariance matrices has drawn considerable recent
attention, and the theoretical focus so far has mainly been on developing a
minimax theory over a fixed parameter space. In this paper, we consider
adaptive covariance matrix estimation where the goal is to construct a single
procedure which is minimax rate optimal simultaneously over each parameter
space in a large collection. A fully datadriven block thresholding estimator
is proposed. The estimator is constructed by carefully dividing the sample
covariance matrix into blocks and then simultaneously estimating the entries in
a block by thresholding. The estimator is shown to be optimally rate adaptive
over a wide range of bandable covariance matrices. A simulation study is
carried out and shows that the block thresholding estimator performs well
numerically. Some of the technical tools developed in this paper can also be of
independent interest.