
Ensembles of decision trees are a useful tool for obtaining for obtaining
flexible estimates of regression functions. Examples of these methods include
gradient boosted decision trees, random forests, and Bayesian CART. Two
potential shortcomings of tree ensembles are their lack of smoothness and
vulnerability to the curse of dimensionality. We show that these issues can be
overcome by instead considering sparsity inducing soft decision trees in which
the decisions are treated as probabilistic. We implement this in the context of
the Bayesian additive regression trees framework, and illustrate its promising
performance through testing on benchmark datasets. We provide strong
theoretical support for our methodology by showing that the posterior
distribution concentrates at the minimax rate (upto a logarithmic factor) for
sparse functions and functions with additive structures in the highdimensional
regime where the dimensionality of the covariate space is allowed to grow near
exponentially in the sample size. Our method also adapts to the unknown
smoothness and sparsity levels, and can be implemented by making minimal
modifications to existing BART algorithms.

We investigate a wide class of twodimensional hyperbolic systems with
singularities, and prove the almost sure invariance principle (ASIP) for the
random process generated by sequences of dynamically H\"older observables. The
observables could be unbounded, and the process may be nonstationary and need
not have linearly growing variances. Our results apply to Anosov
diffeomorphisms, Sinai dispersing billiards and their perturbations. The random
processes under consideration are related to the fluctuation of Lyapunov
exponents, the shrinking target problem, etc.

For any $C^\infty$, areapreserving Anosov diffeomorphism $f$ of a surface,
we show that a suspension flow over $f$ is $C^\infty$conjugate to a
constanttime suspension flow of a hyperbolic automorphism of the two torus if
and only if the volume measure is the measure with maximal entropy. We also
show that the the metric entropy with respect to the volume measure and the
topological entropy of suspension flow over Anosov diffeomorphisms on torus
achieve all possible values. Our results fit into two programs related to
entropy rigidity and flexibility of Anosov systems.

We provide a topological classification of locally constant functions over
subshifts of finite type via their zerotemperature measures. Our approach is
to analyze the relationship between the distribution of the zerotemperature
measures and the boundary of higher dimensional generalized rotation sets. We
also discuss the regularity of the localized entropy function on the boundary
of the generalized rotation sets.

In this paper, we present a criterion for pitchfork bifurcation of smooth
vector fields based on a topological argument. Our result expands Rajapakse and
Smale's result \cite{RS2} significantly. Based on our criterion, we present a
class of families of nonsymmetric vector fields undergoing a pitchfork
bifurcation.

We propose a family of variational approximations to Bayesian posterior
distributions, called $\alpha$VB, with provable statistical guarantees. The
standard variational approximation is a special case of $\alpha$VB with
$\alpha=1$. When $\alpha \in(0,1]$, a novel class of variational inequalities
are developed for linking the Bayes risk under the variational approximation to
the objective function in the variational optimization problem, implying that
maximizing the evidence lower bound in variational inference has the effect of
minimizing the Bayes risk within the variational density family. Operating in a
frequentist setup, the variational inequalities imply that point estimates
constructed from the $\alpha$VB procedure converge at an optimal rate to the
true parameter in a wide range of problems. We illustrate our general theory
with a number of examples, including the meanfield variational approximation
to (low)highdimensional Bayesian linear regression with spike and slab
priors, mixture of Gaussian models, latent Dirichlet allocation, and (mixture
of) Gaussian variational approximation in regular parametric models.

The article addresses a longstanding open problem on the justification of
using variational Bayes methods for parameter estimation. We provide general
conditions for obtaining optimal risk bounds for point estimates acquired from
meanfield variational Bayesian inference. The conditions pertain to the
existence of certain test functions for the distance metric on the parameter
space and minimal assumptions on the prior. A general recipe for verification
of the conditions is outlined which is broadly applicable to existing Bayesian
models with or without latent variables. As illustrations, specific
applications to Latent Dirichlet Allocation and Gaussian mixture models are
discussed.

Gaussian process (GP) regression is a powerful interpolation technique due to
its flexibility in capturing nonlinearity. In this paper, we provide a general
framework for understanding the frequentist coverage of pointwise and
simultaneous Bayesian credible sets in GP regression. As an intermediate
result, we develop a Bernstein vonMises type result under supremum norm in
random design GP regression. Identifying both the mean and covariance function
of the posterior distribution of the Gaussian process as regularized
$M$estimators, we show that the sampling distribution of the posterior mean
function and the centered posterior distribution can be respectively
approximated by two population level GPs. By developing a comparison inequality
between two GPs, we provide exact characterization of frequentist coverage
probabilities of Bayesian pointwise credible intervals and simultaneous
credible bands of the regression function. Our results show that inference
based on GP regression tends to be conservative; when the prior is
undersmoothed, the resulting credible intervals and bands have minimaxoptimal
sizes, with their frequentist coverage converging to a nondegenerate value
between their nominal level and one. As a byproduct of our theory, we show that
the GP regression also yields minimaxoptimal posterior contraction rate
relative to the supremum norm, which provides a positive evidence to the long
standing problem on optimal supremum norm contraction rate in GP regression.

In this paper, we propose a new method for estimation and constructing
confidence intervals for lowdimensional components in a highdimensional
model. The proposed estimator, called Constrained Lasso (CLasso) estimator, is
obtained by simultaneously solving two estimating equationsone imposing a
zerobias constraint for the lowdimensional parameter and the other forming an
$\ell_1$penalized procedure for the highdimensional nuisance parameter. By
carefully choosing the zerobias constraint, the resulting estimator of the low
dimensional parameter is shown to admit an asymptotically normal limit
attaining the Cram\'{e}rRao lower bound in a semiparametric sense. We propose
a tuningfree iterative algorithm for implementing the CLasso. We show that
when the algorithm is initialized at the Lasso estimator, the desparsified
estimator proposed in van de Geer et al. [\emph{Ann. Statist.} {\bf 42} (2014)
11661202] is asymptotically equivalent to the first iterate of the algorithm.
We analyse the asymptotic properties of the CLasso estimator and show the
globally linear convergence of the algorithm. We also demonstrate encouraging
empirical performance of the CLasso through numerical studies.

In this paper, we propose a model of decentralized energy storages, who serve
as instruments to shift energy supply intertemporally. From storages'
perspective, we investigate their optimal buying or selling decisions under
market uncertainty. The goal of this paper is to understand the economic value
of future market information, as energy storages mitigate market uncertainty by
forwardlooking strategies.
At a system level, we evaluate different information management policies to
coordinate storages' actions and improve their aggregate profitability: (1)
providing a publicly available market forecasting channel; (2) encouraging
decentralized storages to share their private forecasts with each other; (3)
releasing additional market information to a targeted subset of storages
exclusively. We highlight the perils of too much market information provision
and advice on exclusiveness of market forecast channel.

We obtain a nature generalization for an affine Sierpinski carpet and
Sierpinski triangle to $n$dimensional space, by using the generations and
characterizations of affinelyequivalent Sierpinski carpet. Exactly, in this
paper, a Menger sponge and Sierpinski simplex in $4$dimensional space could be
drawn out clearly under an affine transformation. Furthermore, the method could
be used to a much broader class in fractals.

We consider nonparametric testing in a nonasymptotic framework. Our
statistical guarantees are exact in the sense that Type I and II errors are
controlled for any finite sample size. Meanwhile, one proposed test is shown to
achieve minimax optimality in the asymptotic sense. An important consequence of
this nonasymptotic theory is a new and practically useful formula for
selecting the optimal smoothing parameter in nonparametric testing. The leading
example in this paper is smoothing spline models under Gaussian errors. The
results obtained therein can be further generalized to the kernel ridge
regression framework under possibly nonGaussian errors. Simulations
demonstrate that our proposed test improves over the conventional asymptotic
test when sample size is small to moderate.

In this article, we investigate large sample properties of model selection
procedures in a general Bayesian framework when a closed form expression of the
marginal likelihood function is not available or a local asymptotic quadratic
approximation of the loglikelihood function does not exist. Under appropriate
identifiability assumptions on the true model, we provide sufficient conditions
for a Bayesian model selection procedure to be consistent and obey the Occam's
razor phenomenon, i.e., the probability of selecting the "smallest" model that
contains the truth tends to one as the sample size goes to infinity. In order
to show that a Bayesian model selection procedure selects the smallest model
containing the truth, we impose a prior anticoncentration condition, requiring
the prior mass assigned by large models to a neighborhood of the truth to be
sufficiently small. In a more general setting where the strong model
identifiability assumption may not hold, we introduce the notion of local
Bayesian complexity and develop oracle inequalities for Bayesian model
selection procedures. Our Bayesian oracle inequality characterizes a tradeoff
between the approximation error and a Bayesian characterization of the local
complexity of the model, illustrating the adaptive nature of averagingbased
Bayesian procedures towards achieving an optimal rate of posterior convergence.
Specific applications of the model selection theory are discussed in the
context of highdimensional nonparametric regression and density regression
where the regression function or the conditional density is assumed to depend
on a fixed subset of predictors. As a result of independent interest, we
propose a general technique for obtaining upper bounds of certain small ball
probability of stationary Gaussian processes.

Using the moving frame and invariants, any discrete curve in $\R^3$ could be
uniquely identified by its centroaffine curvatures and torsions. In this paper,
depending on the affine curvatures of the fractal curves, such as Koch curve
and Hilbert curve, we can clearly describe their iterative regularities.
Interestingly, by the affine curvatures, the fractal curves can be quantified
and encoded accordingly to a sequence. Hence, it is more convenient for future
reference. Given three starting points, we can directly generate the affine
Koch curve and affine Hilbert curve at the step $n, \forall n\in \mathbb{Z}^+$.
Certainly, if the initial three points are standard, the curve is the
traditional Koch curve or Hilbert curve. By this method, the characteristic of
some fractal curves which look like irregular could be quantified, and the
regularities would become more obvious.

Any two equivalent discrete curves must have the same invariants at the
corresponding points under an affine transformation. In this paper, we
construct the moving frame and invariants for the discrete centroaffine curves,
which could be used to discriminate the same discrete curves from different
graphics, and estimate whether a polygon flow is stable or periodically stable.
In fact, using the similar method as the FrenetSerret frame, a discrete curve
can be uniquely identified by its centroaffine curvatures and torsions. In
1878, Darboux studied the problem of midpoint iteration of polygons[12].
Berlekamp et al studied this problem in detail[2]. Now, through the
centroaffine curvatures and torsions, the iteration process can be clearly
quantified. Exactly, we describe the whole iteration process by using
centroaffine curvatures and torsions, and its periodicity could be directly
exhibited. As an application, we would obtain some stable discrete space curves
with changeless curvatures and torsions after multistep iteration. For the
pentagram map of a polygon, the affinely regular polygons are stable.
Furthermore, we find the convex hexagons with parallel and equilength opposite
sides are periodically stable, and some convex parallel and equilength
opposite sides octagons are also periodically stable. The proofs of these
results are obtained using the structure equations of the discrete cnetroaffine
curves and the integrable conditions of its flows.

We consider the fractional posterior distribution that is obtained by
updating a prior distribution via Bayes theorem with a fractional likelihood
function, a usual likelihood function raised to a fractional power. First, we
analyze the contraction property of the fractional posterior in a general
misspecified framework. Our contraction results only require a prior mass
condition on certain KullbackLeibler (KL) neighborhood of the true parameter
(or the KL divergence minimizer in the misspecified case), and obviate
constructions of test functions and sieves commonly used in the literature for
analyzing the contraction property of a regular posterior. We show through a
counterexample that some condition controlling the complexity of the parameter
space is necessary for the regular posterior to contract, rendering additional
flexibility on the choice of the prior for the fractional posterior. Second, we
derive a novel Bayesian oracle inequality based on a PACBayes inequality in
misspecified models. Our derivation reveals several advantages of averaging
based Bayesian procedures over optimization based frequentist procedures. As an
application of the Bayesian oracle inequality, we derive a sharp oracle
inequality in the convex regression problem under an arbitrary dimension. We
also illustrate the theory in Gaussian process regression and density
estimation problems.

We present a Communicationefficient Surrogate Likelihood (CSL) framework for
solving distributed statistical inference problems. CSL provides a
communicationefficient surrogate to the global likelihood that can be used for
lowdimensional estimation, highdimensional regularized estimation and
Bayesian inference. For lowdimensional estimation, CSL provably improves upon
naive averaging schemes and facilitates the construction of confidence
intervals. For highdimensional regularized estimation, CSL leads to a
minimaxoptimal estimator with controlled communication cost. For Bayesian
inference, CSL can be used to form a communicationefficient quasiposterior
distribution that converges to the true posterior. This quasiposterior
procedure significantly improves the computational efficiency of MCMC
algorithms even in a nondistributed setting. We present both theoretical
analysis and experiments to explore the properties of the CSL approximation.

In this paper we build the structure equations and the integrable systems for
a discrete centroaffine indefinite surface in $\R^3$. At the same time, some
centroaffine invariants are obtained according to the structure equations.
Using these centroaffine invariants, we study the Laplacian operator and the
convexity of a discrete centroaffine indefinite surface. Furthermore, some
interest examples are provided.

Cloud computing provides scientists a platform that can deploy computation
and data intensive applications without infrastructure investment. With
excessive cloud resources and a decision support system, large generated data
sets can be flexibly 1 stored locally in the current cloud, 2 deleted and
regenerated whenever reused or 3 transferred to cheaper cloud service for
storage. However, due to the pay for use model, the total application cost
largely depends on the usage of computation, storage and bandwidth resources,
hence cutting the cost of cloud based data storage becomes a big concern for
deploying scientific applications in the cloud. In this paper, we propose a
novel strategy that can cost effectively store large generated data sets with
multiple cloud service providers. The strategy is based on a novel algorithm
that finds the trade off among computation, storage and bandwidth costs in the
cloud, which are three key factors for the cost of data storage. Both general
(random) simulations conducted with popular cloud service providers pricing
models and three specific case studies on real world scientific applications
show that the proposed storage strategy is highly cost effective and practical
for run time utilization in the cloud.

Let $\Diff^{ r}_m(M)$ be the set of $C^{ r}$ volumepreserving
diffeomorphisms on a compact Riemannian manifold $M$ ($\dim M\geq 2$). In this
paper, we prove that the diffeomorphisms without zero Lyapunov exponents on a
set of positive volume are $C^1$ dense in $\Diff^{ r}_m(M), r\geq 1$. We also
prove a weaker result for symplectic diffeomorphisms
$\mathcal{S}ym^{r}_{\omega}(M), r\geq1 $ saying that the symplectic
diffeomorphisms with nonzero Lyapunov exponents on a set of positive volume
are $C^1$ dense in $\mathcal{S}ym^{r}_{\omega}(M), r\geq1 $.

In spite of the recent surge of interest in quantile regression, joint
estimation of linear quantile planes remains a great challenge in statistics
and econometrics. We propose a novel parametrization that characterizes any
collection of noncrossing quantile planes over arbitrarily shaped convex
predictor domains in any dimension by means of unconstrained scalar, vector and
function valued parameters. Statistical models based on this parametrization
inherit a fast computation of the likelihood function, enabling penalized
likelihood or Bayesian approaches to model fitting. We introduce a complete
Bayesian methodology by using Gaussian process prior distributions on the
function valued parameters and develop a robust and efficient Markov chain
Monte Carlo parameter estimation. The resulting method is shown to offer
posterior consistency under mild tail and regularity conditions. We present
several illustrative examples where the new method is compared against existing
approaches and is found to offer better accuracy, coverage and model fit.

In this article we will discuss a new, mostly theoretical, method for solving
(zerodimensional) polynomial systems, which lies in between Gr\"obner basis
computations and the heuristic first fall degree assumption and is not based on
any heuristic. This method relies on the new concept of last fall degree.
Let $k$ be a finite field of cardinality $q^n$ and let $k'$ be its subfield
of cardinality $q$. Let $\mathcal{F} \subset k[X_0,\ldots,X_{m1}]$ be a finite
subset generating a zerodimensional ideal. We give an upper bound of the last
fall degree of the Weil descent system of $\mathcal{F}$, which depends on $q$,
$m$, the last fall degree of $\mathcal{F}$, the degree of $\mathcal{F}$ and the
number of solutions of $\mathcal{F}$, but not on $n$. This shows that such Weil
descent systems can be solved efficiently if $n$ grows. In particular, we apply
these results for multiHFE and essentially show that multiHFE is insecure.
Finally, we discuss that the degree of regularity (or last fall degree) of
Weil descent systems coming from summation polynomials to solve the elliptic
curve discrete logarithm problem might depend on $n$, since such systems
without field equations are not zerodimensional.

We study the computational complexity of Markov chain Monte Carlo (MCMC)
methods for highdimensional Bayesian linear regression under sparsity
constraints. We first show that a Bayesian approach can achieve
variableselection consistency under relatively mild conditions on the design
matrix. We then demonstrate that the statistical criterion of posterior
concentration need not imply the computational desideratum of rapid mixing of
the MCMC algorithm. By introducing a truncated sparsity prior for variable
selection, we provide a set of conditions that guarantee both
variableselection consistency and rapid mixing of a particular
MetropolisHastings algorithm. The mixing time is linear in the number of
covariates up to a logarithmic factor. Our proof controls the spectral gap of
the Markov chain by constructing a canonical path ensemble that is inspired by
the steps taken by greedy algorithms for variable selection.

In this paper we establish a closing property and a hyperbolic closing
property for thin trapped chain hyperbolic homoclinic classes with one
dimensional center in partial hyperbolicity setting. Taking advantage of theses
properties, we prove that the growth rate of the number of hyperbolic periodic
points is equal to the topological entropy. We also obtain that the hyperbolic
periodic measures are dense in the space of invariant measures.

Minimax $L_2$ risks for highdimensional nonparametric regression are derived
under two sparsity assumptions: (1) the true regression surface is a sparse
function that depends only on $d=O(\log n)$ important predictors among a list
of $p$ predictors, with $\log p=o(n)$; (2) the true regression surface depends
on $O(n)$ predictors but is an additive function where each additive component
is sparse but may contain two or more interacting predictors and may have a
smoothness level different from other components. For either modeling
assumption, a practicable extension of the widely used Bayesian Gaussian
process regression method is shown to adaptively attain the optimal minimax
rate (up to $\log n$ terms) asymptotically as both $n,p\to\infty$ with $\log
p=o(n)$.