
Facing increasing domestic energy consumption from population growth and
industrialization, Saudi Arabia is aiming to reduce its reliance on fossil
fuels and to broaden its energy mix by expanding investment in renewable energy
sources, including wind energy. A preliminary task in the development of wind
energy infrastructure is the assessment of wind energy potential, a key aspect
of which is the characterization of its spatiotemporal behavior. In this study
we examine the impact of internal climate variability on seasonal wind power
density fluctuations over Saudi Arabia using 30 simulations from the Large
Ensemble Project (LENS) developed at the National Center for Atmospheric
Research. Furthermore, a spatiotemporal model for daily wind speed is proposed
with neighborbased crosstemporal dependence, and a multivariate skewt
distribution to capture the spatial patterns of higher order moments. The model
can be used to generate synthetic time series over the entire spatial domain
that adequately reproduce the internal variability of the LENS dataset.

We propose a new copula model for replicated multivariate spatial data.
Unlike classical models that assume multivariate normality of the data, the
proposed copula is based on the assumption that some factors exist that affect
the joint spatial dependence of all measurements of each variable as well as
the joint dependence among these variables. The model is parameterized in terms
of a crosscovariance function that may be chosen from the many models proposed
in the literature. In addition, there are additive factors in the model that
allow tail dependence and reflection asymmetry of each variable measured at
different locations and of different variables to be modeled. The proposed
approach can therefore be seen as an extension of the linear model of
coregionalization widely used for modeling multivariate spatial data. The
likelihood of the model can be obtained in a simple form and therefore the
likelihood estimation is quite fast. The model is not restricted to the set of
data locations, and using the estimated copula, spatial data can be
interpolated at locations where values of variables are unknown. We apply the
proposed model to temperature and pressure data and compare its performance
with the performance of a popular model from multivariate geostatistics.

We use available measurements to estimate the unknown parameters (variance,
smoothness parameter, and covariance length) of a covariance function by
maximizing the joint Gaussian loglikelihood function. To overcome cubic
complexity in the linear algebra, we approximate the discretized covariance
function in the hierarchical (H) matrix format. The Hmatrix format has a
loglinear computational cost and storage $O(kn \log n)$, where the rank $k$ is
a small integer and $n$ is the number of locations. The Hmatrix technique
allows us to work with general covariance matrices in an efficient way, since
Hmatrices can approximate inhomogeneous covariance functions, with a fairly
general mesh that is not necessarily axesparallel, and neither the covariance
matrix itself nor its inverse have to be sparse. We demonstrate our method with
Monte Carlo simulations and an application to soil moisture data. The C, C++
codes and data are freely available.

Describing the complex dependence structure of extreme phenomena is
particularly challenging. To tackle this issue we develop a novel statistical
algorithm that describes extremal dependence taking advantage of the inherent
hierarchical dependence structure of the maxstable nested logistic
distribution and that identifies possible clusters of extreme variables using
reversible jump Markov chain Monte Carlo techniques. Parsimonious
representations are achieved when clusters of extreme variables are found to be
completely independent. Moreover, we significantly decrease the computational
complexity of full likelihood inference by deriving a recursive formula for the
nested logistic model likelihood. The algorithm performance is verified through
extensive simulation experiments which also compare different likelihood
procedures. The new methodology is used to investigate the dependence
relationships between extreme concentration of multiple pollutants in
California and how these pollutants are related to extreme weather conditions.
Overall, we show that our approach allows for the representation of complex
extremal dependence structures and has valid applications in multivariate data
analysis, such as air pollution monitoring, where it can guide policymaking.

We show how to perform full likelihood inference for maxstable multivariate
distributions or processes based on a stochastic ExpectationMaximisation
algorithm, which combines statistical and computational efficiency in
highdimensions. The good performance of this methodology is demonstrated by
simulation based on the popular logistic and BrownResnick models, and it is
shown to provide dramatic computational time improvements with respect to a
direct computation of the likelihood. Strategies to further reduce the
computational burden are also discussed.

We present ExaGeoStat, a high performance framework for geospatial statistics
in climate and environment modeling. In contrast to simulation based on partial
differential equations derived from firstprinciples modeling, ExaGeoStat
employs a statistical model based on the evaluation of the Gaussian
loglikelihood function, which operates on a large dense covariance matrix.
Generated by the parametrizable Matern covariance function, the resulting
matrix is symmetric and positive definite. The computational tasks involved
during the evaluation of the Gaussian loglikelihood function become daunting
as the number n of geographical locations grows, as O(n2) storage and O(n3)
operations are required. While many approximation methods have been devised
from the side of statistical modeling to ameliorate these polynomial
complexities, we are interested here in the complementary approach of
evaluating the exact algebraic result by exploiting advances in solution
algorithms and manycore computer architectures. Using stateoftheart high
performance dense linear algebra libraries associated with various leading edge
parallel architectures (Intel KNLs, NVIDIA GPUs, and distributedmemory
systems), ExaGeoStat raises the game for statistical applications from climate
and environmental science. ExaGeoStat provides a reference evaluation of
statistical parameters, with which to assess the validity of the various
approaches based on approximation. The framework takes a first step in the
merger of largescale data analytics and extreme computing for geospatial
statistical applications, to be followed by additional complexity reducing
improvements from the solver side that can be implemented under the same
interface. Thus, a single uncompromised statistical model can ultimately be
executed in a wide variety of emerging exascale environments.

Maximum likelihood estimation is an important statistical technique for
estimating missing data, for example in climate and environmental applications,
which are usually large and feature data points that are irregularly spaced. In
particular, the Gaussian loglikelihood function is the \emph{de facto} model,
which operates on the resulting sizable dense covariance matrix. The advent of
high performance systems with advanced computing power and memory capacity have
enabled full simulations only for rather small dimensional climate problems,
solved at the machine precision accuracy. The challenge for high dimensional
problems lies in the computation requirements of the loglikelihood function,
which necessitates ${\mathcal O}(n^2)$ storage and ${\mathcal O}(n^3)$
operations, where $n$ represents the number of given spatial locations. This
prohibitive computational cost may be reduced by using approximation techniques
that not only enable largescale simulations otherwise intractable but also
maintain the accuracy and the fidelity of the spatial statistics model. In this
paper, we extend the Exascale GeoStatistics software framework (i.e.,
ExaGeoStat) to support the Tile LowRank (TLR) approximation technique, which
exploits the data sparsity of the dense covariance matrix by compressing the
offdiagonal tiles up to a userdefined accuracy threshold. The underlying
linear algebra operations may then be carried out on this data compression
format, which may ultimately reduce the arithmetic complexity of the maximum
likelihood estimation and the corresponding memory footprint. Performance
results of TLRbased computations on shared and distributedmemory systems
attain up to 13X and 5X speedups, respectively, compared to full accuracy
simulations using synthetic and real datasets (up to 2M), while ensuring
adequate prediction accuracy.

The classification of multivariate functional data is an important task in
scientific research. Unlike pointwise data, functional data are usually
classified by their shapes rather than by their scales. We define an
outlyingness matrix by extending directional outlyingness, an effective measure
of the shape variation of curves that combines the direction of outlyingness
with conventional depth. We propose two classifiers based on directional
outlyingness and the outlyingness matrix, respectively. Our classifiers provide
better performance compared with existing depthbased classifiers when applied
on both univariate and multivariate functional data from simulation studies. We
also test our methods on two data problems: speech recognition and gesture
classification, and obtain results that are consistent with the findings from
the simulated data.

This article proposes a new graphical tool, the magnitudeshape (MS) plot,
for visualizing both the magnitude and shape outlyingness of multivariate
functional data. The proposed tool builds on the recent notion of functional
directional outlyingness, which measures the centrality of functional data by
simultaneously considering the level and the direction of their deviation from
the central region. The MSplot intuitively presents not only levels but also
directions of magnitude outlyingness on the horizontal axis or plane, and
demonstrates shape outlyingness on the vertical axis. A dividing curve or
surface is provided to separate nonoutlying data from the outliers. Both the
simulated data and the practical examples confirm that the MSplot is superior
to existing tools for visualizing centrality and detecting outliers for
functional data.

The direction of outlyingness is crucial to describing the centrality of
multivariate functional data. Motivated by this idea, we generalize classical
depth to directional outlyingness for functional data. We investigate
theoretical properties of functional directional outlyingness and find that the
total outlyingness can be naturally decomposed into two parts: magnitude
outlyingness and shape outlyingness which represent the centrality of a curve
for magnitude and shape, respectively. Using this decomposition, we provide a
visualization tool for the centrality of curves. Furthermore, we design an
outlier detection procedure based on functional directional outlyingness. This
criterion applies to both univariate and multivariate curves and simulation
studies show that it outperforms competing methods. Weather and
electrocardiogram data demonstrate the practical application of our proposed
framework.

Capturing the potentially strong dependence among the peak concentrations of
multiple air pollutants across a spatial region is crucial for assessing the
related public health risks. In order to investigate the multivariate spatial
dependence properties of air pollution extremes, we introduce a new class of
multivariate maxstable processes. Our proposed model admits a hierarchical
treebased formulation, in which the data are conditionally independent given
some latent nested $\alpha$stable random factors. The hierarchical structure
facilitates Bayesian inference and offers a convenient and interpretable
characterization. We fit this nested multivariate maxstable model to the
maxima of air pollution concentrations and temperatures recorded at a number of
sites in the Los Angeles area, showing that the proposed model succeeds in
capturing their complex tail dependence structure.

Wind has the potential to make a significant contribution to future energy
resources. Locating the sources of this renewable energy on a global scale is
however extremely challenging, given the difficulty to store very large data
sets generated by modern computer models. We propose a statistical model that
aims at reproducing the datagenerating mechanism of an ensemble of runs via a
Stochastic Generator (SG) of global annual wind data. We introduce an
evolutionary spectrum approach with spatially varying parameters based on
largescale geographical descriptors such as altitude to better account for
different regimes across the Earth's orography. We consider a multistep
conditional likelihood approach to estimate the parameters that explicitly
accounts for nonstationary features while also balancing memory storage and
distributed computation. We apply the proposed model to more than 18 million
points of yearly global wind speed. The proposed SG requires orders of
magnitude less storage for generating surrogate ensemble members from wind than
does creating additional wind fields from the climate model, even if an
effective lossy data compression algorithm is applied to the simulation output.

We propose a new copula model that can be used with replicated spatial data.
Unlike the multivariate normal copula, the proposed copula is based on the
assumption that a common factor exists and affects the joint dependence of all
measurements of the process. Moreover, the proposed copula can model tail
dependence and tail asymmetry. The model is parameterized in terms of a
covariance function that may be chosen from the many models proposed in the
literature, such as the Matern model. For some choice of common factors, the
joint copula density is given in closed form and therefore likelihood
estimation is very fast. In the general case, onedimensional numerical
integration is needed to calculate the likelihood, but estimation is still
reasonably fast even with large data sets. We use simulation studies to show
the wide range of dependence structures that can be generated by the proposed
model with different choices of common factors. We apply the proposed model to
spatial temperature data and compare its performance with some popular
geostatistics models.

Functional Magnetic Resonance Imaging (fMRI) is a primary modality for
studying brain activity. Modeling spatial dependence of imaging data at
different scales is one of the main challenges of contemporary neuroimaging,
and it could allow for accurate testing for significance in neural activity.
The high dimensionality of this type of data (on the order of hundreds of
thousands of voxels) poses serious modeling challenges and considerable
computational constraints. For the sake of feasibility, standard models
typically reduce dimensionality by modeling covariance among regions of
interest (ROIs)  coarser or larger spatial units  rather than among voxels.
However, ignoring spatial dependence at different scales could drastically
reduce our ability to detect activation patterns in the brain and hence produce
misleading results. To overcome these problems, we introduce a multiresolution
spatiotemporal model and a computationally efficient methodology to estimate
cognitive control related activation and wholebrain connectivity. The proposed
model allows for testing voxelspecific activation while accounting for
nonstationary local spatial dependence within anatomically defined ROIs, as
well as regional dependence (betweenROIs). Furthermore, the model allows for
detection of interpretable connectivity patterns among ROIs using the graphical
Least Absolute Shrinkage Selection Operator (LASSO). The model is used in a
motortask fMRI study to investigate brain activation and connectivity patterns
aimed at identifying associations between these patterns and regaining motor
functionality following a stroke.

We develop a multilevel restricted Gaussian maximum likelihood method for
estimating the covariance function parameters and computing the best unbiased
predictor. Our approach produces a new set of multilevel contrasts where the
deterministic parameters of the model are filtered out thus enabling the
estimation of the covariance parameters to be decoupled from the deterministic
component. Moreover, the multilevel covariance matrix of the contrasts exhibit
fast decay that is dependent on the smoothness of the covariance function. Due
to the fast decay of the multilevel covariance matrix coefficients only a
small set is computed with a level dependent criterion. We demonstrate our
approach on problems of up to 512,000 observations with a Matern covariance
function and highly irregular placements of the observations. In addition,
these problems are numerically unstable and hard to solve with traditional
methods.

Maxstable processes are natural models for spatial extremes because they
provide suitable asymptotic approximations to the distribution of maxima of
random fields. In the recent past, several parametric families of stationary
maxstable models have been developed, and fitted to various types of data.
However, a recurrent problem is the modeling of nonstationarity. In this
paper, we develop nonstationary maxstable dependence structures in which
covariates can be easily incorporated. Inference is performed using pairwise
likelihoods, and its performance is assessed by an extensive simulation study
based on a nonstationary locally isotropic extremal $t$ model. Evidence that
unknown parameters are well estimated is provided, and estimation of spatial
return level curves is discussed. The methodology is demonstrated with
temperature maxima recorded over a complex topography. Models are shown to
satisfactorily capture extremal dependence.

We study Bayesian linear regression models with skewsymmetric scale mixtures
of normal error distributions. These kinds of models can be used to capture
departures from the usual assumption of normality of the errors in terms of
heavy tails and asymmetry. We propose a general noninformative prior structure
for these regression models and show that the corresponding posterior
distribution is proper under mild conditions. We extend these propriety results
to cases where the response variables are censored. The latter scenario is of
interest in the context of accelerated failure time models, which are relevant
in survival analysis. We present a simulation study that demonstrates good
frequentist properties of the posterior credible intervals associated to the
proposed priors. This study also sheds some light on the tradeoff between
increased model flexibility and the risk of overfitting. We illustrate the
performance of the proposed models with real data. Although we focus on models
with univariate response variables, we also present some extensions to the
multivariate case in the Supporting Web Material.

We examine some distributions used extensively within the modelbased
clustering literature in recent years, paying special attention to} claims that
have been made about their relative efficacy. Theoretical arguments are
provided as well as real data examples.

Rejoinder of ``CrossCovariance Functions for Multivariate Geostatistics'' by
Genton and Kleiber [arXiv:1507.08017].

Continuously indexed datasets with multiple variables have become ubiquitous
in the geophysical, ecological, environmental and climate sciences, and pose
substantial analysis challenges to scientists and statisticians. For many
years, scientists developed models that aimed at capturing the spatial behavior
for an individual process; only within the last few decades has it become
commonplace to model multiple processes jointly. The key difficulty is in
specifying the crosscovariance function, that is, the function responsible for
the relationship between distinct variables. Indeed, these crosscovariance
functions must be chosen to be consistent with marginal covariance functions in
such a way that the secondorder structure always yields a nonnegative definite
covariance matrix. We review the main approaches to building crosscovariance
models, including the linear model of coregionalization, convolution methods,
the multivariate Mat\'{e}rn and nonstationary and spacetime extensions of
these among others. We additionally cover specialized constructions, including
those designed for asymmetry, compact support and spherical domains, with a
review of physicsconstrained models. We illustrate select models on a
bivariate regional climate model output example for temperature and pressure,
along with a bivariate minimum and maximum temperature observational dataset;
we compare models by likelihood value as well as via crossvalidation
cokriging studies. The article closes with a discussion of unsolved problems.

The main approach to inference for multivariate extremes consists in
approximating the joint upper tail of the observations by a parametric family
arising in the limit for extreme events. The latter may be expressed in terms
of componentwise maxima, high threshold exceedances or point processes,
yielding different but related asymptotic characterizations and estimators. The
present paper clarifies the connections between the main likelihood estimators,
and assesses their practical performance. We investigate their ability to
estimate the extremal dependence structure and to predict future extremes,
using exact calculations and simulation, in the case of the logistic model.

Tukey's $g$and$h$ distribution has been a powerful tool for data
exploration and modeling since its introduction. However, two long standing
challenges associated with this distribution family have remained unsolved
until this day: how to find an optimal estimation procedure and how to make
valid statistical inference on unknown parameters. To overcome these two
challenges, a computationally efficient estimation procedure based on
maximizing an approximated likelihood function of the Tukey's $g$and$h$
distribution is proposed and is shown to have the same estimation efficiency as
the maximum likelihood estimator under mild conditions. The asymptotic
distribution of the proposed estimator is derived and a series of approximated
likelihood ratio test statistics are developed to conduct hypothesis tests
involving two shape parameters of Tukey's $g$and$h$ distribution. Simulation
examples and an analysis of air pollution data are used to demonstrate the
effectiveness of the proposed estimation and testing procedures.

Accurate shortterm wind speed forecasting is needed for the rapid
development and efficient operation of wind energy resources. This is, however,
a very challenging problem. Although on the large scale, the wind speed is
related to atmospheric pressure, temperature, and other meteorological
variables, no improvement in forecasting accuracy was found by incorporating
air pressure and temperature directly into an advanced spacetime statistical
forecasting model, the trigonometric direction diurnal (TDD) model. This paper
proposes to incorporate the geostrophic wind as a new predictor in the TDD
model. The geostrophic wind captures the physical relationship between wind and
pressure through the observed approximate balance between the pressure gradient
force and the Coriolis acceleration due to the Earth's rotation. Based on our
numerical experiments with data from West Texas, our new method produces more
accurate forecasts than does the TDD model using air pressure and temperature
for 1 to 6hourahead forecasts based on three different evaluation criteria.
Furthermore, forecasting errors can be further reduced by using moving average
hourly wind speeds to fit the diurnal pattern. For example, our new method
obtains between 13.9% and 22.4% overall mean absolute error reduction relative
to persistence in 2hourahead forecasts, and between 5.3% and 8.2% reduction
relative to the best previous spacetime methods in this setting.

We study the asymptotic joint distribution of sample spacetime covariance
estimators of strictly stationary random fields. We do this without any
marginal or joint distributional assumptions other than mild moment and mixing
conditions. We consider several situations depending on whether the
observations are regularly or irregularly spaced and whether one part or the
whole domain of interest is fixed or increasing. A simulation experiment
illustrates the theoretical results.

Discussion of ``Breakdown and groups'' by P. L. Davies and U. Gather
[math.ST/0508497]