
Calibration of hydrological timeseries models is a challenging task since
these models give a wide spectrum of output series and calibration procedures
require significant amount of time. From a statistical standpoint, this model
parameter estimation problem simplifies to finding an inverse solution of a
computer model that generates prespecified timeseries output (i.e., realistic
output series). In this paper, we propose a modified history matching approach
for calibrating the timeseries rainfallrunoff models with respect to the real
data collected from the state of Georgia, USA. We present the methodology and
illustrate the application of the algorithm by carrying a simulation study and
the two case studies. Several goodnessoffit statistics were calculated to
assess the model performance. The results showed that the proposed history
matching algorithm led to a significant improvement, of 30% and 14% (in terms
of root mean squared error) and 26% and 118% (in terms of peak percent
threshold statistics), for the two casestudies with MatlabSimulink and SWAT
models, respectively.

Factorial designs with randomization restrictions are often used in
industrial experiments when a complete randomization of trials is impractical.
In the statistics literature, the analysis, construction and isomorphism of
factorial designs has been extensively investigated. Much of the work has been
on a casebycase basis  addressing completely randomized designs, randomized
block designs, splitplot designs, etc. separately. In this paper we take a
more unified approach, developing theoretical results and an efficient
relabeling strategy to both construct and check the isomorphism of multistage
factorial designs with randomization restrictions. The examples presented in
this paper particularly focus on splitlot designs.

The recent accelerated growth in the computing power has generated
popularization of experimentation with dynamic computer models in various
physical and engineering applications. Despite the extensive statistical
research in computer experiments, most of the focus had been on the theoretical
and algorithmic innovations for the design and analysis of computer models with
scalar responses.
In this paper, we propose a computationally efficient statistical emulator
for a largescale dynamic computer simulator (i.e., simulator which gives time
series outputs). The main idea is to first find a good local neighbourhood for
every input location, and then emulate the simulator output via a singular
value decomposition (SVD) based Gaussian process (GP) model. We develop a new
design criterion for sequentially finding this local neighbourhood set of
training points. Several test functions and a reallife application have been
used to demonstrate the performance of the proposed approach over a naive
method of choosing local neighbourhood set using the Euclidean distance among
design points.

In an efficient stock market, the returns and their timedependent volatility
are often jointly modeled by stochastic volatility models (SVMs). Over the last
few decades several SVMs have been proposed to adequately capture the defining
features of the relationship between the return and its volatility. Among one
of the earliest SVM, Taylor (1982) proposed a hierarchical model, where the
current return is a function of the current latent volatility, which is further
modeled as an autoregressive process. In an attempt to make the SVMs more
appropriate for complex realistic market behavior, a leverage parameter was
introduced in the Taylor SVM, which however led to the violation of the
efficient market hypothesis (EMH, a necessary meanzero condition for the
return distribution that prevents arbitrage possibilities). Subsequently, a
host of alternative SVMs had been developed and are currently in use. In this
paper, we propose meancorrections for several generalizations of Taylor SVM
that capture the complex market behavior as well as satisfy EMH. We also
establish a few theoretical results to characterize the key desirable features
of these models, and present comparison with other popular competitors.
Furthermore, four reallife examples (Oil price, CITI bank stock price,
EuroUSD rate, and S&P 500 index returns) have been used to demonstrate the
performance of this new class of SVMs.

For an expensive to evaluate computer simulator, even the estimate of the
overall surface can be a challenging problem. In this paper, we focus on the
estimation of the inverse solution, i.e., to find the set(s) of input
combinations of the simulator that generates (or gives good approximation of) a
predetermined simulator output. Ranjan et al. (2008) proposed an expected
improvement criterion under a sequential design framework for the inverse
problem with a scalar valued simulator. In this paper, we focus on the inverse
problem for a timeseries valued simulator. We have used a few simulated and
two real examples for performance comparison.

In an efficient stock market, the logreturns and their timedependent
variances are often jointly modelled by stochastic volatility models (SVMs).
Many SVMs assume that errors in logreturn and latent volatility process are
uncorrelated, which is unrealistic. It turns out that if a nonzero correlation
is included in the SVM (e.g., Shephard (2005)), then the expected logreturn at
time t conditional on the past returns is nonzero, which is not a desirable
feature of an efficient stock market. In this paper, we propose a
meancorrection for such an SVM for discretetime returns with nonzero
correlation. We also find closed form analytical expressions for higher moments
of logreturn and its leadlag correlations with the volatility process. We
compare the performance of the proposed and classical SVMs on S&P 500 index
returns obtained from NYSE.

A computer code or simulator is a mathematical representation of a physical
system, for example a set of differential equations. Running the code with
given values of the vector of inputs, x, leads to an output y(x) or several
such outputs. For instance, one application we use for illustration simulates
the average tidal power, y, generated as a function of the turbine location, x
= (x1, x2), in the Bay of Fundy, Nova Scotia, Canada (Ranjan et al. 2011).
Performing scientific or engineering experiments via such a computer code is
often more time and cost effective than running a physical experiment.
Choosing new runs sequentially for optimization, moving y to a target, etc.
has been formalized using the concept of expected improvement (Jones et al.
1998). The next experimental run is made where the expected improvement in the
function of interest is largest. This expectation is with respect to the
predictive distribution of y from a statistical model relating y to x. By
considering a set of possible inputs x for the new run, we can choose that
which gives the largest expectation.

Constrained blackbox optimization is a difficult problem, with most
approaches coming from the mathematical programming literature. The statistical
literature is sparse, especially in addressing problems with nontrivial
constraints. This situation is unfortunate because statistical methods have
many attractive properties: global scope, handling noisy objectives,
sensitivity analysis, and so forth. To narrow that gap, we propose a
combination of response surface modeling, expected improvement, and the
augmented Lagrangian numerical optimization framework. This hybrid approach
allows the statistical model to think globally and the augmented Lagrangian to
act locally. We focus on problems where the constraints are the primary
bottleneck, requiring expensive simulation to evaluate and substantial modeling
effort to map out. In that context, our hybridization presents a simple yet
effective solution that allows existing objectiveoriented statistical
approaches, like those based on Gaussian process surrogates and expected
improvement heuristics, to be applied to the constrained setting with minor
modification. This work is motivated by a challenging, realdata benchmark
problem from hydrology where, even with a simple linear objective function,
learning a nontrivial valid region complicates the search for a global minimum.

Latin hypercube designs (LHDs) with spacefilling properties are widely used
for emulating computer simulators. Over the last three decades, a wide spectrum
of LHDs have been proposed with spacefilling criteria like minimum correlation
among factors, maximin interpoint distance, and orthogonality among the factors
via orthogonal arrays (OAs). Projective geometric structures like spreads,
covers and stars of PG(p1,q) can be used to characterize the randomization
restriction of multistage factorial experiments. These geometric structures can
also be used for constructing OAs and nearly OAs (NOAs). In this paper, we
present a new class of spacefilling LHDs based on NOAs derived from stars of
PG(p1, 2).

Gaussian Process (GP) models are popular statistical surrogates used for
emulating computationally expensive computer simulators. The quality of a GP
model fit can be assessed by a goodness of fit measure based on optimized
likelihood. Finding the global maximum of the likelihood function for a GP
model is typically very challenging as the likelihood surface often has
multiple local optima, and an explicit expression for the gradient of the
likelihood function is typically unavailable. Previous methods for optimizing
the likelihood function (e.g. MacDonald et al. (2013)) have proven to be robust
and accurate, though relatively inefficient. We propose several likelihood
optimization techniques, including two modified multistart local search
techniques, based on the method implemented by MacDonald et al. (2013), that
are equally as reliable, and significantly more efficient. A hybridization of
the global search algorithm Dividing Rectangles (DIRECT) with the local
optimization algorithm BFGS provides a comparable GP model quality for a
fraction of the computational cost, and is the preferred optimization technique
when computational resources are limited. We use several test functions and a
real application from an oil reservoir simulation to test and compare the
performance of the proposed methods with the one implemented by MacDonald et
al. (2013) in the R library GPfit. The proposed method is implemented in a
Matlab package, GPMfit.

Classification of satellite images is a key component of many remote sensing
applications. One of the most important products of a raw satellite image is
the classified map which labels the image pixels into meaningful classes.
Though several parametric and nonparametric classifiers have been developed
thus far, accurate labeling of the pixels still remains a challenge. In this
paper, we propose a new reliable multiclassclassifier for identifying class
labels of a satellite image in remote sensing applications. The proposed
multiclassclassifier is a generalization of a binary classifier based on the
flexible ensemble of regression trees model called Bayesian Additive Regression
Trees (BART). We used three small areas from the LANDSAT 5 TM image, acquired
on August 15, 2009 (path/row: 08/29, L1T product, UTM map projection) over
Kings County, Nova Scotia, Canada to classify the landuse. Several prediction
accuracy and uncertainty measures have been used to compare the reliability of
the proposed classifier with the stateoftheart classifiers in remote
sensing.

Gaussian process (GP) models are commonly used statistical metamodels for
emulating expensive computer simulators. Fitting a GP model can be numerically
unstable if any pair of design points in the input space are close together.
Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach
for fitting GP models to deterministic computer simulators. They used a genetic
algorithm based approach that is robust but computationally intensive for
maximizing the likelihood. This paper implements a slightly modified version of
the model proposed by Ranjan et al. (2011), as the new R package GPfit. A novel
parameterization of the spatial correlation function and a new multistart
gradient based optimization algorithm yield optimization that is robust and
typically faster than the genetic algorithm based approach. We present two
examples with R codes to illustrate the usage of the main functions in GPfit.
Several test functions are used for performance comparison with a popular R
package mlegp. GPfit is a free software and distributed under the general
public license, as part of the R software project (R Development Core Team
2012).

The graphics processing unit (GPU) has emerged as a powerful and cost
effective processor for general performance computing. GPUs are capable of an
order of magnitude more floatingpoint operations per second as compared to
modern central processing units (CPUs), and thus provide a great deal of
promise for computationally intensive statistical applications. Fitting complex
statistical models with a large number of parameters and/or for large datasets
is often very computationally expensive. In this study, we focus on Gaussian
process (GP) models  statistical models commonly used for emulating expensive
computer simulators. We demonstrate that the computational cost of implementing
GP models can be significantly reduced by using a CPU+GPU heterogeneous
computing system over an analogous implementation on a traditional computing
system with no GPU acceleration. Our small study suggests that GP models are
fertile ground for further implementation on CPU+GPU systems.

In computer experiments, a mathematical model implemented on a computer is
used to represent complex physical phenomena. These models, known as computer
simulators, enable experimental study of a virtual representation of the
complex phenomena. Simulators can be thought of as complex functions that take
many inputs and provide an output. Often these simulators are themselves
expensive to compute, and may be approximated by "surrogate models" such as
statistical regression models. In this paper we consider a new kind of
surrogate model, a Bayesian ensemble of trees (Chipman et al. 2010), with the
specific goal of learning enough about the simulator that a particular feature
of the simulator can be estimated. We focus on identifying the simulator's
global minimum. Utilizing the Bayesian version of the Expected Improvement
criterion (Jones et al. 1998), we show that this ensemble is particularly
effective when the simulator is illbehaved, exhibiting nonstationarity or
abrupt changes in the response. A number of illustrations of the approach are
given, including a tidal power application.

For many expensive deterministic computer simulators, the outputs do not have
replication error and the desired metamodel (or statistical emulator) is an
interpolator of the observed data. Realizations of Gaussian spatial processes
(GP) are commonly used to model such simulator outputs. Fitting a GP model to
$n$ data points requires the computation of the inverse and determinant of $n
\times n$ correlation matrices, $R$, that are sometimes computationally
unstable due to nearsingularity of $R$. This happens if any pair of design
points are very close together in the input space. The popular approach to
overcome nearsingularity is to introduce a small nugget (or jitter) parameter
in the model that is estimated along with other model parameters. The inclusion
of a nugget in the model often causes unnecessary oversmoothing of the data.
In this paper, we propose a lower bound on the nugget that minimizes the
oversmoothing and an iterative regularization approach to construct a
predictor that further improves the interpolation accuracy. We also show that
the proposed predictor converges to the GP interpolator.

Deterministic computer simulations are often used as a replacement for
complex physical experiments. Although less expensive than physical
experimentation, computer codes can still be timeconsuming to run. An
effective strategy for exploring the response surface of the deterministic
simulator is the use of an approximation to the computer code, such as a
Gaussian process (GP) model, coupled with a sequential sampling strategy for
choosing design points that can be used to build the GP model. The ultimate
goal of such studies is often the estimation of specific features of interest
of the simulator output, such as the maximum, minimum, or a level set
(contour). Before approximating such features with the GP model, sufficient
runs of the computer simulator must be completed.
Sequential designs with an expected improvement (EI) function can yield good
estimates of the features with a minimal number of runs. The challenge is that
the expected improvement function itself is often multimodal and difficult to
maximize. We develop branch and bound algorithms for efficiently maximizing the
EI function in specific problems, including the simultaneous estimation of a
minimum and a maximum, and in the estimation of a contour. These branch and
bound algorithms outperform other optimization strategies such as genetic
algorithms, and over a number of sequential design steps can lead to
dramatically superior accuracy in estimation of features of interest.

Regular factorial designs with randomization restrictions are widely used in
practice. This paper provides a unified approach to the construction of such
designs using randomization defining contrast subspaces for the representation
of randomization restrictions. We use finite projective geometry to determine
the existence of designs with the required structure and develop a systematic
approach for their construction. An attractive feature is that commonly used
factorial designs with randomization restrictions are special cases of this
general representation. Issues related to the use of these designs for
particular factorial experiments are also addressed.

Identifying promising compounds from a vast collection of feasible compounds
is an important and yet challenging problem in the pharmaceutical industry. An
efficient solution to this problem will help reduce the expenditure at the
early stages of drug discovery. In an attempt to solve this problem, Mandal, Wu
and Johnson [Technometrics 48 (2006) 273283] proposed the SELC algorithm.
Although powerful, it fails to extract substantial information from the data to
guide the search efficiently, as this methodology is not based on any
statistical modeling. The proposed approach uses Gaussian Process (GP) modeling
to improve upon SELC, and hence named $\mathcal{G}$SELC. The performance of
the proposed methodology is illustrated using four and five dimensional test
functions. Finally, we implement the new algorithm on a real pharmaceutical
data set for finding a group of chemical compounds with optimal properties.