• Constructing Likelihood Functions for Interval-valued Random Variables (1608.00107)

March 7, 2019 math.ST, stat.TH, stat.ME
There is a growing need for the ability to analyse interval-valued data. However, existing descriptive frameworks for this ignore the process by which interval-valued data are typically constructed, namely the aggregation of real-valued data generated by some underlying process. In this article we develop the foundations of likelihood-based statistical inference for random intervals that directly incorporates the underlying generative procedure into the analysis. That is, it permits the direct fitting of models for the underlying real-valued data given only the random interval-valued summaries. This generative approach overcomes several problems associated with existing methods, including the rarely satisfied assumption of within-interval uniformity. The new methods are illustrated by simulated and real data analyses.
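The sketch below shows the generative idea in its simplest case. It assumes, for illustration only, that each observed interval is the minimum and maximum of m iid draws from a density f with CDF F. The joint density of the sample minimum a and maximum b is then m(m - 1) f(a) f(b) [F(b) - F(a)]^(m - 2). A parametric model for the underlying real-valued data therefore induces a likelihood for the intervals. The function names and the normal model are hypothetical choices, not taken from the paper, whose framework covers more general constructions.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density (hypothetical model for the underlying real-valued data)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_loglik(lower, upper, m, pdf, cdf):
    """Log-density of the random interval [min, max] of m iid draws.

    Uses the joint density of the sample minimum and maximum:
    m (m - 1) f(a) f(b) (F(b) - F(a))^(m - 2), valid for a < b and m >= 2.
    """
    if m < 2 or not lower < upper:
        raise ValueError("need m >= 2 and lower < upper")
    mass = cdf(upper) - cdf(lower)  # probability that a draw falls inside the interval
    return (math.log(m * (m - 1)) + math.log(pdf(lower)) + math.log(pdf(upper))
            + (m - 2) * math.log(mass))

def total_loglik(intervals, pdf, cdf):
    """Sum over observations (lower, upper, m), each summarising m underlying values."""
    return sum(interval_loglik(a, b, m, pdf, cdf) for a, b, m in intervals)
```

For example, `total_loglik(data, lambda x: normal_pdf(x, mu), lambda x: normal_cdf(x, mu))` can be maximised over `mu` to fit the underlying mean directly from interval summaries, without assuming within-interval uniformity.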
• Extremal properties of the extended skew-normal distribution (1805.03316)

May 8, 2018 stat.ME
The skew-normal and related families are flexible, asymmetric parametric models suitable for modelling a diverse range of systems. We focus on the highly flexible extended skew-normal distribution and consider the case where interest lies in the extreme values it can produce. We derive the well-known Mills' inequalities and Mills' ratio for the univariate extended skew-normal distribution, and establish the asymptotic extreme-value distribution for the maxima of samples drawn from this distribution. We show that the multivariate maximum of a high-dimensional extended skew-normal random sample has asymptotically independent components, and we derive the speed of convergence of the joint tail. To describe the possible dependence among the components of the multivariate maximum, we show that under appropriate conditions an approximate multivariate extreme-value distribution with a rich dependence structure can be derived.
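For reference, here is a sketch of the extended skew-normal density in Azzalini's standard parameterisation. We assume this matches the paper's parameterisation:

f(x) = phi(x) Phi(alpha0 + alpha x) / Phi(tau), with alpha0 = tau sqrt(1 + alpha^2).

It reduces to the skew-normal when tau = 0 and to the normal when alpha = 0. The tail ratio P(X > x) / f(x) is computed by numerical integration purely for illustration; the paper's analytic Mills-type results are not reproduced. The function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def esn_pdf(x, alpha=0.0, tau=0.0, loc=0.0, scale=1.0):
    """Extended skew-normal density: phi(z) Phi(alpha0 + alpha z) / (Phi(tau) scale)."""
    z = (x - loc) / scale
    alpha0 = tau * math.sqrt(1.0 + alpha * alpha)
    return phi(z) * Phi(alpha0 + alpha * z) / (Phi(tau) * scale)

def esn_survival(x, alpha=0.0, tau=0.0, upper=40.0, n=20000):
    """P(X > x) for the standard ESN, by the trapezoidal rule on [x, upper]."""
    h = (upper - x) / n
    total = 0.5 * (esn_pdf(x, alpha, tau) + esn_pdf(upper, alpha, tau))
    for i in range(1, n):
        total += esn_pdf(x + i * h, alpha, tau)
    return total * h

def esn_tail_ratio(x, alpha=0.0, tau=0.0):
    """Mills-type ratio P(X > x) / f(x), computed numerically."""
    return esn_survival(x, alpha, tau) / esn_pdf(x, alpha, tau)
```

As a sanity check, with alpha = 0 the computed ratio at x = 3 falls between the classical normal Mills bounds x / (1 + x^2) and 1 / x.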
• Dynamic Quantile Function Models (1707.02587)

Sept. 5, 2017 q-fin.RM, stat.ME, stat.AP
We offer a novel way of thinking about the modelling of the time-varying distributions of financial asset returns. Borrowing ideas from symbolic data analysis, we consider data representations beyond scalars and vectors. Specifically, we treat a quantile function as an observation and develop a new class of dynamic models for quantile-function-valued (QF-valued) time series. To make statistical inferences and account for parameter uncertainty, we propose a method for constructing a likelihood function for QF-valued data, and develop an adaptive MCMC sampling algorithm for simulating from the posterior distribution. Compared with modelling realised measures, modelling the entire quantile function of intra-daily returns gives more insight into the dynamic structure of price movements. Via simulations, we show that the proposed MCMC algorithm is effective in recovering the posterior distribution, and that the posterior means are reasonable point estimates of the model parameters. In empirical studies, the new model is applied to one-minute returns of major international stock indices. Through quantile scaling, we further demonstrate the usefulness of our method by producing one-step-ahead forecasts of the Value-at-Risk of daily returns.
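One simple, assumed way to make "a quantile function as an observation" concrete is to evaluate each day's empirical quantile function of intra-daily returns on a fixed grid of probability levels. A day of one-minute returns then becomes one vector-valued observation. This sketch only builds such observations. The paper's dynamic QF model, likelihood and MCMC sampler are not shown, and the names are hypothetical.

```python
import math

def empirical_quantile(sorted_x, p):
    """Quantile at level p in [0, 1], interpolating linearly between order statistics."""
    n = len(sorted_x)
    h = (n - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])

def daily_quantile_functions(returns_by_day, probs):
    """Map each day's intra-daily returns to its quantile function on a fixed grid."""
    return [[empirical_quantile(sorted(day), p) for p in probs] for day in returns_by_day]
```

Each inner list is then one point of a QF-valued time series, indexed by day.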
• Inferences on the acquisition of multidrug resistance in Mycobacterium tuberculosis using molecular epidemiological data (1704.04355)

April 14, 2017 q-bio.QM, q-bio.PE, stat.AP
We investigate the rates of drug resistance acquisition in a natural population using molecular epidemiological data from Bolivia. First, we study the rate of direct acquisition of double resistance from the double-sensitive state within patients and compare it with the rates of evolution to single resistance. In particular, we address whether double resistance can evolve directly from a double-sensitive state within a given host. Second, we aim to understand whether the differences in mutation rates to rifampicin and isoniazid resistance translate to the epidemiological scale. Third, we estimate the proportion of multidrug-resistant (MDR) TB cases that are due to the transmission of MDR strains rather than to the acquisition of resistance through evolution. To address these problems we develop a model of TB transmission that tracks the evolution of resistance to two drugs and the evolution of VNTR loci. However, the available data are incomplete: they are recorded only for a fraction of the population and at a single point in time. The likelihood function induced by the proposed model is computationally prohibitive to evaluate and therefore impractical to work with directly, so we approach statistical inference using approximate Bayesian computation techniques.
• Exploratory data analysis for extreme values using non-parametric kernel methods (1602.08807)

April 4, 2017 stat.ME
In many settings it is critical to accurately model the extreme tail behaviour of a random process. Non-parametric density estimation methods are commonly used as exploratory data analysis techniques for this purpose, because they have excellent visualisation properties and naturally avoid the model-specification biases implied by parametric estimators. In particular, kernel-based estimators place minimal assumptions on the data and provide better visualisation than scatterplots and histograms. However, kernel density estimators are known to perform poorly when estimating extreme tail behaviour, which matters when interest is in process behaviour above some large threshold, and they can over-emphasise bumps in the density for heavy-tailed data. In this article we develop a transformation kernel density estimator and demonstrate that its mean integrated squared error (MISE) efficiency is equivalent to that of standard, non-tail-focused kernel density estimators. Estimator performance is illustrated in numerical studies and in an expanded analysis of the ability of well-known global climate models to reproduce observed temperature extremes in Sydney, Australia.
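The sketch below is a generic illustration of a transformation kernel density estimator. It is not the paper's tail-focused transformation, which the abstract does not state. It fits a Gaussian KDE to log-transformed positive data and maps the estimate back with the change-of-variables Jacobian, f_X(x) = f_Y(log x) / x. The log transform and the normal-reference bandwidth rule are assumptions chosen for simplicity.

```python
import math
import statistics

def normal_reference_bandwidth(values):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5), applied on the transformed scale."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)

def log_transform_kde(x, data, h=None):
    """Density estimate at x for positive data via a Gaussian KDE on log(data).

    With Y = log X, f_X(x) = f_Y(log x) / x (change of variables).
    """
    if x <= 0.0:
        return 0.0  # the transformed estimator puts no mass on x <= 0
    logs = [math.log(d) for d in data]
    if h is None:
        h = normal_reference_bandwidth(logs)
    y = math.log(x)
    kernel_sum = sum(math.exp(-0.5 * ((y - ly) / h) ** 2) for ly in logs)
    f_y = kernel_sum / (len(logs) * h * math.sqrt(2.0 * math.pi))
    return f_y / x  # back-transform with the Jacobian 1/x
```

On the original scale the effective bandwidth grows roughly in proportion to x, which damps spurious bumps in a heavy right tail.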
• Variational Bayes with Synthetic Likelihood (1608.03069)

Aug. 10, 2016 stat.ME
Synthetic likelihood is an attractive approach to likelihood-free inference when an approximately Gaussian summary statistic for the data, informative for inference about the parameters, is available. The synthetic likelihood method derives an approximate likelihood function from a plug-in normal density estimate for the summary statistic, with the plug-in mean and covariance matrix obtained by Monte Carlo simulation from the model. In this article, we develop alternatives to Markov chain Monte Carlo implementations of Bayesian synthetic likelihood that have lower computational overheads. Our approach uses stochastic gradient variational inference methods for posterior approximation in the synthetic likelihood context, employing unbiased estimates of the log likelihood. We compare the new method with a related likelihood-free variational inference technique in the literature, while also improving the implementation of that approach in a number of ways. These new algorithms are feasible to implement in situations that are challenging for conventional approximate Bayesian computation (ABC) methods because of the dimensionality of the parameter and summary statistic.
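The sketch below shows the basic plug-in estimate described above. It simulates summary statistics at theta, estimates their mean and covariance, and evaluates the observed summary under the fitted Gaussian. It assumes NumPy, and `toy_summary` is a hypothetical simulator. This plain plug-in log-density is not itself an unbiased estimate of the Gaussian log-likelihood. The variational method in the abstract, which relies on unbiased log-likelihood estimates, is therefore not reproduced here.

```python
import numpy as np

def synthetic_loglik(theta, s_obs, simulate_summary, n_sims=200, rng=None):
    """Gaussian synthetic log-likelihood of s_obs at theta (plug-in mean and covariance)."""
    rng = np.random.default_rng() if rng is None else rng
    sims = np.array([simulate_summary(theta, rng) for _ in range(n_sims)])
    mu = sims.mean(axis=0)                  # Monte Carlo estimate of the summary mean
    sigma = np.cov(sims, rowvar=False)      # Monte Carlo estimate of the covariance
    diff = np.asarray(s_obs, dtype=float) - mu
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return -np.inf                      # degenerate covariance estimate
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (diff.size * np.log(2.0 * np.pi) + logdet + quad)

def toy_summary(theta, rng, n_obs=50):
    """Hypothetical simulator: N(theta, 1) data summarised by (mean, sd)."""
    x = rng.normal(theta, 1.0, size=n_obs)
    return np.array([x.mean(), x.std(ddof=1)])
```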
• Blocking Collapsed Gibbs Sampler for Latent Dirichlet Allocation Models (1608.00945)

Aug. 2, 2016 stat.CO, stat.ML
The latent Dirichlet allocation (LDA) model is a widely used latent variable model in machine learning for text analysis. Inference for this model typically involves a single-site collapsed Gibbs sampling step for the latent variables associated with observations. The efficiency of this sampling is critical to the success of the model in large-scale practical applications. In this article, we introduce a blocking scheme for the collapsed Gibbs sampler for the LDA model that can, with a theoretical guarantee, improve chain mixing efficiency. We develop two procedures, an O(K)-step backward simulation and an O(log K)-step nested simulation (where K is the number of topics), to directly sample the latent variables within each block. We demonstrate that the blocking scheme achieves substantial improvements in chain mixing compared with the state-of-the-art single-site collapsed Gibbs sampler. We also show that when the number of topics runs into the hundreds, the nested-simulation blocking scheme can achieve a significant reduction in computation time compared with the single-site sampler.
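For context, here is a sketch of the standard single-site collapsed Gibbs update that the blocking scheme is compared against. Each token's topic is resampled from

p(z_i = k | rest) proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V beta),

with all counts excluding token i. Computing this vector costs O(K) per token. The paper's blocking scheme, backward simulation and nested simulation are not shown. The code assumes NumPy, and all names are hypothetical.

```python
import numpy as np

def build_counts(docs, z, K, V):
    """Count matrices from token assignments.

    docs: list of (doc_id, word_id) pairs, one per token; z: topic of each token.
    """
    D = 1 + max(d for d, _ in docs)
    n_dk = np.zeros((D, K), dtype=int)  # tokens in doc d assigned to topic k
    n_kw = np.zeros((K, V), dtype=int)  # times word w is assigned to topic k
    n_k = np.zeros(K, dtype=int)        # total tokens assigned to topic k
    for (d, w), k in zip(docs, z):
        n_dk[d, k] += 1
        n_kw[k, w] += 1
        n_k[k] += 1
    return n_dk, n_kw, n_k

def collapsed_conditional(d, w, n_dk, n_kw, n_k, alpha, beta):
    """Normalised p(z_i = k | z_-i, words) for a token of word w in document d.

    Counts must already exclude the token being resampled. Cost is O(K).
    """
    V = n_kw.shape[1]
    weights = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
    return weights / weights.sum()

def resample_token(i, docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One single-site collapsed Gibbs update for token i (updates z and counts in place)."""
    d, w = docs[i]
    k_old = z[i]
    n_dk[d, k_old] -= 1  # remove token i from the counts
    n_kw[k_old, w] -= 1
    n_k[k_old] -= 1
    p = collapsed_conditional(d, w, n_dk, n_kw, n_k, alpha, beta)
    k_new = rng.choice(len(p), p=p)
    z[i] = k_new
    n_dk[d, k_new] += 1  # add it back under the newly sampled topic
    n_kw[k_new, w] += 1
    n_k[k_new] += 1
    return k_new
```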
• Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model (1504.04093)

July 7, 2016 stat.CO
Approximate Bayesian computation (ABC) refers to a family of inference methods used in the Bayesian analysis of complex models where evaluation of the likelihood is difficult. Conventional ABC methods often suffer from the curse of dimensionality, and a marginal adjustment strategy was recently introduced in the literature to improve the performance of ABC algorithms in high-dimensional problems. Here the marginal adjustment approach is extended using a Gaussian copula approximation. The method first estimates the bivariate posterior for each pair of parameters separately using a two-dimensional Gaussian copula, and then combines these estimates into an estimate of the joint posterior. The approximation works well in large-sample settings where the posterior is approximately normal. Because the marginal posterior distributions are estimated nonparametrically, it also works well in many cases far from that situation. If each bivariate posterior distribution can be well estimated with a low-dimensional ABC analysis, this Gaussian copula method can extend ABC methods to high-dimensional problems. The method also yields an analytic expression for the approximate posterior, which is useful for many purposes, such as approximating the likelihood itself. The method is illustrated with several examples.
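The combination step can be illustrated with the Gaussian copula density

c(u) = |R|^(-1/2) exp(-z^T (R^(-1) - I) z / 2), where z_i = Phi^(-1)(u_i).

The sketch below reflects our assumption about how pairwise fits might be combined, not the paper's exact procedure. Correlations estimated from separate bivariate analyses fill the off-diagonal entries of R. The joint posterior density would then be c(F_1(theta_1), ..., F_p(theta_p)) multiplied by the product of the marginal posterior densities f_i(theta_i). A matrix assembled entry by entry need not be positive definite, so the code checks for that. Names are hypothetical, and NumPy is assumed.

```python
import numpy as np
from statistics import NormalDist

def assemble_correlation(pairwise, p):
    """Build a p x p correlation matrix from pairwise estimates {(i, j): rho_ij}."""
    R = np.eye(p)
    for (i, j), rho in pairwise.items():
        R[i, j] = R[j, i] = rho
    return R

def gaussian_copula_logpdf(u, R):
    """log c(u) = -0.5 log|R| - 0.5 z^T (R^{-1} - I) z, with z_i = Phi^{-1}(u_i)."""
    z = np.array([NormalDist().inv_cdf(ui) for ui in u])
    try:
        L = np.linalg.cholesky(R)  # fails if R is not positive definite
    except np.linalg.LinAlgError as exc:
        raise ValueError("assembled correlation matrix is not positive definite") from exc
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    quad = z @ (np.linalg.solve(R, z) - z)
    return -0.5 * logdet - 0.5 * quad
```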
• Models for extremal dependence derived from skew-symmetric families (1507.00108)

April 18, 2016 stat.ME
Skew-symmetric families of distributions such as the skew-normal and skew-t represent supersets of the normal and t distributions, and they exhibit richer classes of extremal behaviour. By defining a non-stationary skew-normal process, which allows the easy handling of positive definite, non-stationary covariance functions, we derive a new family of max-stable processes: the extremal-skew-t process. This process is a superset of non-stationary processes that include the stationary extremal-t processes. We provide the spectral representation and the resulting angular densities of the extremal-skew-t process, and illustrate its practical implementation (includes Supporting Information).
• Modelling extremes using approximate Bayesian Computation (1411.1451)

Nov. 5, 2014 stat.CO, stat.ME
By the nature of their construction, many statistical models for extremes result in likelihood functions that are computationally prohibitive to evaluate. This is consequently problematic for the purposes of likelihood-based inference. With a focus on the Bayesian framework, this chapter examines the use of approximate Bayesian computation (ABC) techniques for the fitting and analysis of statistical models for extremes. After introducing the ideas behind ABC algorithms and methods, we demonstrate their application to extremal models in stereology and spatial extremes.
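Below is a minimal rejection-ABC sketch of the kind of algorithm the chapter introduces, applied to a hypothetical toy problem. The location theta of a Gumbel distribution is inferred from the sample mean of 30 simulated values, without ever evaluating the likelihood. The prior, summary statistic, tolerance and simulator are all illustrative assumptions.

```python
import math
import random
import statistics

def abc_rejection(s_obs, prior_sample, simulate, summary, distance, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)               # 1. draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))   # 2. simulate data and summarise it
        if distance(s_sim, s_obs) <= eps:       # 3. accept if close to the observed summary
            accepted.append(theta)
    return accepted

def simulate_gumbel(theta, rng, n=30):
    """n draws from a Gumbel(theta, 1) distribution: theta - log(E) with E ~ Exp(1)."""
    return [theta - math.log(rng.expovariate(1.0)) for _ in range(n)]
```

For example, `abc_rejection(obs, lambda r: r.uniform(0, 10), simulate_gumbel, statistics.fmean, lambda a, b: abs(a - b), 0.1, 20000, random.Random(42))` returns approximate posterior draws of theta. Shrinking `eps` reduces the approximation error but also the number of accepted draws.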
• Bayesian Symbol Detection in Wireless Relay Networks via Likelihood-Free Inference (1007.4603)

July 27, 2010 stat.AP
This paper presents a general stochastic model developed for a class of cooperative wireless relay networks in which imperfect knowledge of the channel state information at the destination node is assumed. The framework incorporates multiple relay nodes operating under general known non-linear processing functions. When a non-linear relay function is considered, the likelihood function is generally intractable, so the maximum likelihood and maximum a posteriori detectors do not admit closed-form solutions. We illustrate our methodology for overcoming this intractability using the example of a popular optimal non-linear relay function, and demonstrate how our algorithms can solve the previously intractable detection problem. Overcoming this intractability involves the development of specialised Bayesian models. We develop three novel algorithms to perform detection for this Bayesian model: a Markov chain Monte Carlo Approximate Bayesian Computation (MCMC-ABC) approach, an Auxiliary Variable MCMC (MCMC-AV) approach, and a Suboptimal Exhaustive Search Zero Forcing (SES-ZF) approach. Finally, the symbol error rate (SER) performance of the three detection algorithms is compared against signal-to-noise ratio (SNR) in simulated examples.
• Likelihood-based inference for max-stable processes (0902.3060)

Feb. 23, 2009 stat.ME
The last decade has seen max-stable processes emerge as a common tool for the statistical modeling of spatial extremes. However, their application is complicated due to the unavailability of the multivariate density function, and so likelihood-based methods remain far from providing a complete and flexible framework for inference. In this article we develop inferentially practical, likelihood-based methods for fitting max-stable processes derived from a composite-likelihood approach. The procedure is sufficiently reliable and versatile to permit the simultaneous modeling of marginal and dependence parameters in the spatial context at a moderate computational cost. The utility of this methodology is examined via simulation, and illustrated by the analysis of U.S. precipitation extremes.
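A pairwise composite likelihood can be sketched as a sum of bivariate log-densities over all pairs of sites within each year's record. The example uses a hypothetical stand-in for the paper's spatial max-stable models, whose pairwise dependence varies with site separation. It uses the bivariate logistic extreme-value density with unit Fréchet margins and a single dependence parameter r in (0, 1], where r = 1 gives independence. This illustrates the composite-likelihood idea only; it is not the paper's model.

```python
import math
from itertools import combinations

def logistic_bivariate_logpdf(z1, z2, r):
    """Log-density of the bivariate logistic extreme-value model with unit Frechet margins.

    G(z1, z2) = exp(-s**r) with s = z1**(-1/r) + z2**(-1/r) and 0 < r <= 1.
    Differentiating G in both arguments gives the density
    G * (z1*z2)**(-1/r - 1) * s**(r - 2) * (s**r + (1 - r)/r).
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("dependence parameter r must lie in (0, 1]")
    s = z1 ** (-1.0 / r) + z2 ** (-1.0 / r)
    v = s ** r
    return (-v - (1.0 / r + 1.0) * (math.log(z1) + math.log(z2))
            + (r - 2.0) * math.log(s) + math.log(v + (1.0 - r) / r))

def pairwise_loglik(records, r, bivariate_logpdf=logistic_bivariate_logpdf):
    """Pairwise composite log-likelihood.

    records: one list of site maxima (unit Frechet scale) per year; every pair of
    sites within a year contributes one bivariate log-density term.
    """
    total = 0.0
    for record in records:
        for i, j in combinations(range(len(record)), 2):
            total += bivariate_logpdf(record[i], record[j], r)
    return total
```

Maximising `pairwise_loglik` over r gives a maximum composite-likelihood estimate. Only bivariate densities are needed, which is what makes the approach practical when the full multivariate density is unavailable.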