• ### A Nonparametric Bayesian Methodology for Regression Discontinuity Designs(1704.04858)

Sept. 30, 2018 stat.ME
One of the most popular methodologies for estimating the average treatment effect at the threshold in a regression discontinuity design is local linear regression (LLR), which places larger weight on units closer to the threshold. We propose a Gaussian process regression methodology that acts as a Bayesian analog to LLR for regression discontinuity designs. Our methodology provides a flexible fit for treatment and control responses by placing a general prior on the mean response functions. Furthermore, unlike LLR, our methodology can incorporate uncertainty in how units are weighted when estimating the treatment effect. We prove our method is consistent in estimating the average treatment effect at the threshold. Furthermore, we find via simulation that our method exhibits promising coverage, interval length, and mean squared error properties compared to standard LLR and state-of-the-art LLR methodologies. Finally, we explore the performance of our method on a real-world example by studying the impact of being a first-round draft pick on the performance and playing time of basketball players in the National Basketball Association.
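For readers unfamiliar with the LLR baseline mentioned above, the following is a minimal sketch of a sharp-design estimate at the threshold. It is not the paper's Gaussian process method. The function name `llr_rdd_estimate`, the triangular kernel, the fixed bandwidth, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def llr_rdd_estimate(x, y, threshold=0.0, bandwidth=1.0):
    """Sharp-RDD effect at the threshold via kernel-weighted local linear fits.

    Hypothetical helper: triangular kernel and fixed bandwidth are assumptions.
    """
    def boundary_fit(mask):
        xs, ys = x[mask] - threshold, y[mask]
        # Triangular kernel: weight decays linearly to zero at the bandwidth.
        w = np.clip(1.0 - np.abs(xs) / bandwidth, 0.0, None)
        keep = w > 0
        design = np.column_stack([np.ones(keep.sum()), xs[keep]])
        sw = np.sqrt(w[keep])
        # Weighted least squares via rescaling rows by sqrt(weight).
        beta, *_ = np.linalg.lstsq(design * sw[:, None], ys[keep] * sw, rcond=None)
        return beta[0]  # intercept = fitted mean response at the threshold

    # Treatment effect = right-side limit minus left-side limit.
    return boundary_fit(x >= threshold) - boundary_fit(x < threshold)

# Simulated example with a true jump of 2.0 at x = 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
y = 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 0.1, x.size)
effect = llr_rdd_estimate(x, y, threshold=0.0, bandwidth=0.5)
```

The kernel weights here are fixed once the bandwidth is chosen, which is the point of contrast the abstract draws with a method that carries uncertainty about the weighting.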
• ### Classifying X-ray Binaries: A Probabilistic Approach(1507.03538)

Aug. 17, 2018 stat.AP, stat.ML, astro-ph.HE
In X-ray binary star systems consisting of a compact object that accretes material from an orbiting secondary star, there is no straightforward means to decide if the compact object is a black hole or a neutron star. To assist this classification, we develop a Bayesian statistical model that makes use of the fact that X-ray binary systems appear to cluster based on their compact object type when viewed from a 3-dimensional coordinate system derived from X-ray spectral data. The first coordinate of this data is the ratio of counts in mid to low energy band (color 1), the second coordinate is the ratio of counts in high to low energy band (color 2), and the third coordinate is the sum of counts in all three bands. We use this model to estimate the probabilities that an X-ray binary system contains a black hole, non-pulsing neutron star, or pulsing neutron star. In particular, we utilize a latent variable model in which the latent variables follow a Gaussian process prior distribution, and hence we are able to induce the spatial correlation we believe exists between systems of the same type. The utility of this approach is evidenced by the accurate prediction of system types using Rossi X-ray Timing Explorer All Sky Monitor data, but it is not flawless. In particular, non-pulsing neutron systems containing "bursters" that are close to the boundary demarcating systems containing black holes tend to be classified as black hole systems. As a byproduct of our analyses, we provide the astronomer with public R code that can be used to predict the compact object type of X-ray binaries given training data.
• ### Convergence Results for a Class of Time-Varying Simulated Annealing Algorithms(1511.07304)

July 5, 2017 math.NA, math.PR, stat.CO
We provide a set of conditions which ensure the almost sure convergence of a class of simulated annealing algorithms on a bounded set $\mathcal{X}\subset\mathbb{R}^d$ based on a time-varying Markov kernel. The class of algorithms considered in this work encompasses the one studied in Belisle (1992) and Yang (2000) as well as its derandomized version recently proposed by Gerber and Bornn (2016). To the best of our knowledge, the results we derive are the first examples of almost sure convergence results for simulated annealing based on a time-varying kernel. In addition, the assumptions on the Markov kernel and on the cooling schedule have the advantage of being trivial to verify in practice.
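As a rough illustration of simulated annealing on a bounded set with a proposal kernel that changes over time, consider the sketch below. It is not claimed to satisfy the paper's conditions. The logarithmic cooling schedule, the shrinking Gaussian proposal scale with a floor, the clipping to the domain, and the name `simulated_annealing` are assumptions chosen for brevity, and the example minimizes rather than maximizes.

```python
import math
import numpy as np

def simulated_annealing(f, lower, upper, n_iter=5000, seed=0):
    """Minimize f on the box [lower, upper] with a time-varying proposal.

    Illustrative sketch only: schedule and kernel are assumptions.
    """
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    for t in range(n_iter):
        temperature = 1.0 / math.log(t + 2.0)  # cooling schedule
        # Proposal scale shrinks with t, so the Markov kernel varies in time.
        scale = max(0.02, 0.5 / math.sqrt(t + 1.0)) * (upper - lower)
        proposal = np.clip(x + scale * rng.standard_normal(x.shape), lower, upper)
        f_prop = f(proposal)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if f_prop <= fx or rng.uniform() < math.exp(-(f_prop - fx) / temperature):
            x, fx = proposal, f_prop
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f

# Toy objective with its minimum at 0.3 on [0, 1].
best_x, best_f = simulated_annealing(
    lambda x: float(np.sum((x - 0.3) ** 2)), [0.0], [1.0]
)
```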
• ### Learning Person Trajectory Representations for Team Activity Analysis(1706.00893)

June 3, 2017 cs.CV
Activity analysis in which multiple people interact across a large space is challenging due to the interplay of individual actions and collective group dynamics. We propose an end-to-end approach for learning person trajectory representations for group activity analysis. The learned representations encode rich spatio-temporal dependencies and capture useful motion patterns for recognizing individual events, as well as characteristic group dynamics that can be used to identify groups from their trajectories alone. We develop our deep learning approach in the context of team sports, which provide well-defined sets of events (e.g. pass, shot) and groups of people (teams). Analysis of events and team formations using NHL hockey and NBA basketball datasets demonstrates the generality of our approach.
• ### Meta-Analytics: Tools for Understanding the Statistical Properties of Sports Metrics(1609.09830)

Sept. 30, 2016 stat.AP
In sports, there is a constant effort to improve metrics which assess player ability, but there has been almost no effort to quantify and compare existing metrics. Any individual making a management, coaching, or gambling decision is quickly overwhelmed with hundreds of statistics. We address this problem by proposing a set of "meta-metrics" which can be used to identify the metrics that provide the most unique, reliable, and useful information for decision-makers. Specifically, we develop methods to evaluate metrics based on three criteria: 1) stability: does the metric measure the same thing over time? 2) discrimination: does the metric differentiate between players? and 3) independence: does the metric provide new information? Our methods are easy to implement and widely applicable, so they should be of interest to the broader sports community. We demonstrate our methods in analyses of both NBA and NHL metrics. Our results indicate the most reliable metrics and highlight how they should be used by sports analysts. The meta-metrics also provide useful insights about how to best construct new metrics which provide independent and reliable information about athletes.
• ### Improving Simulated Annealing through Derandomization(1505.03173)

Sept. 5, 2016 math.NA, stat.CO, math.OC
We propose and study a version of simulated annealing (SA) on continuous state spaces based on $(t,s)_R$-sequences. The parameter $R\in\bar{\mathbb{N}}$ regulates the degree of randomness of the input sequence, with the case $R=0$ corresponding to IID uniform random numbers and the limiting case $R=\infty$ to $(t,s)$-sequences. Our main result, obtained for rectangular domains, shows that the resulting optimization method, which we refer to as QMC-SA, converges almost surely to the global optimum of the objective function $\varphi$ for any $R\in\mathbb{N}$. When $\varphi$ is univariate, we are in addition able to show that the completely deterministic version of QMC-SA is convergent. A key property of these results is that they do not require objective-dependent conditions on the cooling schedule. As a corollary of our theoretical analysis, we provide a new almost sure convergence result for SA which shares this property under minimal assumptions on $\varphi$. We further explain how our results in fact apply to a broader class of optimization methods including for example threshold accepting, for which to our knowledge no convergence results currently exist. We finally illustrate the superiority of QMC-SA over SA algorithms in a numerical study.
• ### Adjusting for Scorekeeper Bias in NBA Box Scores(1602.08754)

Aug. 14, 2016 stat.AP
Box score statistics in the National Basketball Association are used to measure and evaluate player performance. Some of these statistics are subjective in nature and since box score statistics are recorded by scorekeepers hired by the home team for each game, there exists potential for inconsistency and bias. These inconsistencies can have far reaching consequences, particularly with the rise in popularity of daily fantasy sports. Using box score data, we estimate models able to quantify both the bias and the generosity of each scorekeeper for two of the most subjective statistics: assists and blocks. We then use optical player tracking data for the 2014-2015 season to improve the assist model by including other contextual spatio-temporal variables such as time of possession, player locations, and distance traveled. From this model, we present results measuring the impact of the scorekeeper and of the other contextual variables on the probability of a pass being recorded as an assist. Results for adjusting season assist totals to remove scorekeeper influence are also presented.
• ### Nonparametric hierarchical Bayesian quantiles(1605.03471)

May 11, 2016 stat.ME
Here we develop a method for performing nonparametric Bayesian inference on quantiles. Relying on geometric measure theory and employing a Hausdorff base measure, we are able to specify meaningful priors for the quantile while treating the distribution of the data otherwise nonparametrically. We further extend the method to a hierarchical model for quantiles of subpopulations, linking subgroups together solely through their quantiles. Our approach is computationally straightforward, allowing for censored and noisy data. We demonstrate the proposed methodology on simulated data and an applied problem from sports statistics, where it is observed to stabilize and improve inference and prediction.
• ### A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes(1408.0777)

Feb. 25, 2016 stat.CO, stat.AP
Basketball games evolve continuously in space and time as players constantly interact with their teammates, the opposing team, and the ball. However, current analyses of basketball outcomes rely on discretized summaries of the game that reduce such interactions to tallies of points, assists, and similar events. In this paper, we propose a framework for using optical player tracking data to estimate, in real time, the expected number of points obtained by the end of a possession. This quantity, called *expected possession value* (EPV), derives from a stochastic process model for the evolution of a basketball possession; we model this process at multiple levels of resolution, differentiating between continuous, infinitesimal movements of players, and discrete events such as shot attempts and turnovers. Transition kernels are estimated using hierarchical spatiotemporal models that share information across players while remaining computationally tractable on very large data sets. In addition to estimating EPV, these models reveal novel insights on players' decision-making tendencies as a function of their spatial strategy.
• ### The Use of a Single Pseudo-Sample in Approximate Bayesian Computation(1404.6298)

Feb. 16, 2016 stat.CO, math.ST, stat.TH
We analyze the computational efficiency of approximate Bayesian computation (ABC), which approximates a likelihood function by drawing pseudo-samples from the associated model. For the rejection sampling version of ABC, it is known that multiple pseudo-samples cannot substantially increase (and can substantially decrease) the efficiency of the algorithm as compared to employing a high-variance estimate based on a single pseudo-sample. We show that this conclusion also holds for a Markov chain Monte Carlo version of ABC, implying that it is unnecessary to tune the number of pseudo-samples used in ABC-MCMC. This conclusion is in contrast to particle MCMC methods, for which increasing the number of particles can provide large gains in computational efficiency.
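The role of the number of pseudo-samples in rejection ABC can be made concrete with a small sketch. The toy model (normal data with unit variance, a uniform prior on the mean, the sample mean as summary statistic), the tolerance `eps`, and the function name `abc_rejection` are assumptions for illustration. Each prior draw is accepted with probability equal to the fraction of its `n_pseudo` pseudo-datasets whose summary lands within `eps` of the observed one, which reduces to the usual accept/reject rule when `n_pseudo=1`.

```python
import numpy as np

def abc_rejection(s_obs, n_obs=20, n_draws=50_000, n_pseudo=1, eps=0.1, seed=0):
    """Rejection ABC for the mean of N(theta, 1) data (hypothetical toy model)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-5.0, 5.0, size=n_draws)  # draws from the prior
    # Simulate n_pseudo pseudo-datasets of size n_obs for every prior draw.
    pseudo = rng.normal(theta[:, None, None], 1.0, size=(n_draws, n_pseudo, n_obs))
    s_sim = pseudo.mean(axis=2)  # summary statistic of each pseudo-dataset
    # Monte Carlo estimate of the approximate likelihood for each draw.
    hit_rate = (np.abs(s_sim - s_obs) < eps).mean(axis=1)
    accepted = rng.uniform(size=n_draws) < hit_rate
    return theta[accepted]

# Single pseudo-sample per draw, observed sample mean of 1.0.
posterior_draws = abc_rejection(s_obs=1.0)
```

Raising `n_pseudo` multiplies the simulation cost per prior draw while lowering the variance of `hit_rate`; the abstract's claim concerns how little that trade buys.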
• ### Moment conditions and Bayesian nonparametrics(1507.08645)

Models phrased through moment conditions are central to much of modern inference. Here these moment conditions are embedded within a nonparametric Bayesian setup. Handling such a model is not probabilistically straightforward as the posterior has support on a manifold. We solve the relevant issues, building new probability and computational tools using Hausdorff measures to analyze them on real and simulated data. These new methods, which involve simulating on a manifold, can be applied widely, including providing Bayesian analysis of quasi-likelihoods, linear and nonlinear regression, missing data and hierarchical models.
• ### FastGP: An R Package for Gaussian Processes(1507.06055)

July 22, 2015 stat.CO
Despite their promise and ubiquity, Gaussian processes (GPs) can be difficult to use in practice due to the computational impediments of fitting and sampling from them. Here we discuss a short R package for efficient multivariate normal functions which uses the Rcpp and RcppEigen packages at its core. GPs have properties that allow standard functions to be sped up; as an example we include functionality for Toeplitz matrices whose inverse can be computed in O(n^2) time with methods due to Trench and Durbin (Golub & Van Loan 1996), which is particularly apt when time points (or spatial locations) of a Gaussian process are evenly spaced, since the associated covariance matrix is Toeplitz in this case. Additionally, we include functionality to sample from a latent variable Gaussian process model with elliptical slice sampling (Murray, Adams, & MacKay 2010).
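To show why evenly spaced inputs help, here is a NumPy sketch of an O(n^2) Levinson-type recursion for solving a symmetric positive definite Toeplitz system, following the standard textbook form. It is not FastGP's API, and it solves a linear system rather than forming the inverse. The function name `levinson_solve`, the squared-exponential kernel, the length scale, and the nugget are illustrative assumptions.

```python
import numpy as np

def levinson_solve(c, b):
    """Solve T x = b where T is symmetric Toeplitz with first column c.

    Hypothetical helper; assumes T is positive definite. Cost is O(n^2).
    """
    c, b = np.asarray(c, float), np.asarray(b, float)
    n = len(b)
    r = c[1:] / c[0]   # off-diagonal entries after scaling the diagonal to 1
    rhs = b / c[0]
    x = np.array([rhs[0]])
    if n == 1:
        return x
    y = np.array([-r[0]])  # solution of the auxiliary Yule-Walker system
    alpha, beta = -r[0], 1.0
    for k in range(1, n):
        beta *= 1.0 - alpha ** 2
        mu = (rhs[k] - r[:k] @ x[::-1]) / beta
        x = np.append(x + mu * y[::-1], mu)  # grow the solution by one entry
        if k < n - 1:
            alpha = -(r[k] + r[:k] @ y[::-1]) / beta
            y = np.append(y + alpha * y[::-1], alpha)
    return x

# Evenly spaced time points give a Toeplitz covariance matrix.
n = 50
t = np.linspace(0.0, 1.0, n)
first_col = np.exp(-0.5 * ((t - t[0]) / 0.2) ** 2)
first_col[0] += 0.1  # nugget on the diagonal for numerical stability
rng = np.random.default_rng(0)
b = rng.normal(size=n)
alpha_fast = levinson_solve(first_col, b)

# Dense check against a generic O(n^3) solver.
idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
alpha_dense = np.linalg.solve(first_col[idx], b)
```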
• ### Characterizing the spatial structure of defensive skill in professional basketball(1405.0231)

May 28, 2015 stat.AP
Although basketball is a dualistic sport, with all players competing on both offense and defense, almost all of the sport's conventional metrics are designed to summarize offensive play. As a result, player valuations are largely based on offensive performances and to a much lesser degree on defensive ones. Steals, blocks and defensive rebounds provide only a limited summary of defensive effectiveness, yet they persist because they summarize salient events that are easy to observe. Due to the inefficacy of traditional defensive statistics, the state of the art in defensive analytics remains qualitative, based on expert intuition and analysis that can be prone to human biases and imprecision. Fortunately, emerging optical player tracking systems have the potential to enable a richer quantitative characterization of basketball performance, particularly defensive performance. Unfortunately, due to computational and methodological complexities, that potential remains unmet. This paper attempts to fill this void, combining spatial and spatio-temporal processes, matrix factorization techniques and hierarchical regression models with player tracking data to advance the state of defensive analytics in the NBA. Our approach detects, characterizes and quantifies multiple aspects of defensive play in basketball, supporting some common understandings of defensive effectiveness, challenging others and opening up many new insights into the defensive elements of basketball.
• ### Fast and optimal nonparametric sequential design for astronomical observations(1501.02467)

Jan. 11, 2015 stat.ME, stat.AP, stat.ML
The spectral energy distribution (SED) is a relatively easy way for astronomers to distinguish between different astronomical objects such as galaxies, black holes, and stellar objects. By comparing the observations from a source at different frequencies with template models, astronomers are able to infer the type of this observed object. In this paper, we take a Bayesian model averaging perspective to learn astronomical objects, employing a Bayesian nonparametric approach to accommodate the deviation from convex combinations of known log-SEDs. To effectively use telescope time for observations, we then study Bayesian nonparametric sequential experimental design without conjugacy, in which we use sequential Monte Carlo as an efficient tool to maximize the volume of information stored in the posterior distribution of the parameters of interest. A new technique for performing inferences in log-Gaussian Cox processes called the Poisson log-normal approximation is also proposed. Simulations show the speed, accuracy, and usefulness of our method. While the strategy we propose in this paper is brand new in the astronomy literature, the inferential techniques developed apply to more general nonparametric sequential experimental design problems.
• ### Diversifying Sparsity Using Variational Determinantal Point Processes(1411.6307)

Nov. 23, 2014 cs.AI, cs.LG, stat.ML
We propose a novel diverse feature selection method based on determinantal point processes (DPPs). Our model enables one to flexibly define diversity based on the covariance of features (similar to orthogonal matching pursuit) or alternatively based on side information. We introduce our approach in the context of Bayesian sparse regression, employing a DPP as a variational approximation to the true spike and slab posterior distribution. We subsequently show how this variational DPP approximation generalizes and extends mean-field approximation, and can be learned efficiently by exploiting the fast sampling properties of DPPs. Our motivating application comes from bioinformatics, where we aim to identify a diverse set of genes whose expression profiles predict a tumor type where the diversity is defined with respect to a gene-gene interaction network. We also explore an application in spatial statistics. In both cases, we demonstrate that the proposed method yields significantly more diverse feature sets than classic sparse methods, without compromising accuracy.
• ### Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball(1401.0942)

Jan. 8, 2014 stat.AP, stat.ML
We develop a machine learning approach to represent and analyze the underlying spatial structure that governs shot selection among professional basketball players in the NBA. Typically, NBA players are discussed and compared in a heuristic, imprecise manner that relies on unmeasured intuitions about player behavior. This makes it difficult to draw comparisons between players and make accurate player-specific predictions. Modeling shot attempt data as a point process, we create a low dimensional representation of offensive player types in the NBA. Using non-negative matrix factorization (NMF), an unsupervised dimensionality reduction technique, we show that a low-rank spatial decomposition summarizes the shooting habits of NBA players. The spatial representations discovered by the algorithm correspond to intuitive descriptions of NBA player types, and can be used to model other spatial effects, such as shooting accuracy.
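The factorization step can be sketched with the classic multiplicative-update rule for NMF under squared error. This is a generic illustration, not the paper's pipeline: the player-by-court-cell count matrix is simulated, and the function name `nmf`, the rank, and the iteration count are assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, (n, rank))
    H = rng.uniform(0.1, 1.0, (rank, m))
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase
        # the squared reconstruction error (up to the small eps guard).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Simulated shot counts: 30 hypothetical players over 40 court cells,
# generated from 3 latent "shot type" bases.
rng = np.random.default_rng(1)
true_W = rng.uniform(0.0, 5.0, (30, 3))
true_H = rng.uniform(0.0, 2.0, (3, 40))
V = rng.poisson(true_W @ true_H).astype(float)

W1, H1 = nmf(V, rank=3, n_iter=1)
W, H = nmf(V, rank=3, n_iter=200)
err_start = np.linalg.norm(V - W1 @ H1)
err_final = np.linalg.norm(V - W @ H)
```

In this reading, each row of `H` is a spatial basis over court cells and each row of `W` gives one player's loadings on those bases.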
• ### Sequential Monte Carlo Bandits(1310.1404)

Oct. 4, 2013 stat.ME, cs.LG, stat.ML
In this paper we propose a flexible and efficient framework for handling multi-armed bandits, combining sequential Monte Carlo algorithms with hierarchical Bayesian modeling techniques. The framework naturally encompasses restless bandits, contextual bandits, and other bandit variants under a single inferential model. Despite the model's generality, we propose efficient Monte Carlo algorithms to make inference scalable, based on recent developments in sequential Monte Carlo methods. Through two simulation studies, the framework is shown to outperform other empirical methods, while also naturally scaling to more complex problems that existing approaches cannot handle. Additionally, we successfully apply our framework to online video-based advertising recommendation, and show its increased efficacy as compared to current state-of-the-art bandit algorithms.
• ### PAWL-Forced Simulated Tempering(1305.5017)

May 22, 2013 stat.CO, stat.ML
In this short note, we show how the parallel adaptive Wang-Landau (PAWL) algorithm of Bornn et al. (2013) can be used to automate and improve simulated tempering algorithms. While Wang-Landau and other stochastic approximation methods have frequently been applied within the simulated tempering framework, this note demonstrates through a simple example the additional improvements brought about by parallelization, adaptive proposals and automated bin splitting.
• ### Herded Gibbs Sampling(1301.4168)

March 16, 2013 stat.CO, cs.LG, stat.ML
The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an $O(1/T)$ convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs. However, the convergence for herded Gibbs for sparsely connected probabilistic graphical models is still an open problem.
• ### An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration(1109.3829)

June 14, 2012 stat.CO, stat.ME, stat.AP
While statisticians are well-accustomed to performing exploratory analysis in the modeling stage of an analysis, the notion of conducting preliminary general-purpose exploratory analysis in the Monte Carlo stage (or more generally, the model-fitting stage) of an analysis is an area which we feel deserves much further attention. Towards this aim, this paper proposes a general-purpose algorithm for automatic density exploration. The proposed exploration algorithm combines and expands upon components from various adaptive Markov chain Monte Carlo methods, with the Wang-Landau algorithm at its heart. Additionally, the algorithm is run on interacting parallel chains -- a feature which both decreases computational cost and stabilizes the algorithm, improving its ability to explore the density. Performance is studied in several applications. Through a Bayesian variable selection example, the authors demonstrate the convergence gains obtained with interacting chains. The ability of the algorithm's adaptive proposal to induce mode-jumping is illustrated through a trimodal density and a Bayesian mixture modeling application. Lastly, through a 2D Ising model, the authors demonstrate the ability of the algorithm to overcome the high correlations encountered in spatial models.
• ### Forecasting with Historical Data or Process Knowledge under Misspecification: A Comparison(1205.3845)

May 17, 2012 stat.CO, stat.ME, stat.OT
When faced with the task of forecasting a dynamic system, practitioners often have available historical data, knowledge of the system, or a combination of both. While intuition dictates that perfect knowledge of the system should in theory yield perfect forecasting, often the system is only partially known, known up to parameters, or known incorrectly. In contrast, forecasting using previous data without any process knowledge might result in accurate prediction for simple systems, but will fail for highly nonlinear and chaotic systems. In this paper, the authors demonstrate how even in chaotic systems, forecasting with historical data is preferable to using process knowledge if this knowledge exhibits certain forms of misspecification. Through an extensive simulation study, a range of misspecification and forecasting scenarios are examined with the goal of gaining an improved understanding of the circumstances under which forecasting from historical data is to be preferred over using process knowledge.
• ### Bayesian clustering in decomposable graphs(1005.5081)

May 3, 2012 stat.ME, stat.AP, stat.ML
In this paper we propose a class of prior distributions on decomposable graphs, allowing for improved modeling flexibility. While existing methods solely penalize the number of edges, the proposed work empowers practitioners to control clustering, level of separation, and other features of the graph. Emphasis is placed on a particular prior distribution which derives its motivation from the class of product partition models; the properties of this prior relative to existing priors are examined through theory and simulation. We then demonstrate the use of graphical models in the field of agriculture, showing how the proposed prior distribution alleviates the inflexibility of previous approaches in properly modeling the interactions between the yield of different crop varieties.
• ### Sparsity-Promoting Bayesian Dynamic Linear Models(1203.0106)

March 1, 2012 stat.CO, stat.ME, stat.ML
Sparsity-promoting priors have become increasingly popular over recent years due to an increased number of regression and classification applications involving a large number of predictors. In time series applications where observations are collected over time, it is often unrealistic to assume that the underlying sparsity pattern is fixed. We propose here an original class of flexible Bayesian linear models for dynamic sparsity modelling. The proposed class of models expands upon the existing Bayesian literature on sparse regression using generalized multivariate hyperbolic distributions. The properties of the models are explored through both analytic results and simulation studies. We demonstrate the model on a financial application where it is shown that it accurately represents the patterns seen in the analysis of stock and derivative data, and is able to detect major events by filtering an artificial portfolio of assets.
• ### Modeling Non-Stationary Processes Through Dimension Expansion(1011.2553)

June 2, 2011 stat.ME, stat.AP
In this paper, we propose a novel approach to modeling nonstationary spatial fields. The proposed method works by expanding the geographic plane over which these processes evolve into higher dimensional spaces, transforming and clarifying complex patterns in the physical plane. By combining aspects of multi-dimensional scaling, group lasso, and latent variable models, a dimensionally sparse projection is found in which the originally nonstationary field exhibits stationarity. Following a comparison with existing methods in a simulated environment, dimension expansion is studied on a classic test-bed data set historically used to study nonstationary models. Following this, we explore the use of dimension expansion in modeling air pollution in the United Kingdom, a process known to be strongly influenced by rural/urban effects, amongst others, which gives rise to a nonstationary field.
• ### Discussion of "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" by M. Girolami and B. Calderhead(1011.0057)

Oct. 30, 2010 stat.CO, stat.ME, stat.ML
This technical report is the union of two contributions to the discussion of the Read Paper "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" by M. Girolami and B. Calderhead, presented to the Royal Statistical Society on October 13th 2010 and to appear in the Journal of the Royal Statistical Society Series B. The first comment establishes a parallel and possible interactions with Adaptive Monte Carlo methods. The second comment presents a detailed study of Riemannian Manifold Hamiltonian Monte Carlo (RMHMC) for a weakly identifiable model presenting a strong ridge in its geometry.