• ### Dynamic Shrinkage Processes(1707.00763)

Feb. 24, 2018 stat.ME
We propose a novel class of dynamic shrinkage processes for Bayesian time series and regression analysis. Building upon a global-local framework of prior construction, in which continuous scale mixtures of Gaussian distributions are employed for both desirable shrinkage properties and computational tractability, we model dependence among the local scale parameters. The resulting processes inherit the desirable shrinkage behavior of popular global-local priors, such as the horseshoe prior, but provide additional localized adaptivity, which is important for modeling time series data or regression functions with local features. We construct a computationally efficient Gibbs sampling algorithm based on a P\'olya-Gamma scale mixture representation of the proposed process. Using dynamic shrinkage processes, we develop a Bayesian trend filtering model that produces more accurate estimates and tighter posterior credible intervals than competing methods, and apply the model for irregular curve-fitting of minute-by-minute Twitter CPU usage data. In addition, we develop an adaptive time-varying parameter regression model to assess the efficacy of the Fama-French five-factor asset pricing model with momentum added as a sixth factor. Our dynamic analysis of manufacturing and healthcare industry data shows that with the exception of the market risk, no other risk factors are significant except for brief periods.
• ### Linear Non-Gaussian Component Analysis via Maximum Likelihood(1511.01609)

Oct. 2, 2017 stat.CO, stat.ME
Independent component analysis (ICA) is popular in many applications, including cognitive neuroscience and signal processing. Due to computational constraints, principal component analysis is used for dimension reduction prior to ICA (PCA+ICA), which could remove important information. The problem is that interesting independent components (ICs) could be mixed in several principal components that are discarded and then these ICs cannot be recovered. We formulate a linear non-Gaussian component model with Gaussian noise components. To estimate this model, we propose likelihood component analysis (LCA), in which dimension reduction and latent variable estimation are achieved simultaneously. Our method orders components by their marginal likelihood rather than ordering components by variance as in PCA. We present a parametric LCA using the logistic density and a semi-parametric LCA using tilted Gaussians with cubic B-splines. Our algorithm is scalable to datasets common in applications (e.g., hundreds of thousands of observations across hundreds of variables with dozens of latent components). In simulations, latent components are recovered that are discarded by PCA+ICA methods. We apply our method to multivariate data and demonstrate that LCA is a useful data visualization and dimension reduction tool that reveals features not apparent from PCA or PCA+ICA. We also apply our method to an fMRI experiment from the Human Connectome Project and identify artifacts missed by PCA+ICA. We present theoretical results on identifiability of the linear non-Gaussian component model and consistency of LCA.
• ### Bayesian Functional Generalized Additive Models with Sparsely Observed Covariates(1305.3585)

May 26, 2017 stat.CO, stat.ME
The functional generalized additive model (FGAM) was recently proposed in McLean et al. (2013) as a more flexible alternative to the common functional linear model (FLM) for regressing a scalar on functional covariates. In this paper, we develop a Bayesian version of FGAM for the case of Gaussian errors with identity link function. Our approach allows the functional covariates to be sparsely observed and measured with error, whereas the estimation procedure of McLean et al. (2013) required that they be noiselessly observed on a regular grid. We consider both Monte Carlo and variational Bayes methods for fitting the FGAM with sparsely observed covariates. Due to the complicated form of the model posterior distribution and full conditional distributions, standard Monte Carlo and variational Bayes algorithms cannot be used. The strategies we use to handle the updating of parameters without closed-form full conditionals should be of independent interest to applied Bayesian statisticians working with nonconjugate models. Our numerical studies demonstrate the benefits of our algorithms over a two-step approach of first recovering the complete trajectories using standard techniques and then fitting a functional regression model. In a real data analysis, our methods are applied to forecasting closing price for items up for auction on the online auction website eBay.
• ### Profile Estimation for Partial Functional Partially Linear Single-Index Model(1703.02736)

March 8, 2017 math.ST, stat.TH, stat.ME
This paper studies a \textit{partial functional partially linear single-index model} that consists of a functional linear component as well as a linear single-index component. This model generalizes many well-known existing models and is suitable for more complicated data structures. However, its estimation inherits the difficulties and complexities from both components and makes it a challenging problem, which calls for new methodology. We propose a novel profile B-spline method to estimate the parameters by approximating the unknown nonparametric link function in the single-index component part with B-spline, while the linear slope function in the functional component part is estimated by the functional principal component basis. The consistency and asymptotic normality of the parametric estimators are derived, and the global convergence of the proposed estimator of the linear slope function is also established. More excitingly, the latter convergence is optimal in the minimax sense. A two-stage procedure is implemented to estimate the nonparametric link function, and the resulting estimator possesses the optimal global rate of convergence. Furthermore, the convergence rate of the mean squared prediction error for a predictor is also obtained. Empirical properties of the proposed procedures are studied through Monte Carlo simulations. A real data example is also analyzed to illustrate the power and flexibility of the proposed methodology.

Dec. 14, 2016 stat.ME
We study additive function-on-function regression where the mean response at a particular time point depends on the time point itself as well as the entire covariate trajectory. We develop a computationally efficient estimation methodology based on a novel combination of spline bases with an eigenbasis to represent the trivariate kernel function. We discuss prediction of a new response trajectory, propose an inference procedure that accounts for total variability in the predicted response curves, and construct pointwise prediction intervals. The estimation/inferential procedure accommodates realistic scenarios such as correlated error structure as well as sparse and/or irregular designs. We investigate our methodology in finite sample size through simulations and two real data applications.
• ### Functional Autoregression for Sparsely Sampled Data(1603.02982)

Oct. 19, 2016 stat.ME
We develop a hierarchical Gaussian process model for forecasting and inference of functional time series data. Unlike existing methods, our approach is especially suited for sparsely or irregularly sampled curves and for curves sampled with non-negligible measurement error. The latent process is dynamically modeled as a functional autoregression (FAR) with Gaussian process innovations. We propose a fully nonparametric dynamic functional factor model for the dynamic innovation process, with broader applicability and improved computational efficiency over standard Gaussian process models. We prove finite-sample forecasting and interpolation optimality properties of the proposed model, which remain valid with the Gaussian assumption relaxed. An efficient Gibbs sampling algorithm is developed for estimation, inference, and forecasting, with extensions for FAR(p) models with model averaging over the lag p. Extensive simulations demonstrate substantial improvements in forecasting performance and recovery of the autoregressive surface over competing methods, especially under sparse designs. We apply the proposed methods to forecast nominal and real yield curves using daily U.S. data. Real yields are observed more sparsely than nominal yields, yet the proposed methods are highly competitive in both settings.
• ### Simultaneously modelling far-infrared dust emission and its relation to CO emission in star forming galaxies(1509.00639)

April 19, 2016 astro-ph.GA
We present a method to simultaneously model the dust far-infrared spectral energy distribution (SED) and the total infrared $-$ carbon monoxide (CO) integrated intensity $(S_{\rm IR}-I_{\rm CO})$ relationship. The modelling employs a hierarchical Bayesian (HB) technique to estimate the dust surface density, temperature ($T_{\rm eff}$), and spectral index at each pixel from the observed far-infrared (FIR) maps. Additionally, given the corresponding CO map, the method simultaneously estimates the slope and intercept between the FIR and CO intensities, which are global properties of the observed source. The model accounts for correlated and uncorrelated uncertainties, such as those present in Herschel observations. Using synthetic datasets, we demonstrate the accuracy of the HB method, and contrast the results with common non-hierarchical fitting methods. As an initial application, we model the dust and gas on 100 pc scales in the Magellanic Clouds from Herschel FIR and NANTEN CO observations. The slopes of the $\log S_{\rm IR}-\log I_{\rm CO}$ relationship are similar in both galaxies, falling in the range 1.1$-$1.7. However, in the SMC the intercept is nearly 3 times higher, which can be explained by its lower metallicity than the LMC, resulting in a larger $S_{\rm IR}$ per unit $I_{\rm CO}$. The HB modelling evidences an increase in $T_{\rm eff}$ in regions with the highest $I_{\rm CO}$ in the LMC. This may be due to enhanced dust heating in the densest molecular regions from young stars. Such simultaneous dust and gas modelling may reveal variations in the properties of the ISM and its association with other galactic characteristics, such as star formation rates and/or metallicities.
• ### A Bayesian Multivariate Functional Dynamic Linear Model(1411.0764)

Aug. 5, 2015 stat.ME
We present a Bayesian approach for modeling multivariate, dependent functional data. To account for the three dominant structural features in the data--functional, time dependent, and multivariate components--we extend hierarchical dynamic linear models for multivariate time series to the functional data setting. We also develop Bayesian spline theory in a more general constrained optimization framework. The proposed methods identify a time-invariant functional basis for the functional observations, which is smooth and interpretable, and can be made common across multivariate observations for additional information sharing. The Bayesian framework permits joint estimation of the model parameters, provides exact inference (up to MCMC error) on specific parameters, and allows generalized dependence structures. Sampling from the posterior distribution is accomplished with an efficient Gibbs sampling algorithm. We illustrate the proposed framework with two applications: (1) multi-economy yield curve data from the recent global recession, and (2) local field potential brain signals in rats, for which we develop a multivariate functional time series approach for multivariate time-frequency analysis. Supplementary materials, including R code and the multi-economy yield curve data, are available online.
• ### RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections(1405.1792)

May 8, 2014 stat.CO, stat.ME
In high dimensions, the classical Hotelling's $T^2$ test tends to have low power or becomes undefined due to singularity of the sample covariance matrix. In this paper, this problem is overcome by projecting the data matrix onto lower dimensional subspaces through multiplication by random matrices. We propose RAPTT (RAndom Projection T-Test), an exact test for equality of means of two normal populations based on projected lower dimensional data. RAPTT does not require any constraints on the dimension of the data or the sample size. A simulation study indicates that in high dimensions the power of this test is often greater than that of competing tests. The advantage of RAPTT is illustrated on high-dimensional gene expression data involving the discrimination of tumor and normal colon tissues.
• ### Fast Covariance Estimation for High-dimensional Functional Data(1306.5718)

Feb. 26, 2014 stat.ME
For smoothing covariance functions, we propose two fast algorithms that scale linearly with the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension $J \times J$ with $J>500$; the recently introduced sandwich smoother is an exception, but it is not adapted to smooth covariance matrices of large dimensions such as $J \ge 10,000$. Covariance matrices of order $J=10,000$, and even $J=100,000$, are becoming increasingly common, e.g., in 2- and 3-dimensional medical imaging and high-density wearable sensor data. We introduce two new algorithms that can handle very large covariance matrices: 1) FACE: a fast implementation of the sandwich smoother and 2) SVDS: a two-step procedure that first applies singular value decomposition to the data matrix and then smoothes the eigenvectors. Compared to existing techniques, these new algorithms are at least an order of magnitude faster in high dimensions and drastically reduce memory requirements. The new algorithms provide instantaneous (few seconds) smoothing for matrices of dimension $J=10,000$ and very fast ($<$ 10 minutes) smoothing for $J=100,000$. Although SVDS is simpler than FACE, we provide ready to use, scalable R software for FACE. When incorporated into R package {\it refund}, FACE improves the speed of penalized functional regression by an order of magnitude, even for data of normal size ($J <500$). We recommend that FACE be used in practice for the analysis of noisy and high-dimensional functional data.
• ### Multilevel Bayesian framework for modeling the production, propagation and detection of ultra-high energy cosmic rays(1206.4569)

Nov. 28, 2013 stat.AP, astro-ph.IM, astro-ph.HE
Ultra-high energy cosmic rays (UHECRs) are atomic nuclei with energies over ten million times energies accessible to human-made particle accelerators. Evidence suggests that they originate from relatively nearby extragalactic sources, but the nature of the sources is unknown. We develop a multilevel Bayesian framework for assessing association of UHECRs and candidate source populations, and Markov chain Monte Carlo algorithms for estimating model parameters and comparing models by computing, via Chib's method, marginal likelihoods and Bayes factors. We demonstrate the framework by analyzing measurements of 69 UHECRs observed by the Pierre Auger Observatory (PAO) from 2004-2009, using a volume-complete catalog of 17 local active galactic nuclei (AGN) out to 15 megaparsecs as candidate sources. An early portion of the data ("period 1," with 14 events) was used by PAO to set an energy cut maximizing the anisotropy in period 1; the 69 measurements include this "tuned" subset, and subsequent "untuned" events with energies above the same cutoff. Also, measurement errors are approximately summarized. These factors are problematic for independent analyses of PAO data. Within the context of "standard candle" source models (i.e., with a common isotropic emission rate), and considering only the 55 untuned events, there is no significant evidence favoring association of UHECRs with local AGN vs. an isotropic background. The highest-probability associations are with the two nearest, adjacent AGN, Centaurus A and NGC 4945. If the association model is adopted, the fraction of UHECRs that may be associated is likely nonzero but is well below 50%. Our framework enables estimation of the angular scale for deflection of cosmic rays by cosmic magnetic fields; relatively modest scales of $\approx\!3^{\circ}$ to $30^{\circ}$ are favored. Models that assign a large fraction of UHECRs to a single nearby source (e.g., Centaurus A) are ruled out unless very large deflection scales are specified a priori, and even then they are disfavored. However, including the period 1 data alters the conclusions significantly, and a simulation study supports the idea that the period 1 data are anomalous, presumably due to the tuning. Accurate and optimal analysis of future data will likely require more complete disclosure of the data.
• ### Restricted Likelihood Ratio Tests for Linearity in Scalar-on-Function Regression(1310.5811)

Oct. 22, 2013 stat.ME
We propose a procedure for testing the linearity of a scalar-on-function regression relationship. To do so, we use the functional generalized additive model (FGAM), a recently developed extension of the functional linear model. For a functional covariate X(t), the FGAM models the mean response as the integral with respect to t of F{X(t),t} where F is an unknown bivariate function. The FGAM can be viewed as the natural functional extension of generalized additive models. We show how the functional linear model can be represented as a simple mixed model nested within the FGAM. Using this representation, we then consider restricted likelihood ratio tests for zero variance components in mixed models to test the null hypothesis that the functional linear model holds. The methods are general and can also be applied to testing for interactions in a multivariate additive model or for testing for no effect in the functional linear model. The performance of the proposed tests is assessed on simulated data and in an application to measuring diesel truck emissions, where strong evidence of nonlinearities in the relationship between the functional predictor and the response are found.
• ### Optimal Prediction in an Additive Functional Model(1301.4954)

Jan. 21, 2013 math.ST, stat.TH, stat.ME
The functional generalized additive model (FGAM) provides a more flexible nonlinear functional regression model than the well-studied functional linear regression model. This paper restricts attention to the FGAM with identity link and additive errors, which we will call the additive functional model, a generalization of the functional linear model. This paper studies the minimax rate of convergence of predictions from the additive functional model in the framework of reproducing kernel Hilbert space. It is shown that the optimal rate is determined by the decay rate of the eigenvalues of a specific kernel function, which in turn is determined by the reproducing kernel and the joint distribution of any two points in the random predictor function. For the special case of the functional linear model, this kernel function is jointly determined by the covariance function of the predictor function and the reproducing kernel. The easily implementable roughness-regularized predictor is shown to achieve the optimal rate of convergence. Numerical studies are carried out to illustrate the merits of the predictor. Our simulations and real data examples demonstrate a competitive performance against the existing approach.
• ### Fast Bivariate Penalized Splines: the Sandwich Smoother(1011.4916)

July 13, 2012 math.ST, stat.TH, stat.ME
We propose a fast penalized spline method for bivariate smoothing. Univariate P-spline smoothers (Eilers and Marx, 1996) are applied simultaneously along both coordinates. The new smoother has a sandwich form which suggested the name "sandwich smoother" to a referee. The sandwich smoother has a tensor product structure that simplifies an asymptotic analysis and it can be fast computed. We derive a local central limit theorem for the sandwich smoother, with simple expressions for the asymptotic bias and variance, by showing that the sandwich smoother is asymptotically equivalent to a bivariate kernel regression estimator with a product kernel. As far as we are aware, this is the first central limit theorem for a bivariate spline estimator of any type. Our simulation study shows that the sandwich smoother is orders of magnitude faster to compute than other bivariate spline smoothers, even when the latter are computed using a fast GLAM (Generalized Linear Array Model) algorithm, and comparable to them in terms of mean squared integrated errors. We extend the sandwich smoother to array data of higher dimensions, where a GLAM algorithm improves the computational speed of the sandwich smoother. One important application of the sandwich smoother is to estimate covariance functions in functional data analysis. In this application, our numerical results show that the sandwich smoother is orders of magnitude faster than local linear regression. The speed of the sandwich formula is important because functional data sets are becoming quite large.
• ### Guilt by Association: Finding Cosmic Ray Sources Using Hierarchical Bayesian Clustering(1206.3540)

June 15, 2012 stat.AP, astro-ph.IM, astro-ph.HE
The Earth is continuously showered by charged cosmic ray particles, naturally produced atomic nuclei moving with velocity close to the speed of light. Among these are ultra high energy cosmic ray particles with energy exceeding 5x10^19 eV, which is ten million times more energetic than the most energetic particles produced at the Large Hadron Collider. Astrophysical questions include: what phenomenon accelerates particles to such high energies, and what sort of nuclei are energized? Also, the magnetic deflection of the trajectories of the cosmic rays makes them potential probes of galactic and intergalactic magnetic fields. We develop a Bayesian hierarchical model that can be used to compare different association models between the cosmic rays and source population, using Bayes factors. A measurement model with directional uncertainties and accounting for non-uniform sky exposure is incoporated into the model. The methodology allows us to learn about astrophysical parameters, such as those governing the source luminosity function and the cosmic magnetic field.
• ### Local Asymptotics of P-splines(1201.0708)

June 10, 2012 math.ST, stat.TH
This report studies local asymptotics of P-splines with $p$th degree B-splines and a $m$th order difference penalty. Earlier work with $p$ and $m$ restricted is extended to the general case. Asymptotically, penalized splines are kernel estimators with equivalent kernels depending on $m$, but not on $p$. A central limit theorem provides simple expressions for the asymptotic mean and variance. Provided it is fast enough, the divergence rate of the number of knots does not affect the asymptotic distribution. The optimal convergence rate of the penalty parameter is given.
• ### Local Asymptotics of P-Spline Smoothing(0912.1824)

Dec. 9, 2009 math.ST, stat.TH
This paper addresses asymptotic properties of general penalized spline estimators with an arbitrary B-spline degree and an arbitrary order difference penalty. The estimator is approximated by a solution of a linear differential equation subject to suitable boundary conditions. It is shown that, in certain sense, the penalized smoothing corresponds approximately to smoothing by the kernel method. The equivalent kernels for both inner points and boundary points are obtained with the help of Green's functions of the differential equation. Further, the asymptotic normality is established for the estimator at interior points. It is shown that the convergence rate is independent of the degree of the splines, and the number of knots does not affect the asymptotic distribution, provided that it tends to infinity fast enough.
• ### Discussion: Conditional growth charts(math/0702636)

Feb. 22, 2007 math.ST, stat.TH
Discussion of Conditional growth charts [math.ST/0702634]