• We present measurements of the normalised redshift-space three-point correlation function (Q_z) of galaxies from the Sloan Digital Sky Survey (SDSS) main galaxy sample. We have applied our "npt" algorithm to both a volume-limited (36738 galaxies) and a magnitude-limited (134741 galaxies) sample of SDSS galaxies and find consistent results between the two, thus confirming the weak luminosity dependence of Q_z recently reported by other authors. We compare our results to other Q_z measurements in the literature and find them consistent within the full jack-knife error estimates. However, these errors are significantly increased by the presence of the "Sloan Great Wall" (at z ~ 0.08) within both SDSS datasets; this supercluster changes the three-point correlation function (3PCF) by 70% on large scales (s >= 10h^-1 Mpc). If we exclude it, our observed Q_z is in better agreement with that obtained from the 2dFGRS by other authors, demonstrating the sensitivity of these higher-order correlation functions to large-scale structures in the Universe. This analysis highlights that the SDSS datasets used here are not "fair samples" of the Universe for the estimation of higher-order clustering statistics, and that larger volumes are required. We study the shape dependence of Q_z(s,q,theta), since this measurement is expected to depend on scale if the large-scale structure in the Universe grew via gravitational instability from Gaussian initial conditions. On small scales (s <= 6h^-1 Mpc), we see some evidence for shape dependence in Q_z, but at present our measurements are consistent with a constant within the errors (Q_z ~ 0.75 +/- 0.05). On scales > 10h^-1 Mpc, we see considerable shape dependence in Q_z.
  • I present here a review of past and present multi-disciplinary research of the Pittsburgh Computational AstroStatistics (PiCA) group. This group is dedicated to developing fast and efficient statistical algorithms for analysing huge astronomical data sources. I begin with a short review of multi-resolutional kd-trees, which are the building blocks for many of our algorithms, for example quick range queries and fast n-point correlation functions. I will present new results from the use of Mixture Models (Connolly et al. 2000) in density estimation of multi-color data from the Sloan Digital Sky Survey (SDSS), specifically the selection of quasars and the automated identification of X-ray sources. I will also present a brief overview of the False Discovery Rate (FDR) procedure (Miller et al. 2001a) and show how it has been used in the detection of "Baryon Wiggles" in the local galaxy power spectrum and in source identification in radio data. Finally, I will look ahead to new research on an automated Bayes Network anomaly detector and the possible use of the Locally Linear Embedding algorithm (LLE; Roweis & Saul 2000) for spectral classification of SDSS spectra.
  • In this paper, we outline the use of Mixture Models in density estimation of large astronomical databases. This method of density estimation has been known in Statistics for some time but has not been implemented because of its large computational cost. Herein, we detail an implementation of Mixture Model density estimation based on multi-resolutional KD-trees, which makes this statistical technique computationally tractable. We provide the theoretical and experimental background for using a mixture model of Gaussians based on the Expectation Maximization (EM) algorithm. Applying these analyses to simulated data sets, we show that the EM algorithm, using the AIC penalized likelihood to score the fit, outperforms the best kernel density estimate of the distribution while requiring no "fine-tuning" of the input algorithm parameters. We find that EM can accurately recover the underlying density distribution from point processes, thus providing an efficient adaptive smoothing method for astronomical source catalogs. To demonstrate the general application of this technique to astrophysical problems, we consider two cases of density estimation: the clustering of galaxies in redshift space and the clustering of stars in color space. From these data we show that EM provides an adaptive smoothing of the distribution of galaxies in redshift space (accurately describing both the small- and large-scale features within the data) and a means of identifying outliers in multi-dimensional color-color space (e.g. for the identification of high-redshift QSOs). Automated tools such as those based on the EM algorithm will be needed in the analysis of the next generation of astronomical catalogs (2MASS, FIRST, PLANCK, SDSS) and ultimately in the development of the National Virtual Observatory.
  • We present initial results on the use of Mixture Models for density estimation in large astronomical databases. We provide herein both the theoretical and experimental background for using a mixture model of Gaussians based on the Expectation Maximization (EM) algorithm. Applying these analyses to simulated data sets, we show that the EM algorithm, using both the AIC and BIC penalized likelihoods to score the fit, can outperform the best kernel density estimate of the distribution while requiring no "fine-tuning" of the input algorithm parameters. We find that EM can accurately recover the underlying density distribution from point processes, thus providing an efficient adaptive smoothing method for astronomical source catalogs. To demonstrate the general application of this technique to astrophysical problems, we consider two cases of density estimation: the clustering of galaxies in redshift space and the clustering of stars in color space. From these data we show that EM provides an adaptive smoothing of the distribution of galaxies in redshift space (accurately describing both the small- and large-scale features within the data) and a means of identifying outliers in multi-dimensional color-color space (e.g. for the identification of high-redshift QSOs). Automated tools such as those based on the EM algorithm will be needed in the analysis of the next generation of astronomical catalogs (2MASS, FIRST, PLANCK, SDSS) and ultimately in the development of the National Virtual Observatory.
  • This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here resembles work in psychology but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
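The SDSS three-point abstract reports a normalised function Q_z(s,q,theta) without writing it down. A common convention, assumed here rather than taken from that abstract, is the hierarchical normalisation of the redshift-space 3PCF zeta by products of the two-point function xi, with the triangle described by one side s, the ratio q of the second side to the first, and the angle theta between them:

```latex
Q_z(s_1, s_2, s_3) = \frac{\zeta(s_1, s_2, s_3)}
  {\xi(s_1)\,\xi(s_2) + \xi(s_2)\,\xi(s_3) + \xi(s_3)\,\xi(s_1)},
\qquad s = s_1, \quad q = \frac{s_2}{s_1}, \quad
s_3 = s_1 \sqrt{1 + q^2 - 2q\cos\theta}
```

Under this convention a Q_z that stays constant as theta varies at fixed s and q means no shape dependence, which is the sense in which the small-scale value Q_z ~ 0.75 +/- 0.05 is "consistent with a constant".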
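The PiCA review names kd-trees as the building block for fast range queries and n-point correlation functions. The sketch below illustrates only the core pruning idea, under our own assumptions: each node caches its point count and bounding box, so a range count can skip boxes that lie entirely out of range and add the cached count for boxes that lie entirely in range. The class and function names are hypothetical, and this single-tree pair count (the DD term of a two-point estimator) is much simpler than the group's dual-tree "npt" algorithm.

```python
import random


class KDNode:
    """kd-tree node caching its point count and bounding box (hypothetical sketch)."""

    def __init__(self, points, leaf_size):
        dim = len(points[0])
        self.count = len(points)
        self.lo = [min(p[d] for p in points) for d in range(dim)]
        self.hi = [max(p[d] for p in points) for d in range(dim)]
        if self.count <= leaf_size:
            self.points, self.left, self.right = points, None, None
            return
        # Split at the median along the widest dimension of the bounding box.
        axis = max(range(dim), key=lambda d: self.hi[d] - self.lo[d])
        pts = sorted(points, key=lambda p: p[axis])
        mid = self.count // 2
        self.points = None
        self.left = KDNode(pts[:mid], leaf_size)
        self.right = KDNode(pts[mid:], leaf_size)


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def count_within(node, q, r2):
    """Number of points under `node` within squared distance r2 of q."""
    # Smallest possible squared distance from q to any point in the box.
    near = sum(max(lo - x, 0.0, x - hi) ** 2 for x, lo, hi in zip(q, node.lo, node.hi))
    if near > r2:
        return 0  # prune: the whole box is out of range
    # Largest possible squared distance from q to any point in the box.
    far = sum(max(x - lo, hi - x) ** 2 for x, lo, hi in zip(q, node.lo, node.hi))
    if far <= r2:
        return node.count  # the whole box is in range: use the cached count
    if node.points is not None:
        return sum(1 for p in node.points if dist2(p, q) <= r2)
    return count_within(node.left, q, r2) + count_within(node.right, q, r2)


def count_pairs(points, r, leaf_size=8):
    """Number of distinct pairs separated by at most r."""
    tree = KDNode(points, leaf_size)
    total = sum(count_within(tree, p, r * r) for p in points)
    # Each query also counts itself, and every pair is seen from both ends.
    return (total - len(points)) // 2


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical uniform points standing in for a galaxy catalog.
    galaxies = [(rng.random(), rng.random(), rng.random()) for _ in range(2000)]
    print(count_pairs(galaxies, 0.05))
```

Caching summary statistics per node is what lets the traversal answer whole subtrees at once. A production code would more likely use an existing tree such as SciPy's `cKDTree`.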
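The two Mixture Model abstracts describe fitting a mixture of Gaussians by EM and choosing the fit with the AIC or BIC penalized likelihood. The following is a minimal one-dimensional sketch of that loop in plain Python. It is not the authors' implementation, which accelerates EM with multi-resolutional kd-trees and works in many dimensions. The quantile initialisation, the variance floor `min_var`, the fixed iteration count, and all function names are our own assumptions. In one dimension a k-component mixture has 3k - 1 free parameters: k means, k variances, and k - 1 independent weights.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def fit_gmm_1d(data, k, n_iter=200, min_var=1e-6):
    """EM for a k-component 1-D Gaussian mixture; returns (weights, means, variances, logL)."""
    n = len(data)
    xs = sorted(data)
    # Assumed start: means at evenly spaced quantiles, common variance, equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu_all = sum(data) / n
    variances = [sum((x - mu_all) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, var_j)  # floor guards against collapse
    return weights, means, variances, log_likelihood(data, weights, means, variances)


def information_criterion(log_like, k, n, criterion="aic"):
    n_params = 3 * k - 1
    if criterion == "aic":
        return 2 * n_params - 2 * log_like
    if criterion == "bic":
        return n_params * math.log(n) - 2 * log_like
    raise ValueError(f"unknown criterion: {criterion}")


def select_k(data, k_max=5, criterion="aic"):
    """Number of components with the lowest penalized score."""
    scores = {}
    for k in range(1, k_max + 1):
        log_like = fit_gmm_1d(data, k)[3]
        scores[k] = information_criterion(log_like, k, len(data), criterion)
    return min(scores, key=scores.get)


def two_cluster_sample(n_per_cluster=200, separation=10.0, seed=1):
    """Hypothetical demo data: two unit-variance Gaussian clumps."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_per_cluster)] + [
        rng.gauss(separation, 1.0) for _ in range(n_per_cluster)
    ]


if __name__ == "__main__":
    print(select_k(two_cluster_sample(), k_max=4, criterion="bic"))
```

BIC's log(n) penalty grows with sample size, so it tends to choose fewer components than AIC. The fitted mixture is itself the adaptive density estimate: its components widen or narrow to follow the data without a global bandwidth.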
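The PiCA review cites the False Discovery Rate procedure (Miller et al. 2001a). Below is a sketch of the standard Benjamini-Hochberg step-up rule. It sorts the p-values, finds the largest rank k whose p-value lies below alpha * k / (n * c_n), and rejects the k smallest p-values. By default c_n = 1, which assumes independent tests. Passing `dependent=True` uses c_n = sum of 1/i, the conservative factor for correlated tests; whether a given analysis needs it is an assumption the user must decide. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05, dependent=False):
    """Return the sorted indices of tests rejected at false discovery rate alpha."""
    n = len(p_values)
    if n == 0:
        return []
    # c_n = 1 for independent tests; harmonic sum for arbitrarily correlated tests.
    c_n = sum(1.0 / i for i in range(1, n + 1)) if dependent else 1.0
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / (n * c_n):
            cutoff = rank  # step-up: keep the largest passing rank
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Hypothetical p-values: none passes Bonferroni (0.05 / 3), yet FDR rejects all three.
    print(benjamini_hochberg([0.02, 0.025, 0.04], alpha=0.05))
```

Because of the step-up rule, a p-value that fails its own threshold can still be rejected if a larger p-value further down the list passes. This is what makes FDR more powerful than a Bonferroni cut when many tests are made, for example one per pixel or one per power-spectrum bin.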
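The reinforcement-learning survey grounds the field in Markov decision theory and in learning from delayed reinforcement. A small, concrete instance of that foundation is value iteration. It assumes the transition and reward model is known and repeatedly applies the Bellman optimality update until the values stop changing. The corridor MDP, the discount `gamma`, and the function names below are hypothetical. The sketch also covers only the model-based planning case, not learning by trial and error.

```python
def value_iteration(mdp, gamma=0.9, tol=1e-10):
    """Solve a finite MDP by value iteration.

    mdp maps each state to {action: [(probability, next_state, reward), ...]};
    a state with no actions is terminal and keeps value 0. Returns (V, policy).
    """

    def q_value(V, outcomes):
        # Expected immediate reward plus discounted value of the successor state.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)

    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:
                continue
            best = max(q_value(V, outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) Bellman update
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a, acts=actions: q_value(V, acts[a]))
        for s, actions in mdp.items()
        if actions
    }
    return V, policy


def corridor(n_states=5):
    """Hypothetical toy MDP: move left or right; reaching the last cell pays 1 and ends."""
    goal = n_states - 1
    mdp = {goal: {}}
    for s in range(goal):
        mdp[s] = {
            "left": [(1.0, max(s - 1, 0), 0.0)],
            "right": [(1.0, s + 1, 1.0 if s + 1 == goal else 0.0)],
        }
    return mdp


if __name__ == "__main__":
    values, policy = value_iteration(corridor())
    print(values, policy)
```

The single reward at the goal propagates backwards as gamma, gamma^2, and so on. This is the delayed-reinforcement credit-assignment problem in its simplest form. Model-free methods surveyed in the paper, such as Q-learning, estimate the same fixed point from sampled transitions instead of a known model.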