-
We present measurements of the normalised redshift-space three-point
correlation function (Q_z) of galaxies from the Sloan Digital Sky Survey (SDSS)
main galaxy sample. We have applied our "npt" algorithm to both a
volume-limited (36738 galaxies) and magnitude-limited sample (134741 galaxies)
of SDSS galaxies, and find consistent results between the two samples, thus
confirming the weak luminosity dependence of Q_z recently seen by other
authors. We compare our results to other Q_z measurements in the literature and
find them to be consistent within the full jack-knife error estimates. However,
we find these errors are significantly increased by the presence of the ``Sloan
Great Wall'' (at z ~ 0.08) within these two SDSS datasets, which changes the
3-point correlation function (3PCF) by 70% on large scales (s>=10h^-1 Mpc). If
we exclude this supercluster, our observed Q_z is in better agreement with that
obtained from the 2dFGRS by other authors, thus demonstrating the sensitivity
of these higher-order correlation functions to large-scale structures in the
Universe. This analysis highlights that the SDSS datasets used here are not
``fair samples'' of the Universe for the estimation of higher-order clustering
statistics and larger volumes are required. We study the shape-dependence of
Q_z(s,q,theta) as one expects this measurement to depend on scale if the
large-scale structure in the Universe has grown via gravitational instability from
Gaussian initial conditions. On small scales (s <= 6h^-1 Mpc), we see some
evidence for shape-dependence in Q_z, but at present our measurements are
consistent with a constant within the errors (Q_z ~ 0.75 +/- 0.05). On scales
>10h^-1 Mpc, we see considerable shape-dependence in Q_z.
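
As an illustration of the quantities discussed above, the following is a
minimal sketch, assuming the standard hierarchical normalisation of the 3PCF
(zeta divided by the cyclic sum of two-point products) and a delete-one
jack-knife error. The function names and inputs are hypothetical and are not
taken from the "npt" code.

```python
import math


def reduced_q(zeta, xi12, xi23, xi31):
    """Normalised 3PCF: Q = zeta / (xi12*xi23 + xi23*xi31 + xi31*xi12).

    zeta is the three-point correlation function of a triangle and the xi
    values are the two-point correlation functions on its three sides.
    """
    denom = xi12 * xi23 + xi23 * xi31 + xi31 * xi12
    return zeta / denom


def jackknife_error(leave_one_out):
    """Delete-one jack-knife error from N estimates, each made with one
    sub-region of the survey removed."""
    n = len(leave_one_out)
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((q - mean) ** 2 for q in leave_one_out)
    return math.sqrt(variance)
```

If one sub-region contains an unusual structure such as a supercluster, the
estimate made without it differs strongly from the others, which inflates the
jack-knife error in the way described above.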
-
I present here a review of past and present multi-disciplinary research of
the Pittsburgh Computational AstroStatistics (PiCA) group. This group is
dedicated to developing fast and efficient statistical algorithms for analysing
huge astronomical data sources. I begin with a short review of
multi-resolutional kd-trees, which are the building blocks for many of our
algorithms, for example quick range queries and fast n-point correlation
functions. I will present new results from the use of Mixture Models (Connolly
et al. 2000) in density estimation of multi-color data from the Sloan Digital
Sky Survey (SDSS), specifically for the selection of quasars and the automated
identification of X-ray sources. I will also present a brief overview of the
False Discovery Rate (FDR) procedure (Miller et al. 2001a) and show how it has
been used in the detection of ``Baryon Wiggles'' in the local galaxy power
spectrum and source identification in radio data. Finally, I will look forward
to new research on an automated Bayes Network anomaly detector and the possible
use of the Locally Linear Embedding algorithm (LLE; Roweis & Saul 2000) for
spectral classification of SDSS spectra.
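
The FDR procedure mentioned above can be sketched as follows. This is a minimal
sketch, assuming the standard Benjamini-Hochberg step-up rule, which this
abstract does not spell out; the function names are hypothetical.

```python
def fdr_threshold(p_values, alpha):
    """Largest p-value rejected by the Benjamini-Hochberg step-up rule.

    Sort the N p-values, find the largest rank j with p_(j) <= alpha * j / N,
    and return p_(j). Returns None when no test is rejected.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    threshold = None
    for rank, p in enumerate(ordered, start=1):
        if p <= alpha * rank / n:
            threshold = p  # keep the last (largest) rank that passes
    return threshold


def fdr_reject(p_values, alpha):
    """Boolean mask, in the input order, of the tests declared discoveries."""
    cut = fdr_threshold(p_values, alpha)
    return [cut is not None and p <= cut for p in p_values]
```

Here alpha bounds the expected fraction of false discoveries among the
rejected tests, rather than the chance of any single false rejection, so the
cut adapts to how many real signals the data contain.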
-
In this paper, we outline the use of Mixture Models in density estimation of
large astronomical databases. This method of density estimation has been known
in Statistics for some time but has not been implemented because of the large
computational cost. Herein, we detail an implementation of Mixture Model
density estimation based on multi-resolutional KD-trees, which makes this
statistical technique computationally tractable. We provide the
theoretical and experimental background for using a mixture model of Gaussians
based on the Expectation Maximization (EM) Algorithm. Applying these analyses
to simulated data sets we show that the EM algorithm - using the AIC penalized
likelihood to score the fit - out-performs the best kernel density estimate of
the distribution while requiring no ``fine-tuning'' of the input algorithm
parameters. We find that EM can accurately recover the underlying density
distribution from point processes thus providing an efficient adaptive
smoothing method for astronomical source catalogs. To demonstrate the general
application of this statistic to astrophysical problems we consider two cases
of density estimation: the clustering of galaxies in redshift space and the
clustering of stars in color space. From these data we show that EM provides an
adaptive smoothing of the distribution of galaxies in redshift space
(describing accurately both the small and large-scale features within the data)
and a means of identifying outliers in multi-dimensional color-color space
(e.g. for the identification of high redshift QSOs). Automated tools such as
those based on the EM algorithm will be needed in the analysis of the next
generation of astronomical catalogs (2MASS, FIRST, PLANCK, SDSS) and ultimately
in the development of the National Virtual Observatory.
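
To make the approach concrete, the following is a minimal sketch of EM for a
one-dimensional mixture of Gaussians, with the number of components chosen by
AIC. It is a plain O(N k) loop and does not use the KD-tree acceleration
described above. The initialisation, iteration count and all function names
are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_mixture(data, k, iterations=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns ((weights, means, variances), log_likelihood).
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed starting point: means at evenly spaced quantiles of the data,
    # every component given the overall sample variance and equal weight.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    spread = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [max(spread, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue  # empty component: leave its parameters unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids collapse onto a point
    log_like = sum(math.log(max(mixture_pdf(x, weights, means, variances), 1e-300))
                   for x in data)
    return (weights, means, variances), log_like


def aic(log_like, k):
    """AIC (lower is better); a 1-D mixture has k means, k variances
    and k - 1 free weights."""
    n_params = 3 * k - 1
    return 2 * n_params - 2 * log_like


def select_by_aic(data, max_k=4):
    """Fit k = 1..max_k components and keep the fit with the lowest AIC."""
    fits = {k: fit_mixture(data, k) for k in range(1, max_k + 1)}
    best_k = min(fits, key=lambda k: aic(fits[k][1], k))
    return best_k, fits[best_k][0]
```

Calling `select_by_aic(values)` on a list of floats returns the chosen number
of components and their parameters; `mixture_pdf` then evaluates the smoothed
density at any point, with a width that adapts to the local structure of the
data.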
-
We present initial results on the use of Mixture Models for density
estimation in large astronomical databases. We provide herein both the
theoretical and experimental background for using a mixture model of Gaussians
based on the Expectation Maximization (EM) Algorithm. Applying these analyses
to simulated data sets we show that the EM algorithm - using both the AIC and
BIC penalized likelihoods to score the fit - can out-perform the best kernel
density estimate of the distribution while requiring no ``fine-tuning'' of the
input algorithm parameters. We find that EM can accurately recover the
underlying density distribution from point processes thus providing an
efficient adaptive smoothing method for astronomical source catalogs. To
demonstrate the general application of this statistic to astrophysical problems
we consider two cases of density estimation: the clustering of galaxies in
redshift space and the clustering of stars in color space. From these data we
show that EM provides an adaptive smoothing of the distribution of galaxies in
redshift space (describing accurately both the small and large-scale features
within the data) and a means of identifying outliers in multi-dimensional
color-color space (e.g. for the identification of high redshift QSOs).
Automated tools such as those based on the EM algorithm will be needed in the
analysis of the next generation of astronomical catalogs (2MASS, FIRST, PLANCK,
SDSS) and ultimately in the development of the National Virtual Observatory.
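
For reference, the two scores named above can be written as follows, in the
common convention where a lower value indicates a better model. The symbols
are assumptions introduced here: \hat{L} is the maximised likelihood, N the
number of data points, and p the number of free parameters of a k-component
mixture of Gaussians with full covariances in d dimensions.

```latex
\begin{align}
\mathrm{AIC} &= 2p - 2\ln\hat{L}, \\
\mathrm{BIC} &= p\ln N - 2\ln\hat{L}, \\
p &= (k-1) + kd + \tfrac{1}{2}\,k\,d(d+1).
\end{align}
```

Since \ln N > 2 for N \geq 8, BIC penalises each extra component more heavily
than AIC for any realistic catalog, and so tends to select fewer components.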
-
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
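
Several of the issues listed above (exploration versus exploitation, delayed
reinforcement, and the Markov decision setting) appear together in tabular
Q-learning. The following is a minimal sketch on a hypothetical five-state
chain. The environment, parameter values and function name are assumptions
made for illustration and are not taken from the survey.

```python
import random


def train_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                epsilon=0.1, seed=0):
    """Tabular Q-learning on a chain of states 0..n_states-1.

    Action 0 moves left (staying put at state 0), action 1 moves right.
    The only reward, 1.0, is given on reaching the last state, so credit
    must propagate backwards along the chain (delayed reinforcement).
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])
                # exploit, breaking ties between equal values at random
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            nxt = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if nxt == goal else 0.0
            # The goal is terminal, so it contributes no future value.
            future = 0.0 if nxt == goal else gamma * max(q[nxt])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = nxt
            if state == goal:
                break
    return q
```

After training, reading off the action with the larger value in each row of
the returned table gives the learned greedy policy.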