• We use the "Dark Energy and Massive Neutrino Universe" (DEMNUni) simulations to compare the constraining power of "sufficient statistics" with that of the standard matter power spectrum on the sum of neutrino masses, $M_\nu \equiv \sum m_\nu$. In general, the power spectrum, even supplemented with higher moments of the distribution, captures only a fraction of the available cosmological information due to correlations between the Fourier modes. In contrast, the non-linear transform of sufficient statistics, approximated by a logarithmic mapping $A=\ln(1+\delta)$, was designed to capture all the available cosmological information contained in the matter clustering; in this sense it is an optimal observable. Our analysis takes advantage of the recent analytical model developed by Carron et al. 2014 to estimate both the matter power spectrum and the A-power spectrum covariance matrices. Using a Fisher information approach, we find that using sufficient statistics increases the available information on the total neutrino mass at z=0 by up to a factor of 8, thus tightening the constraints by almost a factor of 3 compared to the matter power spectrum.
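As a rough illustration of the numbers quoted in this abstract: for a single parameter, the Fisher forecast of the 1-sigma error scales as F^{-1/2}, so an 8-fold gain in information corresponds to an error smaller by sqrt(8), about 2.83, which matches the "almost a factor of 3" above. The Python sketch below also applies the logarithmic mapping to a toy lognormal field. The function names and the toy field are assumptions made for illustration and are not taken from the DEMNUni analysis.

```python
import math
import numpy as np


def log_transform(delta):
    """Logarithmic mapping A = ln(1 + delta), valid for overdensities delta > -1."""
    delta = np.asarray(delta, dtype=float)
    return np.log1p(delta)


def constraint_gain(information_gain):
    """Factor by which a single-parameter 1-sigma error shrinks when the
    Fisher information grows by `information_gain` (sigma scales as F**-0.5)."""
    return math.sqrt(information_gain)


# Toy example (assumption): a lognormal overdensity field built from a unit Gaussian.
rng = np.random.default_rng(0)
gaussian = rng.normal(0.0, 1.0, size=100_000)
delta = np.exp(gaussian - 0.5) - 1.0   # mean overdensity is close to zero
A = log_transform(delta)               # recovers the Gaussian field, shifted by -0.5

print(round(constraint_gain(8.0), 2))  # 2.83
```

For a lognormal field the mapping returns an exactly Gaussian variable, which is the idealised case in which the power spectrum of A carries all the information.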
  • Beyond the linear regime of structure formation, part of the cosmological information encoded in galaxy clustering becomes inaccessible to the usual power spectrum. "Sufficient statistics", A*, were introduced recently to recapture the lost, and ultimately extract all, cosmological information. We present analytical approximations for the A* and traditional power spectra as well as for their covariance matrices in order to calculate analytically their cosmological information content in the context of Fisher information theory. Our approach allows the precise quantitative comparison of the techniques with each other and to the total information in the data, and provides insights into sufficient statistics. In particular, we find that while the A* power spectrum has a similar shape to the usual galaxy power spectrum, its amplitude is strongly modulated by small scale statistics. This effect is mostly responsible for the ability of the A* power spectrum to recapture the information lost for the usual power spectrum. We use our framework to forecast the best achievable cosmological constraints for projected surveys as a function of their galaxy density, and compare the information content of the two power spectra. We find that sufficient statistics extract all cosmological information, resulting in a gain of approximately a factor of 2 for dense projected surveys at low redshift. This increase in the effective volume of projected surveys is consistent with previous numerical calculations.
  • Beyond the linear regime, Fourier modes of cosmological random fields become correlated, and the power spectrum of density fluctuations contains only a fraction of the available cosmological information. To unveil this formerly hidden information, the A* non-linear transform was introduced; it is optimized both for the nonlinearities induced by gravity and observational noise. Quantifying the resulting increase of our knowledge of cosmological parameters, we forecast the constraints from the angular power spectrum and that of A* from l ~ 200 to 3000 for upcoming galaxy surveys such as: the Wide-Field Infrared Survey Telescope (WFIRST), the Large Synoptic Survey Telescope (LSST), Euclid, the Hyper Suprime-Cam (HSC) and the Dark Energy Survey (DES). We find that at low redshifts this new data analysis strategy can double the extracted information, effectively doubling the survey area. To test the accuracy of our forecasting and the power of our data analysis methods, we apply the A* transformation to the latest release of the Canada-France-Hawaii-Telescope Legacy Survey (CFHTLS) Wide. While this data set is too sparse to allow for more than modest gains (~1.1-1.2), the realized gain from our method is in excellent agreement with our forecast, thus verifying the robustness of our analysis and prediction pipelines.
  • [Abridged] We use mock catalogues based on the GALICS model (Hatton et al. 03) to explore the nature of galaxy clustering observed in the SDSS. We measure low and high order angular clustering statistics from these mock catalogues, after selecting galaxies the same way as for observations, and compare them directly to estimates from SDSS data. Note that we also present measurements of S3-S5 on the SDSS DR1. We find that our model is in generally good agreement with observations in the scale/luminosity range where we can trust the predictions. This range is found to be limited (i) by the size of the dark matter simulation used -- which introduces finite volume effects at large scales -- and (ii) by the mass resolution of this simulation -- which introduces incompleteness at apparent magnitudes fainter than $r\sim 20$. We then focus on the small scale clustering properties of galaxies and investigate the behaviour of three different prescriptions for positioning galaxies within haloes of dark matter. We show that galaxies are poor tracers of both DM particles and DM sub-structures, within groups and clusters. Instead, SDSS data tell us that the distribution of galaxies lies somewhat in between these two populations. This confirms the general theoretical expectation from numerical simulations and semi-analytic modelling.
  • We present measurements of the normalised redshift-space three-point correlation function (Q_z) of galaxies from the Sloan Digital Sky Survey (SDSS) main galaxy sample. We have applied our "npt" algorithm to both a volume-limited (36738 galaxies) and magnitude-limited sample (134741 galaxies) of SDSS galaxies, and find consistent results between the two samples, thus confirming the weak luminosity dependence of Q_z recently seen by other authors. We compare our results to other Q_z measurements in the literature and find them to be consistent within the full jack-knife error estimates. However, we find these errors are significantly increased by the presence of the ``Sloan Great Wall'' (at z ~ 0.08) within these two SDSS datasets, which changes the 3-point correlation function (3PCF) by 70% on large scales (s>=10h^-1 Mpc). If we exclude this supercluster, our observed Q_z is in better agreement with that obtained from the 2dFGRS by other authors, thus demonstrating the sensitivity of these higher-order correlation functions to large-scale structures in the Universe. This analysis highlights that the SDSS datasets used here are not ``fair samples'' of the Universe for the estimation of higher-order clustering statistics and that larger volumes are required. We study the shape-dependence of Q_z(s,q,theta) as one expects this measurement to depend on scale if the large scale structure in the Universe has grown via gravitational instability from Gaussian initial conditions. On small scales (s <= 6h^-1 Mpc), we see some evidence for shape-dependence in Q_z, but at present our measurements are consistent with a constant within the errors (Q_z ~ 0.75 +/- 0.05). On scales >10h^-1 Mpc, we see considerable shape-dependence in Q_z.
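For readers unfamiliar with the normalised three-point function, the conventional hierarchical definition divides the 3PCF by the cyclic sum of products of two-point functions on the triangle sides. The Python sketch below shows that normalisation together with one possible (s, q, theta) triangle parametrisation. The function names are hypothetical, and the choice of theta as the interior angle between the sides of length s and q*s is an assumption; the paper's own convention may differ.

```python
import math


def third_side(s, q, theta):
    """Length of the third triangle side, assuming sides s and q*s
    that enclose the interior angle theta (radians)."""
    s12, s23 = s, q * s
    return math.sqrt(s12**2 + s23**2 - 2.0 * s12 * s23 * math.cos(theta))


def reduced_three_point(zeta, xi12, xi23, xi31):
    """Hierarchical normalisation Q = zeta / (xi12*xi23 + xi23*xi31 + xi31*xi12)."""
    return zeta / (xi12 * xi23 + xi23 * xi31 + xi31 * xi12)


# Equilateral check: q = 1 and theta = 60 degrees give a third side equal to s.
print(third_side(10.0, 1.0, math.pi / 3.0))
```

With this normalisation a constant Q, such as the small-scale value quoted above, means the 3PCF simply tracks the products of two-point functions regardless of triangle shape.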
  • We study the luminosity and color dependence of the galaxy 2-point correlation function in the Sloan Digital Sky Survey, starting from a sample of 200,000 galaxies over 2500 deg^2. We concentrate on the projected correlation function w(r_p), which is directly related to the real space \xi(r). The amplitude of w(r_p) grows continuously with luminosity, rising more steeply above the characteristic luminosity L_*. Redder galaxies exhibit a higher amplitude and steeper correlation function at all luminosities. The correlation amplitude of blue galaxies increases continuously with luminosity, but the luminosity dependence for red galaxies is less regular, with bright red galaxies more strongly clustered at large scales and faint red galaxies more strongly clustered at small scales. We interpret these results using halo occupation distribution (HOD) models assuming concordance cosmological parameters. For most samples, an HOD model with two adjustable parameters fits the w(r_p) data better than a power-law, explaining inflections at r_p ~ 1-3 Mpc/h as the transition between the 1-halo and 2-halo regimes of \xi(r). The implied minimum mass for a halo hosting a central galaxy above a luminosity threshold L grows as M_min ~ L at low luminosities and more steeply above L_*. The mass at which an average halo has one satellite galaxy brighter than L is M_1 ~ 23 M_min(L). These results imply a conditional luminosity function (at fixed halo mass) in which central galaxies lie far above a Schechter function extrapolation of the satellite population. HOD models nicely explain the joint luminosity-color dependence of w(r_p) in terms of the color fractions of central and satellite populations as a function of halo mass. The inferred HOD properties are in good qualitative agreement with theoretical predictions.
  • We present the large-scale correlation function measured from a spectroscopic sample of 46,748 luminous red galaxies from the Sloan Digital Sky Survey. The survey region covers 0.72 h^{-3} Gpc^3 over 3816 square degrees and 0.16<z<0.47, making it the best sample yet for the study of large-scale structure. We find a well-detected peak in the correlation function at 100h^{-1} Mpc separation that is an excellent match to the predicted shape and location of the imprint of the recombination-epoch acoustic oscillations on the low-redshift clustering of matter. This detection demonstrates the linear growth of structure by gravitational instability between z=1000 and the present and confirms a firm prediction of the standard cosmological theory. The acoustic peak provides a standard ruler by which we can measure the ratio of the distances to z=0.35 and z=1089 to 4% fractional accuracy and the absolute distance to z=0.35 to 5% accuracy. From the overall shape of the correlation function, we measure the matter density Omega_mh^2 to 8% and find agreement with the value from cosmic microwave background (CMB) anisotropies. Independent of the constraints provided by the CMB acoustic scale, we find Omega_m = 0.273 +- 0.025 + 0.123 (1+w_0) + 0.137 Omega_K. Including the CMB acoustic scale, we find that the spatial curvature is Omega_K=-0.010+-0.009 if the dark energy is a cosmological constant. More generally, our results provide a measurement of cosmological distance, and hence an argument for dark energy, based on a geometric method with the same simple physics as the microwave background anisotropies. The standard cosmological model convincingly passes these new and robust tests of its fundamental properties.
  • We measure cosmological parameters using the three-dimensional power spectrum P(k) from over 200,000 galaxies in the Sloan Digital Sky Survey (SDSS) in combination with WMAP and other data. Our results are consistent with a ``vanilla'' flat adiabatic Lambda-CDM model without tilt (n=1), running tilt, tensor modes or massive neutrinos. Adding SDSS information more than halves the WMAP-only error bars on some parameters, tightening 1 sigma constraints on the Hubble parameter from h~0.74+0.18-0.07 to h~0.70+0.04-0.03, on the matter density from Omega_m~0.25+/-0.10 to Omega_m~0.30+/-0.04 (1 sigma) and on neutrino masses from <11 eV to <0.6 eV (95%). SDSS helps even more when dropping prior assumptions about curvature, neutrinos, tensor modes and the equation of state. Our results are in substantial agreement with the joint analysis of WMAP and the 2dF Galaxy Redshift Survey, which is an impressive consistency check with independent redshift survey data and analysis techniques. In this paper, we place particular emphasis on clarifying the physical origin of the constraints, i.e., what we do and do not know when using different data sets and prior assumptions. For instance, dropping the assumption that space is perfectly flat, the WMAP-only constraint on the measured age of the Universe tightens from t0~16.3+2.3-1.8 Gyr to t0~14.1+1.0-0.9 Gyr by adding SDSS and SN Ia data. Including tensors, running tilt, neutrino mass and equation of state in the list of free parameters, many constraints are still quite weak, but future cosmological measurements from SDSS and other sources should allow these to be substantially tightened.
  • I present here a review of past and present multi-disciplinary research of the Pittsburgh Computational AstroStatistics (PiCA) group. This group is dedicated to developing fast and efficient statistical algorithms for analysing huge astronomical data sources. I begin with a short review of multi-resolutional kd-trees, which are the building blocks for many of our algorithms, for example quick range queries and fast n-point correlation functions. I will present new results from the use of Mixture Models (Connolly et al. 2000) in density estimation of multi-color data from the Sloan Digital Sky Survey (SDSS), specifically the selection of quasars and the automated identification of X-ray sources. I will also present a brief overview of the False Discovery Rate (FDR) procedure (Miller et al. 2001a) and show how it has been used in the detection of ``Baryon Wiggles'' in the local galaxy power spectrum and source identification in radio data. Finally, I will look forward to new research on an automated Bayes Network anomaly detector and the possible use of the Locally Linear Embedding algorithm (LLE; Roweis & Saul 2000) for spectral classification of SDSS spectra.
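The False Discovery Rate procedure mentioned in this abstract is commonly implemented as the Benjamini-Hochberg step-up rule: sort the p-values, find the largest rank k with p_(k) <= alpha * k / n, and reject every hypothesis with a p-value at or below p_(k). The Python sketch below assumes that rule for independent tests; the function names and the example p-values are made up for illustration and are not taken from the PiCA code.

```python
import numpy as np


def fdr_threshold(p_values, alpha=0.05):
    """Largest sorted p-value p_(k) satisfying p_(k) <= alpha * k / n.
    Returns 0.0 when no p-value passes, so that nothing is rejected."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    ranks = np.arange(1, n + 1)
    passing = p <= alpha * ranks / n
    if not passing.any():
        return 0.0
    return float(p[np.nonzero(passing)[0].max()])


def fdr_reject(p_values, alpha=0.05):
    """Boolean mask of the hypotheses rejected at false discovery rate alpha."""
    p = np.asarray(p_values, dtype=float)
    return p <= fdr_threshold(p, alpha)


# Illustrative p-values (assumption): only the two smallest survive at alpha = 0.05.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(int(fdr_reject(example).sum()))  # 2
```

Unlike a fixed significance cut, the threshold adapts to the data, which is what makes the approach attractive for source detection in large maps.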
  • Szapudi et al (2001) introduced the method of estimating the angular power spectrum of the CMB sky via heuristically weighted correlation functions. Part of the new technique is that all (co)variances are evaluated by massive Monte Carlo simulations, therefore a fast way to measure correlation functions in a high resolution map is essential. This letter presents a new algorithm to calculate pixel space correlation functions via fast spherical harmonics transforms. Our present implementation of the idea extracts correlations from a MAP-like CMB map (HEALPix resolution of 512, i.e. $ \simeq 3 \times 10^6$ pixels) in about 5 minutes on a 500MHz computer, including $C_\ell$ inversion; the analysis of one Planck-like map takes less than one hour. We use heuristic window and noise weighting in pixel space, and include the possibility of additional signal weighting as well, either in $\ell$ or pixel space. We apply the new code to an ensemble of MAP simulations, to test the response of our method to the inhomogeneous sky coverage/noise of MAP. We show that the resulting $C_\ell$'s are very close to the theoretical expectations. The HEALPix based implementation of the method, SpICE (Spatially Inhomogenous Correlation Estimator), will be available to the public from the authors.
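The link between harmonic space and pixel space that such a method relies on is the Legendre series xi(theta) = sum_l (2l+1)/(4 pi) C_l P_l(cos theta). The Python sketch below evaluates only that series with NumPy; it is not the SpICE code, it ignores windows, weights and noise, and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def correlation_from_cl(cl, theta):
    """Angular correlation function xi(theta) from a power spectrum.

    cl    : array of C_l values for l = 0, 1, ..., lmax
    theta : separation angle(s) in radians
    """
    cl = np.asarray(cl, dtype=float)
    ells = np.arange(cl.size)
    coeffs = (2.0 * ells + 1.0) / (4.0 * np.pi) * cl
    # legval evaluates sum_l coeffs[l] * P_l(x) at x = cos(theta)
    return legendre.legval(np.cos(theta), coeffs)


# A pure dipole with C_1 = 4*pi/3 gives xi(theta) = cos(theta).
cl_dipole = np.array([0.0, 4.0 * np.pi / 3.0])
print(correlation_from_cl(cl_dipole, np.pi / 3.0))
```

Because the series is linear in C_l, the inverse step (integrating xi against P_l) recovers the power spectrum, which is the role of the $C_\ell$ inversion mentioned in the abstract.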
  • We outline here the next generation of cluster-finding algorithms. We show how advances in Computer Science and Statistics have helped develop robust, fast algorithms for finding clusters of galaxies in large multi-dimensional astronomical databases like the Sloan Digital Sky Survey (SDSS). Specifically, this paper presents four new advances: (1) A new semi-parametric algorithm - nicknamed ``C4'' - for jointly finding clusters of galaxies in the SDSS and ROSAT All-Sky Survey databases; (2) The introduction of the False Discovery Rate into Astronomy; (3) The role of kernel shape in optimizing cluster detection; (4) A new determination of the X-ray Cluster Luminosity Function which has bearing on the existence of a ``deficit'' of high redshift, high luminosity clusters. This research is part of our ``Computational AstroStatistics'' collaboration (see Nichol et al. 2000) and the algorithms and techniques discussed herein will form part of the ``Virtual Observatory'' analysis toolkit.
  • The fully general calculation of the cosmic error on N-point correlation functions and related quantities is presented. More precisely, the variance caused by the finite volume, discreteness, and edge effects is determined for {\em any} estimator which is based on a general function of N-tuples, such as multi-point correlation functions and multi-spectra. The results are printed explicitly for the two-point correlation function (or power-spectrum), and for the three-point correlation (or bispectrum). These are the most popular statistics in the study of large scale structure, yet a general calculation of their variance has not been performed until now.
  • The errors on statistics measured in finite galaxy catalogs are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi (1996) is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly nonlinear to weakly nonlinear scales. The final analytic formulae representing the full theory are explicit but somewhat complicated. Therefore as a companion to this paper we supply a FORTRAN program capable of calculating the described quantities numerically (abridged).
  • For a given statistic, A, the cosmic distribution function, Upsilon(VA), is the probability of measuring a value VA in a finite galaxy catalog. For statistics related to count-in-cells, such as factorial moments, F_k, the average correlation function, xiav, and cumulants, S_N, the functions Upsilon(VF_k), Upsilon(Vxiav), and Upsilon(VS_N) were measured in a large tauCDM simulation. This N-body experiment simulates almost the full ``Hubble Volume'' of the universe, thus, for the first time, it allowed for an accurate analysis of the cosmic distribution function, and, in particular, of its variance (Delta A)^2, the cosmic error. The resulting detailed knowledge about the shape of Upsilon is crucial for likelihood analyses. The measured cosmic error agrees remarkably well with the theoretical predictions of Szapudi & Colombi (1996) and Szapudi, Bernardeau & Colombi (1998) in the weakly non-linear regime, while the predictions are slightly above the measurements in the highly nonlinear regime. When the relative cosmic error is small, (Delta A/A)^2 << 1, function Upsilon is nearly Gaussian. When (Delta A/A)^2 approaches unity or is larger, function Upsilon(VA) is increasingly skewed and well approximated by a lognormal distribution for A=F_k, or A=xiav. The measured cumulants follow accurately the perturbation theory predictions in the weakly nonlinear regime. Extended perturbation theory is an excellent approximation for all the available dynamic range.
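The count-in-cells statistics named in this abstract can be illustrated with a short Python sketch. It assumes the usual definitions: the factorial moment F_k is the average over cells of N(N-1)...(N-k+1), and the average two-point correlation follows from xiav = F_2 / F_1^2 - 1, which removes Poisson shot noise under the assumption of local Poisson sampling. The function names and the sample counts are made up for illustration.

```python
import numpy as np


def factorial_moment(counts, k):
    """k-th factorial moment F_k = <N (N-1) ... (N-k+1)> over all cells."""
    n = np.asarray(counts, dtype=float)
    falling = np.ones_like(n)
    for j in range(k):
        falling *= n - j   # builds the falling factorial term by term
    return float(falling.mean())


def xi_average(counts):
    """Cell-averaged two-point correlation, xiav = F_2 / F_1**2 - 1."""
    f1 = factorial_moment(counts, 1)
    f2 = factorial_moment(counts, 2)
    return f2 / f1**2 - 1.0


# Illustrative galaxy counts in four cells (assumption).
cells = [0, 1, 2, 3]
print(factorial_moment(cells, 2))  # 2.0
```

Repeating such a measurement over many independent sub-volumes of a simulation yields the distribution of measured values, whose variance is the cosmic error discussed above.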
  • The effects of sampling are investigated on measurements of counts-in-cells in three-dimensional magnitude limited galaxy surveys, with emphasis on moments of the underlying smooth galaxy density field convolved with a spherical window. A new estimator is proposed for measuring the k-th order moment < rho^k >: the weighted factorial moment F_k[w], corrected for the effects of the varying selection function. The cosmic error on the measurement of F_k[w] is computed via the formalism of Szapudi & Colombi (1996), which is generalized to include selection effects. The integral equation for finding the minimum variance weight is solved numerically, and an intuitive analytical approximation is derived. The resulting estimator is more accurate than the traditional method of counts-in-cells in volume limited samples, which discards useful information. As a practical example we consider the case of the future Sloan Digital Sky Survey. Optimal (sparse) sampling strategies for designing magnitude limited redshift surveys are investigated as well. It is found that the optimal strategy depends greatly on the statistics and scales considered. Finally we consider the issue of designing the geometry of a catalog, when it covers only a small fraction of the sky.