
We use the "Dark Energy and Massive Neutrino Universe" (DEMNUni) simulations
to compare the constraining power of "sufficient statistics" with the standard
matter power spectrum on the sum of neutrino masses, $M_\nu \equiv \sum m_\nu$.
In general, the power spectrum, even supplemented with higher moments of the
distribution, captures only a fraction of the available cosmological
information due to correlations between the Fourier modes. In contrast, the
nonlinear transform of sufficient statistics, approximated by a logarithmic
mapping $A=\ln(1+\delta)$, was designed to capture all the available cosmological
information contained in the matter clustering; in this sense it is an optimal
observable. Our analysis takes advantage of the recent analytical model
developed by Carron et al. (2014) to estimate both the matter power spectrum and
the A-power spectrum covariance matrices. Using a Fisher information approach,
we find that using sufficient statistics increases the available information
on the total neutrino mass at z=0 by up to a factor of 8, thus tightening the
constraints by almost a factor of 3 compared to the matter power spectrum.
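
As a rough check of how the two quoted factors relate: for a single parameter, the Fisher forecast of the 1-sigma error scales as the inverse square root of the Fisher information, so an 8-fold gain in information corresponds to errors smaller by sqrt(8). The Python sketch below only illustrates this arithmetic. The function name is hypothetical, and the single-parameter (unmarginalised) scaling is an assumption, not the full analysis.

```python
import math


def constraint_tightening(information_gain: float) -> float:
    """Factor by which a 1-sigma error shrinks when the Fisher
    information on one parameter grows by `information_gain`.

    Assumes sigma = 1 / sqrt(F) for a single, unmarginalised parameter.
    """
    if information_gain <= 0:
        raise ValueError("information_gain must be positive")
    return math.sqrt(information_gain)


if __name__ == "__main__":
    # An 8-fold information gain gives sqrt(8), i.e. "almost a factor of 3".
    print(round(constraint_tightening(8.0), 2))  # 2.83
```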

Beyond the linear regime of structure formation, part of cosmological
information encoded in galaxy clustering becomes inaccessible to the usual
power spectrum. "Sufficient statistics", A*, were introduced recently to
recapture the lost, and ultimately extract all, cosmological information. We
present analytical approximations for the A* and traditional power spectra as
well as for their covariance matrices in order to calculate analytically their
cosmological information content in the context of Fisher information theory.
Our approach allows the precise quantitative comparison of the techniques with
each other and to the total information in the data, and provides insights into
sufficient statistics. In particular, we find that while the A* power spectrum
has a similar shape to the usual galaxy power spectrum, its amplitude is
strongly modulated by small scale statistics. This effect is mostly responsible
for the ability of the A* power spectrum to recapture the information lost for
the usual power spectrum. We use our framework to forecast the best achievable
cosmological constraints for projected surveys as a function of their galaxy
density, and compare the information content of the two power spectra. We find
that sufficient statistics extract all cosmological information, resulting in
a gain of approximately a factor of 2 for dense projected surveys at low redshift.
This increase in the effective volume of projected surveys is consistent with
previous numerical calculations.

Beyond the linear regime, Fourier modes of cosmological random fields become
correlated, and the power spectrum of density fluctuations contains only a
fraction of the available cosmological information. To unveil this formerly
hidden information, the A* nonlinear transform was introduced; it is optimized
both for the nonlinearities induced by gravity and observational noise.
Quantifying the resulting increase of our knowledge of cosmological parameters,
we forecast the constraints from the angular power spectrum and that of A* from
l ~ 200 to 3000 for upcoming galaxy surveys such as the Wide-Field Infrared
Survey Telescope (WFIRST), the Large Synoptic Survey Telescope (LSST), Euclid,
the Hyper Suprime-Cam (HSC) and the Dark Energy Survey (DES). We find that at
low redshifts this new data analysis strategy can double the extracted
information, effectively doubling the survey area. To test the accuracy of our
forecasting and the power of our data analysis methods, we apply the A*
transformation to the latest release of the Canada-France-Hawaii Telescope
Legacy Survey (CFHTLS) Wide. While this data set is too sparse to allow for
more than modest gains (~1.1-1.2), the realized gain from our method is in
excellent agreement with our forecast, thus verifying the robustness of our
analysis and prediction pipelines.

[Abridged] We use mock catalogues based on the GALICS model (Hatton et al.
03) to explore the nature of galaxy clustering observed in the SDSS. We measure
low and high order angular clustering statistics from these mock catalogues,
after selecting galaxies the same way as for observations, and compare them
directly to estimates from SDSS data. Note that we also present measurements of
S3-S5 on the SDSS DR1. We find that our model is in generally good agreement with
observations in the scale/luminosity range where we can trust the predictions.
This range is found to be limited (i) by the size of the dark matter simulation
used, which introduces finite volume effects at large scales, and (ii) by the
mass resolution of this simulation, which introduces incompleteness at
apparent magnitudes fainter than $r\sim 20$. We then focus on the small scale
clustering properties of galaxies and investigate the behaviour of three
different prescriptions for positioning galaxies within haloes of dark matter.
We show that galaxies are poor tracers of both DM particles and DM
substructures within groups and clusters. Instead, SDSS data tell us that
the distribution of galaxies lies somewhat in between these two populations.
This confirms the general theoretical expectation from numerical simulations
and semi-analytic modelling.

We present measurements of the normalised redshift-space three-point
correlation function (Q_z) of galaxies from the Sloan Digital Sky Survey (SDSS)
main galaxy sample. We have applied our "npt" algorithm to both a
volume-limited (36738 galaxies) and magnitude-limited sample (134741 galaxies)
of SDSS galaxies, and find consistent results between the two samples, thus
confirming the weak luminosity dependence of Q_z recently seen by other
authors. We compare our results to other Q_z measurements in the literature and
find them to be consistent within the full jackknife error estimates. However,
we find these errors are significantly increased by the presence of the ``Sloan
Great Wall'' (at z ~ 0.08) within these two SDSS datasets, which changes the
3-point correlation function (3PCF) by 70% on large scales (s >= 10 h^-1 Mpc). If
we exclude this supercluster, our observed Q_z is in better agreement with that
obtained from the 2dFGRS by other authors, thus demonstrating the sensitivity
of these higher-order correlation functions to large-scale structures in the
Universe. This analysis highlights that the SDSS datasets used here are not
``fair samples'' of the Universe for the estimation of higher-order clustering
statistics and larger volumes are required. We study the shape-dependence of
Q_z(s,q,theta) as one expects this measurement to depend on scale if the large
scale structure in the Universe has grown via gravitational instability from
Gaussian initial conditions. On small scales (s <= 6 h^-1 Mpc), we see some
evidence for shape-dependence in Q_z, but at present our measurements are
consistent with a constant within the errors (Q_z ~ 0.75 +/- 0.05). On scales
>10 h^-1 Mpc, we see considerable shape-dependence in Q_z.
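
For orientation, the normalised three-point function is conventionally defined as the ratio of the three-point correlation function to the cyclic sum of products of two-point functions on the sides of the triangle. The Python sketch below assumes this standard hierarchical normalisation. The abstract does not spell out the definition, and the function and argument names are hypothetical.

```python
def reduced_three_point(zeta: float, xi_12: float, xi_23: float, xi_31: float) -> float:
    """Normalised three-point function Q for one triangle.

    Assumed definition (standard hierarchical form):
        Q = zeta / (xi_12 * xi_23 + xi_23 * xi_31 + xi_31 * xi_12)
    where zeta is the three-point correlation function and xi_ij are the
    two-point correlation functions on the three sides of the triangle.
    """
    denominator = xi_12 * xi_23 + xi_23 * xi_31 + xi_31 * xi_12
    if denominator == 0.0:
        raise ZeroDivisionError("sum of two-point products is zero")
    return zeta / denominator


if __name__ == "__main__":
    # Equilateral toy triangle with xi = 0.2 on every side and zeta = 0.09.
    print(reduced_three_point(0.09, 0.2, 0.2, 0.2))
```

If Q is constant, as the small-scale measurements above suggest within the errors, the three-point function is fully determined by the two-point function and a single number.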

We study the luminosity and color dependence of the galaxy 2-point
correlation function in the Sloan Digital Sky Survey, starting from a sample of
200,000 galaxies over 2500 deg^2. We concentrate on the projected correlation
function w(r_p), which is directly related to the real space \xi(r). The
amplitude of w(r_p) grows continuously with luminosity, rising more steeply
above the characteristic luminosity L_*. Redder galaxies exhibit a higher
amplitude and steeper correlation function at all luminosities. The correlation
amplitude of blue galaxies increases continuously with luminosity, but the
luminosity dependence for red galaxies is less regular, with bright red
galaxies more strongly clustered at large scales and faint red galaxies more
strongly clustered at small scales. We interpret these results using halo
occupation distribution (HOD) models assuming concordance cosmological
parameters. For most samples, an HOD model with two adjustable parameters fits
the w(r_p) data better than a power law, explaining inflections at r_p ~ 1-3
Mpc/h as the transition between the 1-halo and 2-halo regimes of \xi(r). The
implied minimum mass for a halo hosting a central galaxy above a luminosity
threshold L grows as M_min ~ L at low luminosities and more steeply above L_*.
The mass at which an average halo has one satellite galaxy brighter than L is
M_1 ~ 23 M_min(L). These results imply a conditional luminosity function (at
fixed halo mass) in which central galaxies lie far above a Schechter function
extrapolation of the satellite population. HOD models nicely explain the joint
luminosity-color dependence of w(r_p) in terms of the color fractions of
central and satellite populations as a function of halo mass. The inferred HOD
properties are in good qualitative agreement with theoretical predictions.

We present the large-scale correlation function measured from a spectroscopic
sample of 46,748 luminous red galaxies from the Sloan Digital Sky Survey. The
survey region covers 0.72 h^{-3} Gpc^3 over 3816 square degrees and
0.16<z<0.47, making it the best sample yet for the study of large-scale
structure. We find a well-detected peak in the correlation function at
100 h^{-1} Mpc separation that is an excellent match to the predicted shape and
location of the imprint of the recombination-epoch acoustic oscillations on the
low-redshift clustering of matter. This detection demonstrates the linear
growth of structure by gravitational instability between z=1000 and the present
and confirms a firm prediction of the standard cosmological theory. The
acoustic peak provides a standard ruler by which we can measure the ratio of
the distances to z=0.35 and z=1089 to 4% fractional accuracy and the absolute
distance to z=0.35 to 5% accuracy. From the overall shape of the correlation
function, we measure the matter density Omega_m h^2 to 8% and find agreement
with the value from cosmic microwave background (CMB) anisotropies. Independent
of the constraints provided by the CMB acoustic scale, we find Omega_m = 0.273
+/- 0.025 + 0.123 (1+w_0) + 0.137 Omega_K. Including the CMB acoustic scale, we
find that the spatial curvature is Omega_K = -0.010 +/- 0.009 if the dark energy is
a cosmological constant. More generally, our results provide a measurement of
cosmological distance, and hence an argument for dark energy, based on a
geometric method with the same simple physics as the microwave background
anisotropies. The standard cosmological model convincingly passes these new and
robust tests of its fundamental properties.

We measure cosmological parameters using the three-dimensional power spectrum
P(k) from over 200,000 galaxies in the Sloan Digital Sky Survey (SDSS) in
combination with WMAP and other data. Our results are consistent with a
``vanilla'' flat adiabatic LambdaCDM model without tilt (n=1), running tilt,
tensor modes or massive neutrinos. Adding SDSS information more than halves the
WMAP-only error bars on some parameters, tightening 1 sigma constraints on the
Hubble parameter from h~0.74+0.18-0.07 to h~0.70+0.04-0.03, on the matter
density from Omega_m~0.25+/-0.10 to Omega_m~0.30+/-0.04 (1 sigma) and on
neutrino masses from <11 eV to <0.6 eV (95%). SDSS helps even more when
dropping prior assumptions about curvature, neutrinos, tensor modes and the
equation of state. Our results are in substantial agreement with the joint
analysis of WMAP and the 2dF Galaxy Redshift Survey, which is an impressive
consistency check with independent redshift survey data and analysis
techniques. In this paper, we place particular emphasis on clarifying the
physical origin of the constraints, i.e., what we do and do not know when using
different data sets and prior assumptions. For instance, dropping the
assumption that space is perfectly flat, the WMAP-only constraint on the
measured age of the Universe tightens from t0~16.3+2.3-1.8 Gyr to
t0~14.1+1.0-0.9 Gyr by adding SDSS and SN Ia data. Including tensors, running
tilt, neutrino mass and equation of state in the list of free parameters, many
constraints are still quite weak, but future cosmological measurements from
SDSS and other sources should allow these to be substantially tightened.

I present here a review of past and present multidisciplinary research of
the Pittsburgh Computational AstroStatistics (PiCA) group. This group is
dedicated to developing fast and efficient statistical algorithms for analysing
huge astronomical data sources. I begin with a short review of
multi-resolutional kd-trees, which are the building blocks for many of our
algorithms, for example quick range queries and fast n-point correlation
functions. I will present new results from the use of Mixture Models (Connolly
et al. 2000) in density estimation of multi-color data from the Sloan Digital
Sky Survey (SDSS), specifically the selection of quasars and the automated
identification of X-ray sources. I will also present a brief overview of the
False Discovery Rate (FDR) procedure (Miller et al. 2001a) and show how it has
been used in the detection of ``Baryon Wiggles'' in the local galaxy power
spectrum and source identification in radio data. Finally, I will look forward
to new research on an automated Bayes Network anomaly detector and the possible
use of the Locally Linear Embedding algorithm (LLE; Roweis & Saul 2000) for
spectral classification of SDSS spectra.
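
To make the False Discovery Rate idea concrete, the Python sketch below implements the standard Benjamini-Hochberg step-up rule for independent tests, which bounds the expected fraction of false detections among all claimed detections by alpha. Treating this rule as the FDR procedure referred to above is an assumption, and the function name and example p-values are purely illustrative.

```python
def fdr_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule (assumes independent tests).

    Returns a list of booleans, True where the null hypothesis is rejected,
    in the same order as `p_values`.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= alpha * k / n.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / n:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * n
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


if __name__ == "__main__":
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(fdr_reject(example, alpha=0.05))
```

Unlike a fixed per-test threshold, the cutoff adapts to the data: the more small p-values there are, the more permissive the threshold becomes.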

Szapudi et al. (2001) introduced the method of estimating the angular power
spectrum of the CMB sky via heuristically weighted correlation functions. Part
of the new technique is that all (co)variances are evaluated by massive Monte
Carlo simulations; a fast way to measure correlation functions in a
high resolution map is therefore essential. This letter presents a new algorithm to
calculate pixel space correlation functions via fast spherical harmonics
transforms. Our present implementation of the idea extracts correlations from a
MAP-like CMB map (HEALPix resolution of 512, i.e. $ \simeq 3 \times 10^6$
pixels) in about 5 minutes on a 500MHz computer, including $C_\ell$ inversion;
the analysis of one Planck-like map takes less than one hour. We use heuristic
window and noise weighting in pixel space, and include the possibility of
additional signal weighting as well, either in $\ell$ or pixel space. We apply
the new code to an ensemble of MAP simulations, to test the response of our
method to the inhomogeneous sky coverage/noise of MAP. We show that the
resulting $C_\ell$'s are very close to the theoretical expectations. The
HEALPix-based implementation of the method, SpICE (Spatially Inhomogenous
Correlation Estimator) will be available to the public from the authors.
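
The link between the angular power spectrum and the pixel space two-point correlation function is the Legendre series xi(theta) = sum_l (2l+1)/(4 pi) C_l P_l(cos theta). The Python sketch below evaluates that series directly for a given list of C_l values. It is meant only to illustrate the relation, not the fast spherical harmonic transform algorithm or the weighting scheme of the letter, and the function names are hypothetical.

```python
import math


def legendre_values(l_max, x):
    """Return [P_0(x), ..., P_l_max(x)] using the Bonnet recurrence."""
    values = [1.0]
    if l_max >= 1:
        values.append(x)
    for l in range(1, l_max):
        # (l + 1) P_{l+1}(x) = (2l + 1) x P_l(x) - l P_{l-1}(x)
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values


def correlation_from_cl(cl, theta):
    """Two-point correlation at angular separation `theta` (radians).

    `cl[l]` is the power spectrum value at multipole l, starting from l = 0.
    """
    legendre = legendre_values(len(cl) - 1, math.cos(theta))
    return sum(
        (2 * l + 1) / (4.0 * math.pi) * c_l * p_l
        for l, (c_l, p_l) in enumerate(zip(cl, legendre))
    )


if __name__ == "__main__":
    # Toy spectrum with power only in the quadrupole (l = 2).
    toy_cl = [0.0, 0.0, 1.0]
    print(correlation_from_cl(toy_cl, 0.0))
    print(correlation_from_cl(toy_cl, math.pi / 2))
```

The direct sum costs one pass over the multipoles per separation, which is fine for checking a handful of angles but is not how a full map would be analysed.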

We outline here the next generation of cluster-finding algorithms. We show
how advances in Computer Science and Statistics have helped develop robust,
fast algorithms for finding clusters of galaxies in large multidimensional
astronomical databases like the Sloan Digital Sky Survey (SDSS). Specifically,
this paper presents four new advances: (1) A new semi-parametric algorithm,
nicknamed ``C4'', for jointly finding clusters of galaxies in the SDSS and
ROSAT All-Sky Survey databases; (2) The introduction of the False Discovery
Rate into Astronomy; (3) The role of kernel shape in optimizing cluster
detection; (4) A new determination of the X-ray Cluster Luminosity Function
which has bearing on the existence of a ``deficit'' of high redshift, high
luminosity clusters. This research is part of our ``Computational
AstroStatistics'' collaboration (see Nichol et al. 2000) and the algorithms and
techniques discussed herein will form part of the ``Virtual Observatory''
analysis toolkit.

The fully general calculation of the cosmic error on N-point correlation
functions and related quantities is presented. More precisely, the variance
caused by the finite volume, discreteness, and edge effects is determined for
{\em any} estimator which is based on a general function of N-tuples, such as
multi-point correlation functions and multi-spectra. The results are printed
explicitly for the two-point correlation function (or power spectrum), and for
the three-point correlation (or bispectrum). These are the most popular
statistics in the study of large scale structure, yet a general
calculation of their variance has not been performed until now.

The errors on statistics measured in finite galaxy catalogs are exhaustively
investigated. The theory of errors on factorial moments by Szapudi & Colombi
(1996) is applied to cumulants via a series expansion method. All results are
subsequently extended to the weakly nonlinear regime. Together with previous
investigations this yields an analytic theory of the errors for moments and
connected moments of counts in cells from highly nonlinear to weakly nonlinear
scales. The final analytic formulae representing the full theory are explicit
but somewhat complicated. Therefore as a companion to this paper we supply a
FORTRAN program capable of calculating the described quantities numerically
(abridged).
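
For readers unfamiliar with the statistics involved, the k-th factorial moment of counts in cells is the average of N(N-1)...(N-k+1) over cells. For Poisson sampling of a smooth density field it removes the discreteness (shot noise) contribution, so that F_2 / F_1^2 - 1 estimates the average correlation function. The Python sketch below is a minimal illustration with hypothetical function names. It does not reproduce the error formulae of the paper or the FORTRAN program mentioned above.

```python
def factorial_moment(counts, k):
    """k-th factorial moment F_k = < N (N - 1) ... (N - k + 1) > over cells."""
    if not counts:
        raise ValueError("need at least one cell")
    total = 0
    for n in counts:
        term = 1
        for j in range(k):
            term *= n - j
        total += term
    return total / len(counts)


def average_correlation(counts):
    """Shot-noise-corrected variance of counts: F_2 / F_1**2 - 1.

    Assumes the galaxies are a Poisson sampling of the underlying field.
    """
    mean = factorial_moment(counts, 1)
    if mean == 0:
        raise ZeroDivisionError("mean count is zero")
    return factorial_moment(counts, 2) / mean**2 - 1.0


if __name__ == "__main__":
    cell_counts = [0, 1, 2, 3]  # toy counts in four cells
    print(factorial_moment(cell_counts, 1))
    print(factorial_moment(cell_counts, 2))
    print(average_correlation(cell_counts))
```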

For a given statistic, A, the cosmic distribution function, Upsilon(VA), is
the probability of measuring a value VA in a finite galaxy catalog. For
statistics related to count-in-cells, such as factorial moments, F_k, the
average correlation function, xiav, and cumulants, S_N, the functions
Upsilon(VF_k), Upsilon(Vxiav), and Upsilon(VS_N) were measured in a large
tauCDM simulation. This N-body experiment simulates almost the full ``Hubble
Volume'' of the universe, thus, for the first time, it allowed for an accurate
analysis of the cosmic distribution function, and, in particular, of its
variance (Delta A)^2, the cosmic error. The resulting detailed knowledge about
the shape of Upsilon is crucial for likelihood analyses. The measured cosmic
error agrees remarkably well with the theoretical predictions of Szapudi &
Colombi (1996) and Szapudi, Bernardeau & Colombi (1998) in the weakly
nonlinear regime, while the predictions are slightly above the measurements in
the highly nonlinear regime. When the relative cosmic error, (Delta
A/A)^2, is small, the function Upsilon is nearly Gaussian. When (Delta A/A)^2
approaches unity or is larger, the function Upsilon(VA) is increasingly skewed
and well approximated by a lognormal distribution for A=F_k, or A=xiav. The measured
cumulants follow accurately the perturbation theory predictions in the weakly
nonlinear regime. Extended perturbation theory is an excellent approximation
for all the available dynamic range.

The effects of sampling are investigated on measurements of counts-in-cells
in three-dimensional magnitude limited galaxy surveys, with emphasis on moments
of the underlying smooth galaxy density field convolved with a spherical
window. A new estimator is proposed for measuring the kth order moment < rho^k
>: the weighted factorial moment F_k[w], corrected for the effects of the
varying selection function.
The cosmic error on the measurement of F_k[w] is computed via the
formalism of Szapudi & Colombi (1996), which is generalized to include
selection effects. The integral equation for finding the minimum variance
weight is solved numerically, and an intuitive analytical approximation is
derived. The resulting estimator is more accurate than the traditional method
of counts-in-cells in volume limited samples, which discards useful
information. As a practical example we consider the case of the future Sloan
Digital Sky Survey.
Optimal (sparse) sampling strategies for designing magnitude limited redshift
surveys are investigated as well. It is found that the optimal strategy depends
greatly on the statistics and scales considered.
Finally we consider the issue of designing the geometry of a catalog, when it
covers only a small fraction of the sky.