• ### Star-galaxy Classification Using Deep Convolutional Neural Networks(1608.04369)

Most existing star-galaxy classifiers use the reduced summary information from catalogs, requiring careful feature extraction and selection. The latest advances in machine learning that use deep convolutional neural networks allow a machine to automatically learn the features directly from data, minimizing the need for input from human experts. We present a star-galaxy classification framework that uses deep convolutional neural networks (ConvNets) directly on the reduced, calibrated pixel values. Using data from the Sloan Digital Sky Survey (SDSS) and the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS), we demonstrate that ConvNets are able to produce accurate and well-calibrated probabilistic classifications that are competitive with conventional machine learning techniques. Future advances in deep learning may bring more success with current and forthcoming photometric surveys, such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST), because deep neural networks require very little manual feature engineering.
• ### Teaching Data Science(1604.07397)

April 25, 2016 physics.ed-ph, cs.CY, stat.OT
We describe an introductory data science course, entitled Introduction to Data Science, offered at the University of Illinois at Urbana-Champaign. The course introduced general programming concepts by using the Python programming language with an emphasis on data preparation, processing, and presentation. The course had no prerequisites, and students were not expected to have any programming experience. This introductory course was designed to cover a wide range of topics, from the nature of data, to storage, to visualization, to probability and statistical analysis, to cloud and high performance computing, without becoming overly focused on any one subject. We conclude this article with a discussion of lessons learned and our plans to develop new data science courses.
• ### Machine Learning and Cosmological Simulations II: Hydrodynamical Simulations(1510.07659)

Jan. 12, 2016 astro-ph.CO, astro-ph.GA
We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation in the backdrop of a hydrodynamical simulation. We use the Illustris Simulation to train and test various sophisticated machine learning algorithms. By using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon a solid hydrodynamical simulation. The promising reproduction of the listed galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool to study small-scale structure formation. We find that ML mimics a full-blown hydrodynamical simulation surprisingly well in a computation time of mere minutes. The population of galaxies simulated by ML, while not numerically identical to Illustris, is statistically and physically robust and follows the same fundamental observational constraints. Machine learning offers an intriguing and promising technique to create quick mock galaxy catalogs in the future.
• ### Creating updated, scientifically-calibrated mosaic images for the RC3 catalogue(1512.01204)

Dec. 3, 2015 astro-ph.GA, astro-ph.IM
The Third Reference Catalogue of Bright Galaxies (RC3) is a reasonably complete listing of 23,011 nearby, large, bright galaxies. By using the final imaging data release from the Sloan Digital Sky Survey, we generate scientifically-calibrated FITS mosaics by using the montage program for all SDSS imaging bands for all RC3 galaxies that lie within the survey footprint. We further combine the SDSS g, r, and i band FITS mosaics for these galaxies to create color-composite images by using the STIFF program. We generalized this software framework to make FITS mosaics and color-composite images for an arbitrary catalog and imaging data set. Due to positional inaccuracies inherent in the RC3 catalog, we employ a recursive algorithm in our mosaicking pipeline that first determines the correct location for each galaxy, and subsequently applies the mosaicking procedure. As an additional test of this new software pipeline and to obtain mosaic images of a larger sample of RC3 galaxies, we also applied this pipeline to photographic data taken by the Second Palomar Observatory Sky Survey with $B_J$, $R_F$, and $I_N$ plates. We publicly release all generated data, accessible via a web search form, and the software pipeline to enable others to make galaxy mosaics by using other catalogs or surveys.
• ### Machine Learning and Cosmological Simulations I: Semi-Analytical Models(1510.06402)

Oct. 21, 2015 astro-ph.CO, astro-ph.GA
We present a new exploratory framework to model galaxy formation and evolution in a hierarchical universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analyzing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated machine learning algorithms (k-Nearest Neighbors, decision trees, random forests and extremely randomized trees). By using only essential dark matter halo physical properties for haloes of $M>10^{12} M_{\odot}$ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass and cooling radius at z = 0 for each central galaxy in a dark matter halo for the Millennium run. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon SAMs and demonstrably place ML as a promising and a computationally efficient tool to study small-scale structure formation.
• ### A Hybrid Ensemble Learning Approach to Star-Galaxy Classification(1505.02200)

July 14, 2015 astro-ph.IM
There exist a variety of star-galaxy classification techniques, each with their own strengths and weaknesses. In this paper, we present a novel meta-classification framework that combines and fully exploits different techniques to produce a more robust star-galaxy classification. To demonstrate this hybrid, ensemble approach, we combine a purely morphological classifier, a supervised machine learning method based on random forest, an unsupervised machine learning method based on self-organizing maps, and a hierarchical Bayesian template fitting method. Using data from the CFHTLenS survey, we consider different scenarios: when a high-quality training set is available with spectroscopic labels from DEEP2, SDSS, VIPERS, and VVDS, and when the demographics of sources in a low-quality training set do not match the demographics of objects in the test data set. We demonstrate that our Bayesian combination technique improves the overall performance over any individual classification method in these scenarios. Thus, strategies that combine the predictions of different classifiers may prove to be optimal in currently ongoing and forthcoming photometric surveys, such as the Dark Energy Survey and the Large Synoptic Survey Telescope.
• ### Narrow absorption line variability in repeat quasar observations from the Sloan Digital Sky Survey(1307.7832)

July 30, 2013 astro-ph.CO
We present the results from a time domain study of absorption lines detected in quasar spectra with repeat observations from the Sloan Digital Sky Survey Data Release 7 (SDSS DR7). Beginning with over 4500 unique time separation baselines of various absorption line species identified in the SDSS DR7 quasar spectra, we create a catalogue of 2522 quasar absorption line systems with two to eight repeat observations, representing the largest collection of unbiased and homogeneous multi-epoch absorption systems ever published. To investigate these systems for time variability of narrow absorption lines, we refine this sample based on the reliability of the system detection, the proximity of pixels with bright sky contamination to individual absorption lines, and the quality of the continuum fit. Variability measurements of this sub-sample based on the absorption line equivalent widths yield a total of 33 systems with indications of significantly variable absorption strengths on time-scales ranging from one day to several years in the rest frame of the absorption system. Of these, at least 10 are from a class known as intervening absorption systems caused by foreground galaxies along the line of sight to the background quasar. This is the first evidence of possible absorption line variability detected in intervening systems, and their short time-scale variations suggest that small-scale structures (~10-100 au) are likely to exist in their host foreground galaxies.
• ### Evolution of the Clustering of Photometrically Selected SDSS Galaxies(1002.1476)

April 26, 2010 astro-ph.CO
We measure the angular auto-correlation functions (w) of SDSS galaxies selected to have photometric redshifts 0.1 < z < 0.4 and absolute r-band magnitudes Mr < -21.2. We split these galaxies into five overlapping redshift shells of width 0.1 and measure w in each subsample in order to investigate the evolution of SDSS galaxies. We find that the bias increases substantially with redshift - much more so than one would expect for a passively evolving sample. We use halo-model analysis to determine the best-fit halo-occupation-distribution (HOD) for each subsample, and the best-fit models allow us to interpret the change in bias physically. In order to properly interpret our best-fit HODs, we convert each halo mass to its z = 0 passively evolved bias (bo), enabling a direct comparison of the best-fit HODs at different redshifts. We find that the minimum halo bo required to host a galaxy decreases as the redshift decreases, suggesting that galaxies with Mr < -21.2 are forming in halos at the low-mass end of the HODs over our redshift range. We use the best-fit HODs to determine the change in occupation number divided by the change in mass of halos with constant bo and we find a sharp peak at bo ~ 0.9 - corresponding to an average halo mass of ~ 10^12Msol/h. We thus present the following scenario: the bias of galaxies with Mr < -21.2 decreases as the Universe evolves because these galaxies form in halos of mass ~ 10^12Msol/h (independent of redshift), and the bias of these halos naturally decreases as the Universe evolves.
• A survey that can cover the sky in optical bands over wide fields to faint magnitudes with a fast cadence will enable many of the exciting science opportunities of the next decade. The Large Synoptic Survey Telescope (LSST) will have an effective aperture of 6.7 meters and an imaging camera with field of view of 9.6 deg^2, and will be devoted to a ten-year imaging survey over 20,000 deg^2 south of +15 deg. Each pointing will be imaged 2000 times with fifteen second exposures in six broad bands from 0.35 to 1.1 microns, to a total point-source depth of r~27.5. The LSST Science Book describes the basic parameters of the LSST hardware, software, and observing plans. The book discusses educational and outreach opportunities, then goes on to describe a broad range of science that LSST will revolutionize: mapping the inner and outer Solar System, stellar populations in the Milky Way and nearby galaxies, the structure of the Milky Way disk and halo and other objects in the Local Volume, transient and variable objects both at low and high redshift, and the properties of normal and active galaxies at low and high redshift. It then turns to far-field cosmological topics, exploring properties of supernovae to z~1, strong and weak lensing, the large-scale distribution of galaxies and baryon oscillations, and how these different probes may be combined to constrain cosmological models and the physics of dark energy.
• ### Halo-model Analysis of the Clustering of Photometrically Selected Galaxies from SDSS(0906.4977)

June 26, 2009 astro-ph.CO
We measure the angular 2-point correlation functions of galaxies in a volume limited, photometrically selected galaxy sample from the fifth data release of the Sloan Digital Sky Survey. We split the sample both by luminosity and galaxy type and use a halo-model analysis to find halo-occupation distributions that can simultaneously model the clustering of all, early-, and late-type galaxies in a given sample. Our results for the full galaxy sample are generally consistent with previous results using the SDSS spectroscopic sample, taking the differences between the median redshifts of the photometric and spectroscopic samples into account. We find that our early- and late- type measurements cannot be fit by a model that allows early- and late-type galaxies to be well-mixed within halos. Instead, we introduce a new model that segregates early- and late-type galaxies into separate halos to the maximum allowed extent. We determine that, in all cases, it provides a good fit to our data and thus provides a new statistical description of the manner in which early- and late-type galaxies occupy halos.
• ### A Cross-Correlation Analysis of Mg II Absorption Line Systems and Luminous Red Galaxies from the SDSS DR5(0902.4003)

May 26, 2009 astro-ph.CO
We analyze the cross-correlation of 2,705 unambiguously intervening Mg II (2796,2803A) quasar absorption line systems with 1,495,604 luminous red galaxies (LRGs) from the Fifth Data Release of the Sloan Digital Sky Survey within the redshift range 0.36<=z<=0.8. We confirm with high precision a previously reported weak anti-correlation of equivalent width and dark matter halo mass, measuring the average masses to be log M_h(M_[solar]h^-1)=11.29 [+0.36,-0.62] and log M_h(M_[solar]h^-1)=12.70 [+0.53,-1.16] for systems with W[2796A]>=1.4A and 0.8A<=W[2796A]<1.4A, respectively. Additionally, we investigate the significance of a number of potential sources of bias inherent in absorber-LRG cross-correlation measurements, including absorber velocity distributions and the weak lensing of background quasars, which we determine is capable of producing a 20-30% bias in angular cross-correlation measurements on scales less than 2'. We measure the Mg II - LRG cross-correlation for 719 absorption systems with v<60,000 km s^-1 in the quasar rest frame and find that these associated absorbers typically reside in dark matter haloes that are ~10-100 times more massive than those hosting unambiguously intervening Mg II absorbers. Furthermore, we find evidence for evolution of the redshift number density, dN/dz, with 2-sigma significance for the strongest (W>2.0A) absorbers in the DR5 sample. This width-dependent dN/dz evolution does not significantly affect the recovered equivalent width-halo mass anti-correlation and adds to existing evidence that the strongest Mg II absorption systems are correlated with an evolving population of field galaxies at z<0.8, while the non-evolving dN/dz of the weakest absorbers more closely resembles that of the LRG population.
• ### Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection(0810.3567)

Feb. 25, 2009 astro-ph
We explore the multidimensional, multiwavelength selection of quasars from mid-IR (MIR) plus optical data, specifically from Spitzer-IRAC and the Sloan Digital Sky Survey (SDSS). We apply modern statistical techniques to combined Spitzer MIR and SDSS optical data, allowing up to 8-D color selection of quasars. Using a Bayesian selection method, we catalog 5546 quasar candidates to an 8.0 um depth of 56 uJy over an area of ~24 sq. deg; ~70% of these candidates are not identified by applying the same Bayesian algorithm to 4-color SDSS optical data alone. Our selection recovers 97.7% of known type 1 quasars in this area and greatly improves the effectiveness of identifying 3.5<z<5 quasars. Even using only the two shortest wavelength IRAC bandpasses, it is possible to use our Bayesian techniques to select quasars with 97% completeness and as little as 10% contamination. This sample has a photometric redshift accuracy of 93.6% (Delta Z +/-0.3), remaining roughly constant when the two reddest MIR bands are excluded. While our methods are designed to find type 1 (unobscured) quasars, as many as 1200 of the objects are type 2 (obscured) quasar candidates. Coupling deep optical imaging data with deep mid-IR data could enable selection of quasars in significant numbers past the peak of the quasar luminosity function (QLF) to at least z~4. Such a sample would constrain the shape of the QLF and enable quasar clustering studies over the largest range of redshift and luminosity to date, yielding significant gains in our understanding of quasars and the evolution of galaxies.
• ### Quasar Clustering from SDSS DR5: Dependences on Physical Properties(0810.4144)

Dec. 13, 2008 astro-ph
Using a homogeneous sample of 38,208 quasars with a sky coverage of $4000 {\rm deg^2}$ drawn from the SDSS Data Release Five quasar catalog, we study the dependence of quasar clustering on luminosity, virial black hole mass, quasar color, and radio loudness. At $z<2.5$, quasar clustering depends weakly on luminosity and virial black hole mass, with typical uncertainty levels $\sim 10\%$ for the measured correlation lengths. These weak dependences are consistent with models in which substantial scatter between quasar luminosity, virial black hole mass and the host dark matter halo mass has diluted any clustering difference, where halo mass is assumed to be the relevant quantity that best correlates with clustering strength. However, the most luminous and most massive quasars are more strongly clustered (at the $\sim 2\sigma$ level) than the remainder of the sample, which we attribute to the rapid increase of the bias factor at the high-mass end of host halos. We do not observe a strong dependence of clustering strength on quasar colors within our sample. On the other hand, radio-loud quasars are more strongly clustered than are radio-quiet quasars matched in redshift and optical luminosity (or virial black hole mass), consistent with local observations of radio galaxies and radio-loud type 2 AGN. Thus radio-loud quasars reside in more massive and denser environments in the biased halo clustering picture. Using the Sheth et al. (2001) formula for the linear halo bias, the estimated host halo mass for radio-loud quasars is $\sim 10^{13} h^{-1}M_\odot$, compared to $\sim 2\times 10^{12} h^{-1}M_\odot$ for radio-quiet quasar hosts at $z\sim 1.5$.
• ### AGN Environments in the Sloan Digital Sky Survey I: Dependence on Type, Redshift, and Luminosity(0712.2474)

July 24, 2008 astro-ph
We explore how the local environment is related to the redshift, type, and luminosity of active galactic nuclei (AGN). Recent simulations and observations are converging on the view that the extreme luminosity of quasars is fueled in major mergers of gas-rich galaxies. In such a picture, quasars are expected to be located in regions with a higher density of galaxies on small scales where mergers are more likely to take place. However, in this picture, the activity observed in low-luminosity AGN is due to secular processes that are less dependent on the local galaxy density. To test this hypothesis, we compare the local photometric galaxy density on kiloparsec scales around spectroscopic Type I and Type II quasars to the local density around lower luminosity spectroscopic Type I and Type II AGN. To minimize projection effects and evolution in the photometric galaxy sample we use to characterize AGN environments, we place our random control sample at the same redshift as our AGN and impose a narrow redshift window around both the AGN and control targets. We find that higher luminosity AGN have more overdense environments compared to lower luminosity AGN on all scales out to our $2\,h_{70}^{-1}\,\mathrm{Mpc}$ limit. Additionally, in the range $0.3\leqslant z\leqslant 0.6$, Type II quasars have similarly overdense environments to those of bright Type I quasars on all scales out to our $2\,h_{70}^{-1}\,\mathrm{Mpc}$ limit, while the environment of dimmer Type I quasars appears to be less overdense than the environment of Type II quasars. We see increased overdensity for Type II AGN compared to Type I AGN on scales out to our limit of $2\,h_{70}^{-1}\,\mathrm{Mpc}$ in overlapping redshift ranges. We also detect marginal evidence for evolution in the number of galaxies within $2\,h_{70}^{-1}\,\mathrm{Mpc}$ of a quasar with redshift.
• ### Normalization of the Matter Power Spectrum via Higher-Order Angular Correlations of Luminous Red Galaxies(0804.3325)

April 21, 2008 astro-ph
We present a novel technique to measure $\sigma_8$, by measuring the dependence of the second-order bias of a density field on $\sigma_8$ using two separate techniques. Each technique employs area-averaged angular correlation functions ($\bar{\omega}_N$), one relying on the shape of $\bar{\omega}_2$, the other relying on the amplitude of $s_3$ ($s_3 =\bar{\omega}_3/\bar{\omega}_2^2$). We confirm the validity of the method by testing it on a mock catalog drawn from Millennium Simulation data and finding $\sigma_8^{measured}- \sigma_8^{true} = -0.002 \pm 0.062$. We create a catalog of photometrically selected LRGs from SDSS DR5 and separate it into three distinct data sets by photometric redshift, with median redshifts of 0.47, 0.53, and 0.61. Measurements of $c_2$ and $\sigma_8$ are made for each data set, assuming flat geometry and WMAP3 best-fit priors on $\Omega_m$, $h$, and $\Gamma$. We find, with increasing redshift, $c_2 = 0.09 \pm 0.04$, $0.09 \pm 0.05$, and $0.09 \pm 0.03$ and $\sigma_8 = 0.78 \pm 0.08$, $0.80 \pm 0.09$, and $0.80 \pm 0.09$. We combine these three consistent $\sigma_8$ measurements to produce the result $\sigma_8 = 0.79 \pm 0.05$. Allowing the parameters $\Omega_m$, $h$, and $\Gamma$ to vary within their WMAP3 1$\sigma$ error, we find that the best-fit $\sigma_8$ does not change by more than 8% and we are thus confident our measurement is accurate to within 10%. We anticipate that future surveys, such as Pan-STARRS, DES, and LSST, will be able to employ this method to measure $\sigma_8$ to great precision, and that it will serve as an important, complementary check on the values determined via more established methods.
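The abstract does not say how the three $\sigma_8$ values were combined. As an illustration only, the sketch below assumes independent Gaussian errors and uses a standard inverse-variance weighted mean; the function name `combine_inverse_variance` is hypothetical and is not taken from the paper. Under that assumption the three quoted measurements reproduce the quoted combined value.

```python
import math


def combine_inverse_variance(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty.

    Assumes the measurements are independent with Gaussian errors
    (an assumption made for this sketch, not stated in the abstract).
    """
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# The three sigma_8 measurements quoted in the abstract.
sigma8_mean, sigma8_err = combine_inverse_variance(
    [0.78, 0.80, 0.80], [0.08, 0.09, 0.09]
)
print(f"sigma_8 = {sigma8_mean:.2f} +/- {sigma8_err:.2f}")
# prints "sigma_8 = 0.79 +/- 0.05"
```

If the three redshift slices shared correlated systematics, the combined uncertainty would be larger than this simple estimate.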
• ### Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware(0711.3414)

Nov. 21, 2007 astro-ph
The objective of our research is to demonstrate the practical usage and orders of magnitude speedup of real-world applications by using alternative technologies to support high performance computing. Currently, the main barrier to the widespread adoption of this technology is the lack of development tools and case studies, which typically impedes non-specialists who might otherwise develop applications that could leverage these technologies. By partnering with the Innovative Systems Laboratory at the National Center for Supercomputing Applications, we have obtained access to several novel technologies, including several Field-Programmable Gate Array (FPGA) systems, NVidia Graphics Processing Units (GPUs), and the STI Cell BE platform. Our goal is to not only demonstrate the capabilities of these systems, but to also serve as guides for others to follow in our path. To date, we have explored the efficacy of the SRC-6 MAP-C and MAP-E and SGI RASC Athena and RC100 reconfigurable computing platforms in supporting a two-point correlation function which is used in a number of different scientific domains. In a brute force test, the FPGA based single-processor system has achieved an almost two orders of magnitude speedup over a single-processor CPU system. We are now developing implementations of this algorithm on other platforms, including one using a GPU. Given the considerable efforts of the cosmology community in optimizing these classes of algorithms, we are currently working to implement an optimized version of the basic family of correlation functions by using tree-based data structures. Finally, we are also exploring other algorithms, such as instance-based classifiers, power spectrum estimators, and higher-order correlation functions that are also commonly used in a wide range of scientific disciplines.
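To make the "brute force" two-point correlation function concrete, here is a minimal CPU sketch in plain Python. It is not the FPGA or GPU implementation described above; the function names, the choice of Euclidean separations, and the simple DD/RR - 1 ("natural") estimator are assumptions made for illustration.

```python
import math
from itertools import combinations


def pair_counts(points, bin_edges):
    """Brute-force O(N^2) histogram of pairwise Euclidean separations."""
    counts = [0] * (len(bin_edges) - 1)
    for p, q in combinations(points, 2):
        r = math.dist(p, q)
        for i in range(len(counts)):
            if bin_edges[i] <= r < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def natural_estimator(data, randoms, bin_edges):
    """Estimate xi(r) = norm * DD / RR - 1 in each separation bin."""
    dd = pair_counts(data, bin_edges)
    rr = pair_counts(randoms, bin_edges)
    n_d, n_r = len(data), len(randoms)
    # Rescale for the different number of pairs in the two catalogs.
    norm = (n_r * (n_r - 1)) / (n_d * (n_d - 1))
    return [norm * d / r - 1.0 if r > 0 else float("nan")
            for d, r in zip(dd, rr)]


# Toy points (made up for the example): separations are 1, 1, and sqrt(2).
toy_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
toy_edges = [0.0, 1.2, 2.0]
print(pair_counts(toy_points, toy_edges))  # prints [2, 1]
```

Every pair is visited once, so the cost grows as the square of the catalog size. The double loop has no dependence between pairs, which makes it straightforward to spread across parallel hardware, while tree-based data structures reduce the number of pairs that must be visited at all.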
• ### The Sloan Digital Sky Survey Quasar Lens Search. III. Constraints on Dark Energy from the Third Data Release Quasar Lens Catalog(0708.0825)

Oct. 30, 2007 astro-ph
We present cosmological results from the statistics of lensed quasars in the Sloan Digital Sky Survey (SDSS) Quasar Lens Search. By taking proper account of the selection function, we compute the expected number of quasars lensed by early-type galaxies and their image separation distribution assuming a flat universe, which is then compared with 7 lenses found in the SDSS Data Release 3 to derive constraints on dark energy under strictly controlled criteria. For a cosmological constant model (w=-1) we obtain \Omega_\Lambda=0.74^{+0.11}_{-0.15}(stat.)^{+0.13}_{-0.06}(syst.). Allowing w to be a free parameter we find \Omega_M=0.26^{+0.07}_{-0.06}(stat.)^{+0.03}_{-0.05}(syst.) and w=-1.1\pm0.6(stat.)^{+0.3}_{-0.5}(syst.) when combined with the constraint from the measurement of baryon acoustic oscillations in the SDSS luminous red galaxy sample. Our results are in good agreement with earlier lensing constraints obtained using radio lenses, and provide additional confirmation of the presence of dark energy consistent with a cosmological constant, derived independently of type Ia supernovae.
• ### The Sloan Digital Sky Survey Quasar Lens Search. II. Statistical Lens Sample from the Third Data Release(0708.0828)

Oct. 30, 2007 astro-ph
We report the first results of our systematic search for strongly lensed quasars using the spectroscopically confirmed quasars in the Sloan Digital Sky Survey (SDSS). Among 46,420 quasars from the SDSS Data Release 3 (~4188 deg^2), we select a subsample of 22,683 quasars that are located at redshifts between 0.6 and 2.2 and are brighter than the Galactic extinction corrected i-band magnitude of 19.1. We identify 220 lens candidates from the quasar subsample, for which we conduct extensive and systematic follow-up observations in optical and near-infrared wavebands, in order to construct a complete lensed quasar sample at image separations between 1'' and 20'' and flux ratios of faint to bright lensed images larger than 10^{-0.5}. We construct a statistical sample of 11 lensed quasars. Ten of these are galaxy-scale lenses with small image separations (~1''-2'') and one is a large separation (15'') system which is produced by a massive cluster of galaxies, representing the first statistical sample of lensed quasars including both galaxy- and cluster-scale lenses. The Data Release 3 spectroscopic quasars contain an additional 11 lensed quasars outside the statistical sample.
• ### Robust Machine Learning Applied to Terascale Astronomical Datasets(0710.4482)

Oct. 24, 2007 astro-ph
We present recent results from the Laboratory for Cosmological Data Mining (http://lcdm.astro.uiuc.edu) at the National Center for Supercomputing Applications (NCSA) to provide robust classifications and photometric redshifts for objects in the terascale-class Sloan Digital Sky Survey (SDSS). Through a combination of machine learning in the form of decision trees, k-nearest neighbor, and genetic algorithms, the use of supercomputing resources at NCSA, and the cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million objects in the SDSS, improved photometric redshifts, and a full exploitation of the powerful k-nearest neighbor algorithm. This work is the first to apply the full power of these algorithms to contemporary terascale astronomical datasets, and the improvement over existing results is demonstrable. We discuss issues that we have encountered in dealing with data on the terascale, and possible solutions that can be implemented to deal with upcoming petascale datasets.
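As a rough illustration of how a k-nearest neighbor estimator can assign a photometric redshift, the sketch below averages the redshifts of the `k` training objects closest in color space. The function name `knn_photoz`, the two-color feature vectors, the unweighted mean, and all numbers are invented for the example; they are not the group's implementation or SDSS data.

```python
import math


def knn_photoz(train_colors, train_z, query_colors, k=3):
    """Mean redshift of the k training objects nearest in color space."""
    by_distance = sorted(
        (math.dist(colors, query_colors), z)
        for colors, z in zip(train_colors, train_z)
    )
    nearest = by_distance[:k]
    return sum(z for _, z in nearest) / len(nearest)


# Made-up training set: (color_1, color_2) with known redshift for each object.
toy_colors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
toy_z = [0.1, 0.2, 0.3, 1.5]

z_estimate = knn_photoz(toy_colors, toy_z, (0.05, 0.05), k=3)
print(round(z_estimate, 3))  # prints 0.2
```

This version scans the whole training set for every query, which is the cost that motivates supercomputing resources or spatial indexing at the scale of 100 million objects.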
• ### Quasar Clustering at $25\,h^{-1}\,\mathrm{kpc}$ from a Complete Sample of Binaries(0709.3474)

Sept. 21, 2007 astro-ph
We present spectroscopy of binary quasar candidates selected from Data Release 4 of the Sloan Digital Sky Survey (SDSS DR4) using Kernel Density Estimation (KDE). We present 27 new sets of observations, 10 of which are binary quasars, roughly doubling the number of known $g < 21$ binaries with component separations of 3 to 6". Only 3 of 49 spectroscopically identified objects are non-quasars, confirming that the quasar selection efficiency of the KDE technique is $\sim95$%. Several of our observed binaries are wide-separation lens candidates that merit additional higher-resolution observations. One interesting pair may be an M star binary, or an M star-binary quasar superposition. Our candidates are initially selected by UV-excess ($u-g < 1$), but are otherwise selected irrespective of the relative colors of the quasar pair, and we thus use them to suggest optimal color similarity and photometric redshift approaches for targeting binary quasars, or projected quasar pairs. From a sample that is complete on proper scales of $23.7 < R_{prop} < 29.7\,h^{-1}\,\mathrm{kpc}$, we determine the projected quasar correlation function to be $W_p = 24.0^{+16.9}_{-10.8}$, which is $2\sigma$ lower than recent estimates. We argue that our low $W_p$ estimates may indicate redshift evolution in the quasar correlation function from $z\sim1.9$ to $z\sim1.4$ on scales of $R_{prop} \sim 25\,h^{-1}\,\mathrm{kpc}$. The size of this evolution broadly tracks quasar clustering on larger scales, consistent with merger-driven models of quasar origin. Although our sample alone is insufficient to detect evolution in quasar clustering on small scales, an $i$-selected DR6 KDE quasar catalog, which will contain several hundred $z \lesssim 5$ binary quasars, could easily constrain any clustering evolution at $R_{prop} \sim 25\,h^{-1}\,\mathrm{kpc}$.
• ### Higher-Order Angular Galaxy Correlations in the SDSS: Redshift and Color Dependence of non-Linear Bias(0704.2573)

April 19, 2007 astro-ph
We present estimates of the N-point galaxy, area-averaged, angular correlation functions $\bar{\omega}_{N}$($\theta$) for $N$ = 2,...,7 for galaxies from the fifth data release of the Sloan Digital Sky Survey. Our parent sample is selected from galaxies with $18 \leq r < 21$, and is the largest ever used to study higher-order correlations. We subdivide this parent sample into two volume limited samples using photometric redshifts, and these two samples are further subdivided by magnitude, redshift, and color (producing early- and late-type galaxy samples) to determine the dependence of $\bar{\omega}_{N}$($\theta$) on luminosity, redshift, and galaxy-type. We measure $\bar{\omega}_{N}$($\theta$) using oversampling techniques and use them to calculate the projected, $s_{N}$. Using models derived from theoretical power-spectra and perturbation theory, we measure the bias parameters $b_1$ and $c_2$, finding that the large differences in both bias parameters ($b_1$ and $c_2$) between early- and late-type galaxies are robust against changes in redshift, luminosity, and $\sigma_8$, and that both terms are consistently smaller for late-type galaxies. By directly comparing their higher-order correlation measurements, we find large differences in the clustering of late-type galaxies at redshifts lower than 0.3 and those at redshifts higher than 0.3, both at large scales ($c_2$ is larger by $\sim0.5$ at $z > 0.3$) and small scales (large amplitudes are measured at small scales only for $z > 0.3$, suggesting much more merger driven star formation at $z > 0.3$). Finally, our measurements of $c_2$ suggest both that $\sigma_8 < 0.8$ and $c_2$ is negative.
• ### The Sloan Digital Sky Survey Quasar Catalog IV. Fifth Data Release(0704.0806)

April 5, 2007 astro-ph
We present the fourth edition of the Sloan Digital Sky Survey (SDSS) Quasar Catalog. The catalog contains 77,429 objects; this is an increase of over 30,000 entries since the previous edition. The catalog consists of the objects in the SDSS Fifth Data Release that have luminosities larger than M_i = -22.0 (in a cosmology with H_0 = 70 km/s/Mpc, Omega_M = 0.3, and Omega_Lambda = 0.7), have at least one emission line with FWHM larger than 1000 km/s or have interesting/complex absorption features, are fainter than i = 15.0, and have highly reliable redshifts. The area covered by the catalog is 5740 sq. deg. The quasar redshifts range from 0.08 to 5.41, with a median value of 1.48; the catalog includes 891 quasars at redshifts greater than four, of which 36 are at redshifts greater than five. Approximately half of the catalog quasars have i < 19; nearly all have i < 21. For each object the catalog presents positions accurate to better than 0.2 arcsec rms per coordinate, five-band (ugriz) CCD-based photometry with typical accuracy of 0.03 mag, and information on the morphology and selection method. The catalog also contains basic radio, near-infrared, and X-ray emission properties of the quasars, when available, from other large-area surveys. The calibrated digital spectra cover the wavelength region 3800-9200 Å at a spectral resolution of ~2000. The spectra can be retrieved from the public database using the information provided in the catalog. The average SDSS colors of quasars as a function of redshift, derived from the catalog entries, are presented in tabular form. Approximately 96% of the objects in the catalog were discovered by the SDSS.
• ### Broad Absorption Line Variability in Repeat Quasar Observations from the Sloan Digital Sky Survey(astro-ph/0610656)

Feb. 1, 2007 astro-ph
We present a time-variability analysis of 29 broad absorption line quasars (BALQSOs) observed in two epochs by the Sloan Digital Sky Survey (SDSS). These spectra are selected from a larger sample of BALQSOs with multiple observations by virtue of exhibiting a broad CIV $\lambda$1549 absorption trough separated from the rest frame of the associated emission peak by more than 3600 km s$^{-1}$. Detached troughs facilitate higher precision variability measurements, since the measurement of the absorption in these objects is not complicated by variation in the emission line flux. We have undertaken a statistical analysis of these detached-trough BALQSO spectra to explore the relationships between BAL features that are seen to vary and the dynamics of emission from the quasar central engine. We have measured variability within our sample, which includes three strongly variable BALs. We have also verified that the statistical behavior of the overall sample agrees with current model predictions and previous studies of BAL variability. Specifically, we observe that the strongest BAL variability occurs among the smallest equivalent width features and at velocities exceeding 12,000 km s$^{-1}$, as predicted by recent disk-wind modeling.
• ### Precision Measurements of Higher-Order Angular Galaxy Correlations Using 11 Million SDSS Galaxies(astro-ph/0605748)

May 31, 2006 astro-ph
We present estimates of the $N$-point galaxy area-averaged angular correlation functions $w_N$ for $N$ = 2,...,7 from the third data release of the Sloan Digital Sky Survey (SDSS). The sample was selected from galaxies with $18 < r < 21$, and is the largest ever used to study higher-order correlations. The measured $w_N$ are used to calculate the projected, $s_N$, and real space, $S_N$, hierarchical amplitudes. This produces highly precise measurements over 0.2 to 10 $h^{-1}$ Mpc, which are consistent with Gaussian primordial density fluctuations. The measurements suggest that higher-order galaxy bias is non-negligible, as defining $b_1 = 1$ yields $c_2 = -0.24 \pm 0.08$. We report the first SDSS measurement of marginally significant third-order bias, $c_3 = 0.98 \pm 0.89$, which suggests that bias terms may be significant to even higher order. Previous measurements of $c_2$ have yielded inconsistent results. Inconsistencies would be expected if different data sets sample different galaxy types, especially if different galaxy types exhibit different higher-order bias. We find early-type galaxies exhibit significantly different behavior than late-types at both small and large scales. At large scales ($r > 1\,h^{-1}$ Mpc), we find the $S_N$ for late-type galaxies are lower than for early-types, implying a significant difference between their higher-order bias. We find $b_{1,\mathrm{early}} = 1.36 \pm 0.04$, $c_{2,\mathrm{early}} = 0.30 \pm 0.10$, $b_{1,\mathrm{late}} = 0.81 \pm 0.03$, and $c_{2,\mathrm{late}} = -0.70 \pm 0.08$. Our results are robust against the systematic effects of reddening and seeing. The latter introduces minor structure in $w_N$.
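
The hierarchical amplitudes and bias parameters mentioned above can be illustrated with a short sketch. It assumes the commonly used definition of the hierarchical amplitude (the N-point area-averaged correlation divided by the two-point one raised to the power N-1) and the leading-order local-bias relation for the third-order amplitude with c2 = b2/b1. The abstract does not state which formulas the authors used, and the function names and the matter amplitude of 3.0 below are hypothetical, chosen only for illustration.

```python
def hierarchical_amplitude(w_n, w_2, order):
    """Hierarchical amplitude of a given order: w_N / w_2**(N - 1).

    Assumes the conventional definition; `order` is N >= 3.
    """
    if order < 3:
        raise ValueError("hierarchical amplitudes are defined for N >= 3")
    return w_n / w_2 ** (order - 1)


def biased_s3(s3_matter, b1, c2):
    """Leading-order local-bias relation for the third-order amplitude.

    s3_galaxy = (s3_matter + 3 * c2) / b1, with c2 = b2 / b1 (assumed convention).
    """
    return (s3_matter + 3.0 * c2) / b1


# Hypothetical correlation values, for illustration only.
s3_example = hierarchical_amplitude(w_n=0.03, w_2=0.1, order=3)

# Bias values quoted in the abstract (b1 = 1, c2 = -0.24) applied to a
# hypothetical matter amplitude of 3.0.
s3_galaxy = biased_s3(s3_matter=3.0, b1=1.0, c2=-0.24)

print(s3_example, s3_galaxy)
```

Under these assumptions, a negative c2 lowers the galaxy amplitude relative to the matter amplitude, which is why measurements of the amplitudes can constrain higher-order bias.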
• ### Quasars Probing Quasars I: Optically Thick Absorbers Near Luminous Quasars(astro-ph/0603742)

March 29, 2006 astro-ph
With close pairs of quasars at different redshifts, a background quasar sightline can be used to study a foreground quasar's environment in absorption. We search 149 moderate-resolution background quasar spectra from Gemini, Keck, the MMT, and the SDSS to survey Lyman Limit Systems (LLSs) and Damped Ly-alpha systems (DLAs) in the vicinity of 1.8 < z < 4.0 luminous foreground quasars. A sample of 27 new quasar-absorber pairs is uncovered with column densities 17.2 < log (N_HI/cm^-2) < 20.9 and transverse (proper) distances of 22 kpc/h < R < 1.7 Mpc/h from the foreground quasars. If they emit isotropically, the implied ionizing photon fluxes are a factor of ~ 5-8000 times larger than the ambient extragalactic UV background over this range of distances. The observed probability of intercepting an absorber is very high for small separations: six out of eight projected sightlines with transverse separations R < 150 kpc/h have an absorber coincident with the foreground quasar, of which four have N_HI > 10^19 cm^-2. The covering factor of N_HI > 10^19 cm^-2 absorbers is thus ~ 50% (4/8) on these small scales, whereas < 2% would have been expected at random. There are many cosmological applications of these new sightlines: they provide laboratories for studying fluorescent Ly-alpha recombination radiation from LLSs, constrain the environments, emission geometry, and radiative histories of quasars, and shed light on the physical nature of LLSs and DLAs.
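
To give a sense of how unlikely the small-scale covering factor would be by chance, the sketch below computes a binomial tail probability. It assumes that each of the eight sightlines independently intercepts a strong absorber with the 2% random probability quoted as an upper bound in the abstract; this simple model is an illustrative assumption, not the statistical treatment used in the paper.

```python
from math import comb


def binomial_tail(n, k_min, p):
    """Probability of at least k_min successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1))


n_sightlines = 8     # sightlines with R < 150 kpc/h (from the abstract)
n_strong = 4         # of which four host a strong absorber (from the abstract)
p_random = 0.02      # upper bound on the random expectation (from the abstract)

covering_factor = n_strong / n_sightlines
p_chance = binomial_tail(n_sightlines, n_strong, p_random)

print(f"covering factor = {covering_factor:.0%}")
print(f"chance probability of >= {n_strong} of {n_sightlines}: {p_chance:.2e}")
```

Under these assumptions the chance probability is of order 1e-5, which illustrates why the abstract describes the observed incidence as very high.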