• ### An Ultraviolet-Optical Color-Metallicity relation for Red Clump Stars using GALEX and Gaia(1805.03236)

May 8, 2018 astro-ph.GA, astro-ph.SR
Although core helium-burning red clump (RC) stars are faint at ultraviolet wavelengths, their ultraviolet-optical color is a unique and accessible probe of their physical properties. Using data from the GALEX All Sky Imaging Survey, Gaia Data Release 2 and the SDSS APOGEE DR14 survey, we find that spectroscopic metallicity is strongly correlated with the location of an RC star in the UV-optical color magnitude diagram. The RC has a wide spread in (NUV - G)$_0$ color, over 4 magnitudes, compared to a 0.7-magnitude range in (G$_{BP}$ - G$_{RP}$)$_0$. We propose a photometric, dust-corrected, ultraviolet-optical (NUV - G)$_0$ color-metallicity [Fe/H] relation using a sample of 5,175 RC stars from APOGEE. We show that this relation has a scatter of 0.28 dex and is easier to obtain for large, wide-field samples than spectroscopic metallicities. Importantly, the effect may be comparable to the spread in RC color attributed to extinction in other studies.
• The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in operation since July 2014. This paper describes the second data release from this phase, and the fourteenth from SDSS overall (making this Data Release Fourteen, or DR14). This release makes public data taken by SDSS-IV in its first two years of operation (July 2014-2016). Like all previous SDSS releases, DR14 is cumulative, including the most recent reductions and calibrations of all data taken by SDSS since the first phase began operations in 2000. New in DR14 is the first public release of data from the extended Baryon Oscillation Spectroscopic Survey (eBOSS); the first data from the second phase of the Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2), including stellar parameter estimates from an innovative data-driven machine-learning algorithm known as "The Cannon"; and almost twice as many data cubes from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous release (N = 2812 in total). This paper describes the location and format of the publicly available data from SDSS-IV surveys. We provide references to the important technical papers describing how these data have been taken (both targeting and observation details) and processed for scientific use. The SDSS website (www.sdss.org) has been updated for this release, and provides links to data downloads, as well as tutorials and examples of data use. SDSS-IV is planning to continue to collect astronomical data until 2020, and will be followed by SDSS-V.
• ### The information content in cold stellar streams(1804.06854)

April 18, 2018 astro-ph.GA
Cold stellar streams---produced by tidal disruptions of clusters---are long-lived, coherent dynamical features in the halo of the Milky Way. Due to their different ages and different positions in phase space, different streams tell us different things about the Galaxy. Here we employ a Cramér--Rao lower bound (CRLB) or Fisher-matrix approach to understand the quantitative information content in eleven known streams (ATLAS, GD-1, Hermus, Kwando, Orinoco, PS1A, PS1C, PS1D, PS1E, Sangarius and Triangulum). This approach depends on a generative model, which we have developed previously, and which permits calculation of derivatives of predicted stream properties with respect to Galaxy and stream parameters. We find that in simple analytic models of the Milky Way, streams on eccentric orbits contain the most information about the halo shape. For each stream, there are near-degeneracies between dark-matter-halo properties and parameters of the bulge, the disk, and the stream progenitor, but simultaneous fitting of multiple streams will constrain all parameters at the percent level. At this precision, simulated dark matter halos deviate from simple analytic parametrizations, so we add an expansion of basis functions to give the gravitational potential more freedom. As freedom increases, the information about the halo reduces overall, and it becomes more localized to the current position of the stream. In the limit of high model freedom, a stellar stream appears to measure the local acceleration at its current position; this motivates thinking about future non-parametric approaches. The CRLB formalism also permits us to assess the value of future measurements of stellar velocities, distances, and proper motions. We show that kinematic measurements of stream stars are essential for producing competitive constraints on the distribution of dark matter, which bodes well for stream studies in the age of Gaia.
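
The Fisher-matrix idea in this abstract can be illustrated with a toy problem. The sketch below is not the authors' stream model: it assumes a hypothetical straight-line model `y = a + b * x` with independent Gaussian noise, builds the Fisher matrix from the model derivatives, and reads the Cramér-Rao lower bounds off its inverse. All function and variable names are invented for illustration.

```python
import numpy as np

def fisher_matrix(jacobian, sigma):
    """Fisher matrix F = J^T C^{-1} J for independent Gaussian noise."""
    weights = 1.0 / np.asarray(sigma) ** 2
    return jacobian.T @ (weights[:, None] * jacobian)

def crlb(fisher):
    """Lower bounds on the parameter standard deviations: sqrt(diag(F^{-1}))."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

# Toy model y = a + b * x (an assumption, standing in for a stream model)
x = np.linspace(0.0, 1.0, 11)
sigma = np.full_like(x, 0.1)
jac = np.column_stack([np.ones_like(x), x])  # columns: dy/da, dy/db
F = fisher_matrix(jac, sigma)
bounds = crlb(F)

# Independent data sets combine by adding their Fisher matrices
F_joint = F + fisher_matrix(jac, sigma)
bounds_joint = crlb(F_joint)
```

Adding Fisher matrices is the toy analogue of fitting several streams at once: information from independent data sets accumulates, and near-degenerate directions in one matrix can be constrained by another.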
• ### Binary companions of evolved stars in APOGEE DR14: Search method and catalog of ~5,000 companions(1804.04662)

April 12, 2018 astro-ph.SR
Multi-epoch radial velocity measurements of stars can be used to identify stellar, sub-stellar, and planetary-mass companions. Even a small number of observation epochs can be informative about companions, though there can be multiple qualitatively different orbital solutions that fit the data. We have custom-built a Monte Carlo sampler (The Joker) that delivers reliable (and often highly multi-modal) posterior samplings for companion orbital parameters given sparse radial-velocity data. Here we use The Joker to perform a search for companions to 96,231 red-giant stars observed in the APOGEE survey (DR14) with $\geq 3$ spectroscopic epochs. We select stars with probable companions by making a cut on our posterior belief about the amplitude of the stellar radial-velocity variation induced by the orbit. We provide (1) a catalog of 320 companions for which the stellar companion properties can be confidently determined, (2) a catalog of 4,898 stars that likely have companions, but would require more observations to uniquely determine the orbital properties, and (3) posterior samplings for the full orbital parameters for all stars in the parent sample. We show the characteristics of systems with confidently determined companion properties and highlight interesting systems with candidate compact object companions.
• ### Inferring binary and trinary stellar populations in photometric and astrometric surveys(1801.08547)

March 20, 2018 astro-ph.GA, astro-ph.SR
Multiple stellar systems are ubiquitous in the Milky Way, but are often unresolved and seen as single objects in spectroscopic, photometric, and astrometric surveys. Yet, modeling them is essential for developing a full understanding of large surveys such as Gaia, and connecting them to stellar and Galactic models. In this paper we address this problem by jointly fitting the Gaia and 2MASS photometric and astrometric data using a data-driven Bayesian hierarchical model that includes populations of binary and trinary systems. This allows us to classify observations into singles, binaries, and trinaries, in a robust and efficient manner, without resorting to external models. We are able to identify multiple systems and, in some cases, make strong predictions for the properties of their unresolved stars. We will be able to compare such predictions with Gaia Data Release 4, which will contain astrometric identification and analysis of binary systems.
• ### Likelihood Non-Gaussianity in Large-Scale Structure Analyses(1803.06348)

March 16, 2018 astro-ph.CO
Standard present-day large-scale structure (LSS) analyses make a major assumption in their Bayesian parameter inference --- that the likelihood has a Gaussian form. For summary statistics currently used in LSS, this assumption, even if the underlying density field is Gaussian, cannot be correct in detail. We investigate the impact of this assumption on two recent LSS analyses: the Beutler et al. (2017) power spectrum multipole ($P_\ell$) analysis and the Sinha et al. (2017) group multiplicity function ($\zeta$) analysis. Using non-parametric divergence estimators on mock catalogs originally constructed for covariance matrix estimation, we identify significant non-Gaussianity in both the $P_\ell$ and $\zeta$ likelihoods. We then use Gaussian mixture density estimation and Independent Component Analysis on the same mocks to construct likelihood estimates that approximate the true likelihood better than the Gaussian pseudo-likelihood. Using these likelihood estimates, we accurately estimate the true posterior probability distribution of the Beutler et al. (2017) and Sinha et al. (2017) parameters. Likelihood non-Gaussianity shifts the $f\sigma_8$ constraint by $-0.44\sigma$, but otherwise does not significantly impact the overall parameter constraints of Beutler et al. (2017). For the $\zeta$ analysis, using the pseudo-likelihood significantly underestimates the uncertainties and biases the constraints of Sinha et al. (2017) halo occupation parameters. For $\log M_1$ and $\alpha$, the posteriors are shifted by $+0.43\sigma$ and $-0.51\sigma$ and broadened by $42\%$ and $66\%$, respectively. The divergence and likelihood estimation methods we present provide a straightforward framework for quantifying the impact of likelihood non-Gaussianity and deriving more accurate parameter constraints.
• ### Discovery and Characterization of 3000+ Main-Sequence Binaries from APOGEE Spectra(1711.08793)

Jan. 27, 2018 astro-ph.GA, astro-ph.SR
We develop a data-driven spectral model for identifying and characterizing spatially unresolved multiple-star systems and apply it to APOGEE DR13 spectra of main-sequence stars. Binaries and triples are identified as targets whose spectra can be significantly better fit by a superposition of two or three model spectra, drawn from the same isochrone, than any single-star model. From an initial sample of $\sim$20,000 main-sequence targets, we identify $\sim$2,500 binaries in which both the primary and secondary star contribute detectably to the spectrum, simultaneously fitting for the velocities and stellar parameters of both components. We additionally identify and fit $\sim$200 triple systems, as well as $\sim$700 velocity-variable systems in which the secondary does not contribute detectably to the spectrum. Our model simplifies the process of simultaneously fitting single- or multi-epoch spectra with composite models and does not depend on a velocity offset between the two components of a binary, making it sensitive to traditionally undetectable systems with periods of hundreds or thousands of years. In agreement with conventional expectations, almost all the spectrally-identified binaries with measured parallaxes fall above the main sequence in the color-magnitude diagram. We find excellent agreement between spectrally and dynamically inferred mass ratios for the $\sim$600 binaries in which a dynamical mass ratio can be measured from multi-epoch radial velocities. We obtain full orbital solutions for 64 systems, including 14 close binaries within hierarchical triples. We make available catalogs of stellar parameters, abundances, mass ratios, and orbital parameters.
• ### Data analysis recipes: Using Markov Chain Monte Carlo(1710.06068)

Markov Chain Monte Carlo (MCMC) methods for sampling probability density functions (combined with abundant computational resources) have transformed the sciences, especially in performing probabilistic inferences, or fitting models to data. In this primarily pedagogical contribution, we give a brief overview of the most basic MCMC method and some practical advice for the use of MCMC in real inference problems. We give advice on method choice, tuning for performance, methods for initialization, tests of convergence, troubleshooting, and use of the chain output to produce or report parameter estimates with associated uncertainties. We argue that autocorrelation time is the most important test for convergence, as it directly connects to the uncertainty on the sampling estimate of any quantity of interest. We emphasize that sampling is a method for doing integrals; this guides our thinking about how MCMC output is best used.
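
As a concrete companion to this abstract, here is a minimal sketch of the most basic MCMC method (Metropolis with a symmetric Gaussian proposal), together with an estimate of the integrated autocorrelation time. The target density (a unit Gaussian), the step size, and the windowing constant `c = 5` are illustrative assumptions, not recommendations taken from the paper.

```python
import numpy as np

def metropolis(log_prob, x0, n_steps, step_size, rng):
    """One-dimensional Metropolis sampler with a symmetric Gaussian proposal."""
    chain = np.empty(n_steps)
    x, lp = x0, log_prob(x0)
    n_accept = 0
    for i in range(n_steps):
        x_new = x + step_size * rng.normal()
        lp_new = log_prob(x_new)
        # Accept with probability min(1, p(x_new) / p(x))
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new
            n_accept += 1
        chain[i] = x
    return chain, n_accept / n_steps

def integrated_autocorr_time(chain, c=5.0):
    """Estimate tau = 1 + 2 * sum_t rho(t), truncated with a self-consistent window."""
    x = chain - chain.mean()
    n = len(x)
    # Autocorrelation via FFT, zero-padded to avoid wrap-around
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n]
    acf /= acf[0]
    taus = 2.0 * np.cumsum(acf) - 1.0
    # Smallest window M satisfying M >= c * tau(M)
    ok = np.arange(n) >= c * taus
    window = int(np.argmax(ok)) if ok.any() else n - 1
    return taus[window]

rng = np.random.default_rng(0)
chain, acc_frac = metropolis(lambda x: -0.5 * x**2, 0.0, 20000, 2.4, rng)
tau = integrated_autocorr_time(chain)
# Uncertainty on the sampling estimate of the mean: N / tau effective samples
mean_err = chain.std() * np.sqrt(tau / len(chain))
```

The last line shows why the autocorrelation time matters: the chain carries roughly `N / tau` independent samples, so `tau` sets the error bar on any expectation value computed from it.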
• ### A pixel-level model for event discovery in time-domain imaging(1710.02428)

Oct. 9, 2017 astro-ph.IM
Difference imaging or image subtraction is a method that measures differential photometry by matching the pointing and point-spread function (PSF) between image frames. It is used for the detection of time-variable phenomena. Here we present a new category of method, CPM Difference Imaging, in which differences are not measured between matched images but instead between image frames and a data-driven predictive model that has been designed only to predict the pointing, PSF, and detector effects but not astronomical variability. In CPM Difference Imaging each pixel is modelled by the Causal Pixel Model (CPM) originally built for modeling Kepler data, in which pixel values are predicted by a linear combination of other pixels at the same epoch but far enough away such that these pixels are causally disconnected, astrophysically. It does not require that the user have any explicit model or description of the pointing or point-spread function of any of the images. Its principal drawback is that---in its current form---it requires an imaging campaign with many epochs and fairly stable telescope pointing. The method is applied to simulated data and also the K2 Campaign 9 microlensing data. We show that CPM Difference Imaging can detect variable objects and produce precise differential photometry in a crowded field. CPM Difference Imaging is capable of producing image differences at nearly photon-noise precision.
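
The core of the pixel-level idea, predicting one pixel's light curve as a linear combination of other pixels and inspecting the residual, can be sketched as follows. This is a simplified illustration on simulated data: the ridge regularization, the shape of the systematics, and every name below are assumptions, and the real method's choice of predictor pixels and its train-and-test procedure are not reproduced here.

```python
import numpy as np

def cpm_residuals(target, predictors, ridge=1e-3):
    """Residual of a target-pixel light curve after a linear prediction from other pixels.

    target has shape (n_epochs,); predictors has shape (n_epochs, n_pixels).
    """
    gram = predictors.T @ predictors + ridge * np.eye(predictors.shape[1])
    weights = np.linalg.solve(gram, predictors.T @ target)
    return target - predictors @ weights

rng = np.random.default_rng(1)
n_epochs, n_pixels = 500, 20
time = np.arange(n_epochs)
# Shared systematics (pointing, PSF, detector) seen by every pixel
systematics = 1.0 + 0.05 * np.sin(time / 20.0)
gains = rng.uniform(0.5, 1.5, size=n_pixels)
predictors = systematics[:, None] * gains[None, :]
predictors += 1e-3 * rng.normal(size=(n_epochs, n_pixels))
# Target pixel: same systematics plus a transient the predictor pixels do not see
event = 0.5 * np.exp(-0.5 * ((time - 250) / 5.0) ** 2)
target = 0.8 * systematics + event + 1e-3 * rng.normal(size=n_epochs)

residuals = cpm_residuals(target, predictors)
peak_epoch = int(np.argmax(residuals))
```

Because the predictor pixels share the systematics but not the astrophysical signal, the prediction removes the former and the transient survives in `residuals`.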
• The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) began observations in July 2014. It pursues three core programs: APOGEE-2, MaNGA, and eBOSS. In addition, eBOSS contains two major subprograms: TDSS and SPIDERS. This paper describes the first data release from SDSS-IV, Data Release 13 (DR13), which contains new data, reanalysis of existing data sets and, like all SDSS data releases, is inclusive of previously released data. DR13 makes publicly available 1390 spatially resolved integral field unit observations of nearby galaxies from MaNGA, the first data released from this survey. It includes new observations from eBOSS, completing SEQUELS. In addition to targeting galaxies and quasars, SEQUELS also targeted variability-selected objects from TDSS and X-ray selected objects from SPIDERS. DR13 includes new reductions of the SDSS-III BOSS data, improving the spectrophotometric calibration and redshift classification. DR13 releases new reductions of the APOGEE-1 data from SDSS-III, with abundances of elements not previously included and improved stellar parameters for dwarf stars and cooler stars. For the SDSS imaging data, DR13 provides new, more robust and precise photometric calibrations. Several value-added catalogs are being released in tandem with DR13, in particular target catalogs relevant for eBOSS, TDSS, and SPIDERS, and an updated red-clump catalog for APOGEE. This paper describes the location and format of the data now publicly available, as well as providing references to the important technical papers that describe the targeting, observing, and data reduction. The SDSS website, http://www.sdss.org, provides links to the data, tutorials and examples of data access, and extensive documentation of the reduction and analysis procedures. DR13 is the first of a scheduled set that will contain new data and analyses from the planned ~6-year operations of SDSS-IV.
• ### Kronos & Krios: Evidence for accretion of a massive, rocky planetary system in a comoving pair of solar-type stars(1709.05344)

Sept. 15, 2017 astro-ph.SR
We report and discuss the discovery of a comoving pair of bright solar-type stars, HD 240430 and HD 240429, with a significant difference in their chemical abundances. The two stars have an estimated 3D separation of $\approx 0.6$ pc ($\approx 0.01$ pc projected) at a distance of $r\approx 100$ pc with nearly identical three-dimensional velocities, as inferred from Gaia TGAS parallaxes and proper motions, and high-precision radial velocity measurements. Stellar parameters determined from high-resolution Keck HIRES spectra indicate that both stars are $\sim 4$ Gyr old. The more metal-rich of the two, HD 240430, shows an enhancement of refractory ($T_C>1200$ K) elements by $\approx 0.2$ dex and a marginal enhancement of (moderately) volatile elements ($T_C<1200$ K, C, N, O, Na, and Mn). This is the largest metallicity difference found in a wide binary pair yet. Additionally, HD 240430 shows an anomalously high surface lithium abundance ($A(\mathrm{Li})=2.75$), higher than its companion by $0.5$ dex. The proximity in phase-space and ages between the two stars suggests that they formed together with the same composition, at odds with the observed differences in metallicity and abundance patterns. We therefore suggest that the star HD 240430, "Kronos", accreted 15 $M_\oplus$ of rocky material after birth, selectively enhancing the refractory elements as well as lithium in its surface and convective envelope.
• ### Red clump stars and Gaia: Calibration of the standard candle using a hierarchical probabilistic model(1705.08988)

June 29, 2017 astro-ph.GA, astro-ph.SR
Distances to individual stars in our own Galaxy are critical in order to piece together the nature of its velocity and spatial structure. Core helium burning red clump (RC) stars have similar luminosities, are abundant throughout the Galaxy, and thus constitute good standard candles. We build a hierarchical probabilistic model to quantify the quality of RC stars as standard candles using parallax measurements from the first Gaia data release. A unique aspect of our methodology is to fully account for (and marginalize over) parallax, photometry, and dust-correction uncertainties, which leads to more robust results than standard approaches. We determine the absolute magnitude and intrinsic dispersion of the RC in 2MASS bands J, H, Ks, Gaia G band, and WISE bands W1, W2, W3, and W4. We find that the absolute magnitude of the RC is $-1.61 \pm 0.01$ (in Ks), $+0.44 \pm 0.01$ (in G), $-0.93 \pm 0.01$ (in J), $-1.46 \pm 0.01$ (in H), $-1.68 \pm 0.02$ (in W1), $-1.69 \pm 0.02$ (in W2), $-1.67 \pm 0.02$ (in W3), $-1.76 \pm 0.01$ mag (in W4). The mean intrinsic dispersion is $\sim 0.17 \pm 0.03$ mag across all bands (yielding a typical distance precision of $\sim$ 8%). Thus RC stars are reliable and precise standard candles. In addition, we have also re-calibrated the zero point of the absolute magnitude of the RC in each band, which provides a benchmark for future studies to estimate distances to RC stars. Finally, the parallax error shrinkage in the hierarchical model outlined in this work can be used to obtain more precise parallaxes than Gaia for the most distant RC stars across the Galaxy.
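
The link between the quoted 0.17 mag dispersion and the roughly 8% distance precision follows from the standard distance modulus. The short sketch below works that arithmetic through, using the Ks-band absolute magnitude from the abstract; the example star and the function names are hypothetical.

```python
import math

M_KS_RC = -1.61  # absolute Ks magnitude of the red clump (value from the abstract)
SIGMA_RC = 0.17  # mean intrinsic dispersion in mag (value from the abstract)

def rc_distance_pc(m_apparent, extinction=0.0, abs_mag=M_KS_RC):
    """Distance in parsec from the distance modulus m - A - M = 5 log10(d / 10 pc)."""
    mu = m_apparent - extinction - abs_mag
    return 10.0 ** (mu / 5.0 + 1.0)

def fractional_distance_error(sigma_mag=SIGMA_RC):
    """First-order error propagation: sigma_d / d = (ln 10 / 5) * sigma_mag."""
    return math.log(10.0) / 5.0 * sigma_mag

# Hypothetical dust-free star with apparent magnitude Ks = 8.39
d = rc_distance_pc(m_apparent=8.39)
frac = fractional_distance_error()
```

With these numbers the distance modulus is 10 mag, so `d` comes out at about 1000 pc, and `frac` is about 0.078, consistent with the quoted precision of roughly 8%.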
• We describe the Sloan Digital Sky Survey IV (SDSS-IV), a project encompassing three major spectroscopic programs. The Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) is observing hundreds of thousands of Milky Way stars at high resolution and high signal-to-noise ratio in the near-infrared. The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey is obtaining spatially-resolved spectroscopy for thousands of nearby galaxies (median redshift of z = 0.03). The extended Baryon Oscillation Spectroscopic Survey (eBOSS) is mapping the galaxy, quasar, and neutral gas distributions between redshifts z = 0.6 and 3.5 to constrain cosmology using baryon acoustic oscillations, redshift space distortions, and the shape of the power spectrum. Within eBOSS, we are conducting two major subprograms: the SPectroscopic IDentification of eROSITA Sources (SPIDERS), investigating X-ray AGN and galaxies in X-ray clusters, and the Time Domain Spectroscopic Survey (TDSS), obtaining spectra of variable sources. All programs use the 2.5-meter Sloan Foundation Telescope at Apache Point Observatory; observations there began in Summer 2014. APOGEE-2 also operates a second near-infrared spectrograph at the 2.5-meter du Pont Telescope at Las Campanas Observatory, with observations beginning in early 2017. Observations at both facilities are scheduled to continue through 2020. In keeping with previous SDSS policy, SDSS-IV provides regularly scheduled public data releases; the first one, Data Release 13, was made available in July 2016.
• ### Improving Gaia parallax precision with a data-driven model of stars(1706.05055)

June 15, 2017 astro-ph.GA
Converting a noisy parallax measurement into a posterior belief over distance requires inference with a prior. Usually this prior represents beliefs about the stellar density distribution of the Milky Way. However, multi-band photometry exists for a large fraction of the Gaia TGAS Catalog and is incredibly informative about stellar distances. Here we use 2MASS colors for 1.4 million TGAS stars to build a noise-deconvolved empirical prior distribution for stars in color--magnitude space. This model contains no knowledge of stellar astrophysics or the Milky Way, but is precise because it accurately generates a large number of noisy parallax measurements under an assumption of stationarity; that is, it is capable of combining the information from many stars. We use the Extreme Deconvolution (XD) algorithm---an Empirical Bayes approximation to a full hierarchical model of the true parallax and photometry of every star---to construct this prior. The prior is combined with a TGAS likelihood to infer a precise photometric parallax estimate and uncertainty (and full posterior) for every star. Our parallax estimates are more precise than the TGAS catalog entries by a median factor of 1.2 (14% are more precise by a factor >2) and are more precise than previous Bayesian distance estimates that use spatial priors. We validate our parallax inferences using members of the Milky Way star cluster M67, which is not visible as a cluster in the TGAS parallax estimates, but appears as a cluster in our posterior parallax estimates. Our results, including a parallax posterior pdf for each of 1.4 million TGAS stars, are available in companion electronic tables.
• ### Exploring cosmic homogeneity with the BOSS DR12 galaxy sample(1702.02159)

June 1, 2017 astro-ph.CO
In this study, we probe the transition to cosmic homogeneity in the Large Scale Structure (LSS) of the Universe using the CMASS galaxy sample of the BOSS spectroscopic survey, which covers the largest effective volume to date, $3\ h^{-3}\ \mathrm{Gpc}^3$ at $0.43 \leq z \leq 0.7$. We study the scaled counts-in-spheres, $\mathcal{N}(<r)$, and the fractal correlation dimension, $\mathcal{D}_2(r)$, to assess the homogeneity scale of the universe using a Landy & Szalay inspired estimator. Defining the scale of transition to homogeneity as the scale at which $\mathcal{D}_2(r)$ reaches 3 within $1\%$, i.e. $\mathcal{D}_2(r)>2.97$ for $r>\mathcal{R}_H$, we find $\mathcal{R}_H = (63.3\pm0.7) \ h^{-1}\ \mathrm{Mpc}$, in agreement at the percentage level with the predictions of the $\Lambda$CDM model $\mathcal{R}_H=62.0\ h^{-1}\ \mathrm{Mpc}$. Thanks to the large cosmic depth of the survey, we investigate the redshift evolution of the transition to homogeneity scale and find agreement with the $\Lambda$CDM prediction. Finally, we find that $\mathcal{D}_2$ is compatible with $3$ at scales larger than $300\ h^{-1}\ \mathrm{Mpc}$ in all redshift bins. These results consolidate the Cosmological Principle and represent a precise consistency test of the $\Lambda$CDM model.
• ### Comoving stars in Gaia DR1: An abundance of very wide separation co-moving pairs(1612.02440)

June 1, 2017 astro-ph.GA, astro-ph.SR
The primary sample of the Gaia Data Release 1 is the Tycho-Gaia Astrometric Solution (TGAS): $\approx$ 2 million Tycho-2 sources with improved parallaxes and proper motions relative to the initial catalog. This increased astrometric precision presents an opportunity to find new binary stars and moving groups. We search for high-confidence comoving pairs of stars in TGAS by identifying pairs of stars consistent with having the same 3D velocity using a marginalized likelihood ratio test to discriminate candidate comoving pairs from the field population. Although we perform some visualizations using (bias-corrected) inverse parallax as a point estimate of distance, the likelihood ratio is computed with a probabilistic model that includes the covariances of parallax and proper motions and marginalizes the (unknown) true distances and 3D velocities of the stars. We find 13,085 comoving star pairs among 10,606 unique stars with separations as large as 10 pc (our search limit). Some of these pairs form larger groups through mutual comoving neighbors: many of these pair networks correspond to known open clusters and OB associations, but we also report the discovery of several new comoving groups. Most surprisingly, we find a large number of very wide ($>1$ pc) separation comoving star pairs, the number of which increases with increasing separation and cannot be explained purely by false-positive contamination. Our key result is a catalog of high-confidence comoving pairs of stars in TGAS. We discuss the utility of this catalog for making dynamical inferences about the Galaxy, testing stellar atmosphere models, and validating chemical abundance measurements.
• ### Approximate Bayesian Computation in Large Scale Structure: constraining the galaxy-halo connection(1607.01782)

April 10, 2017 astro-ph.CO
Standard approaches to Bayesian parameter inference in large scale structure assume a Gaussian functional form (chi-squared form) for the likelihood. This assumption, in detail, cannot be correct. Likelihood-free inference methods such as Approximate Bayesian Computation (ABC) relax these restrictions and make inference possible without making any assumptions on the likelihood. Instead ABC relies on a forward generative model of the data and a metric for measuring the distance between the model and data. In this work, we demonstrate that ABC is feasible for LSS parameter inference by using it to constrain parameters of the halo occupation distribution (HOD) model for populating dark matter halos with galaxies. Using a specific implementation of ABC supplemented with Population Monte Carlo importance sampling, a generative forward model using HOD, and a distance metric based on galaxy number density, two-point correlation function, and galaxy group multiplicity function, we constrain the HOD parameters of a mock observation generated from selected "true" HOD parameters. The parameter constraints we obtain from ABC are consistent with the "true" HOD parameters, demonstrating that ABC can be reliably used for parameter inference in LSS. Furthermore, we compare our ABC constraints to constraints we obtain using a pseudo-likelihood function of Gaussian form with MCMC and find consistent HOD parameter constraints. Ultimately our results suggest that ABC can and should be applied in parameter inference for LSS analyses.
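
The ingredients named in this abstract (a forward generative model, a summary statistic, and a distance metric) can be seen in a toy example. The sketch below uses plain rejection ABC, which is simpler than the Population Monte Carlo variant used in the paper, to infer the mean of a Gaussian; the model, prior range, tolerance `eps`, and all names are assumptions made for illustration.

```python
import numpy as np

def abc_rejection(observed, simulate, summary, prior_draw, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        s_sim = summary(simulate(theta, rng))
        # Distance metric: absolute difference of the summary statistics
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
true_mu = 1.5
observed = rng.normal(true_mu, 1.0, size=100)  # mock observation

posterior = abc_rejection(
    observed,
    simulate=lambda mu, rng: rng.normal(mu, 1.0, size=100),  # forward model
    summary=np.mean,
    prior_draw=lambda rng: rng.uniform(-5.0, 5.0),
    eps=0.1,
    n_draws=20000,
    rng=rng,
)
```

No likelihood is evaluated anywhere: the accepted `posterior` samples approximate the posterior only through simulation and comparison, and shrinking `eps` trades acceptance rate for accuracy.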
• ### The Joker: A custom Monte Carlo sampler for binary-star and exoplanet radial velocity data(1610.07602)

March 29, 2017 astro-ph.SR, astro-ph.EP
Given sparse or low-quality radial-velocity measurements of a star, there are often many qualitatively different stellar or exoplanet companion orbit models that are consistent with the data. The consequent multimodality of the likelihood function leads to extremely challenging search, optimization, and MCMC posterior sampling over the orbital parameters. Here we create a custom Monte Carlo sampler for sparse or noisy radial-velocity measurements of two-body systems that can produce posterior samples for orbital parameters even when the likelihood function is poorly behaved. The six standard orbital parameters for a binary system can be split into four non-linear parameters (period, eccentricity, argument of pericenter, phase) and two linear parameters (velocity amplitude, barycenter velocity). We capitalize on this by building a sampling method in which we densely sample the prior pdf in the non-linear parameters and perform rejection sampling using a likelihood function marginalized over the linear parameters. With sparse or uninformative data, the sampling obtained by this rejection sampling is generally multimodal and dense. With informative data, the sampling becomes effectively unimodal but too sparse: in these cases we follow the rejection sampling with standard MCMC. The method produces correct samplings in orbital parameters for data that include as few as three epochs. The Joker can therefore be used to produce proper samplings of multimodal pdfs, which are still informative and can be used in hierarchical (population) modeling. We give some examples that show how the posterior pdf depends sensitively on the number and time coverage of the observations and their uncertainties.
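
The sampling strategy described here, dense prior samples in the non-linear parameters plus rejection sampling on a likelihood marginalized over the linear parameters, can be sketched in a reduced setting. The code below is not the API of The Joker: it assumes circular orbits, so that the only non-linear parameter is the period and the velocity curve is linear in three coefficients, and it assumes a zero-mean Gaussian prior of width `prior_sigma` on those coefficients. Epochs, amplitudes, and names are hypothetical.

```python
import numpy as np

def marginal_log_like(period, t, rv, rv_err, prior_sigma):
    """ln p(rv | period) with the linear parameters (A, B, v0) marginalized analytically.

    Model: rv = A cos(2 pi t / P) + B sin(2 pi t / P) + v0 + noise.
    """
    phase = 2.0 * np.pi * t / period
    design = np.column_stack([np.cos(phase), np.sin(phase), np.ones_like(t)])
    # A Gaussian prior N(0, prior_sigma^2) on each coefficient gives rv ~ N(0, cov)
    cov = np.diag(rv_err**2) + prior_sigma**2 * (design @ design.T)
    _, logdet = np.linalg.slogdet(cov)
    chi2 = rv @ np.linalg.solve(cov, rv)
    return -0.5 * (chi2 + logdet + len(t) * np.log(2.0 * np.pi))

def rejection_sample_periods(t, rv, rv_err, n_prior, rng,
                             p_min=1.0, p_max=1000.0, prior_sigma=30.0):
    """Draw periods from a log-uniform prior and keep each with probability L / L_max."""
    periods = np.exp(rng.uniform(np.log(p_min), np.log(p_max), size=n_prior))
    log_like = np.array(
        [marginal_log_like(p, t, rv, rv_err, prior_sigma) for p in periods]
    )
    accept = np.log(rng.uniform(size=n_prior)) < log_like - log_like.max()
    return periods[accept]

rng = np.random.default_rng(42)
t = np.array([0.0, 12.3, 40.1, 95.7])  # hypothetical epochs in days
rv_err = np.full_like(t, 0.5)          # km/s
rv = 10.0 * np.cos(2.0 * np.pi * t / 50.0) + 5.0 + rv_err * rng.normal(size=t.size)
samples = rejection_sample_periods(t, rv, rv_err, n_prior=20000, rng=rng)
```

With only four epochs, many periods fit the data, so the surviving `samples` are expected to spread over several modes rather than concentrate at one value, which is the behavior the abstract describes for sparse data.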
• ### Hierarchical probabilistic inference of the color-magnitude diagram and shrinkage of stellar distance uncertainties(1703.08112)

We present a hierarchical probabilistic model for improving geometric stellar distance estimates using color--magnitude information. This is achieved with a data-driven model of the color--magnitude diagram, not relying on stellar models but instead on the relative abundances of stars in color--magnitude cells, which are inferred from very noisy magnitudes and parallaxes. While the resulting noise-deconvolved color--magnitude diagram can be useful for a range of applications, we focus on deriving improved stellar distance estimates relying on both parallax and photometric information. We demonstrate the efficiency of this approach on the 1.4 million stars of the Gaia TGAS sample that also have APASS magnitudes. Our hierarchical model has 4 million parameters in total, most of which are marginalized out numerically or analytically. We find that distance estimates are significantly improved for the noisiest parallaxes and densest regions of the color--magnitude diagram. In particular, the average distance signal-to-noise ratio and uncertainty improve by 19 percent and 36 percent, respectively, with 8 percent of the objects improving in SNR by a factor greater than 2. This computationally efficient approach fully accounts for both parallax and photometric noise, and is a first step towards a full hierarchical probabilistic model of the Gaia data.
• ### Galactic Doppelganger: The chemical similarity among field stars and among stars with a common birth origin(1701.07829)

Jan. 26, 2017 astro-ph.GA, astro-ph.SR
We explore to what extent stars within Galactic disk open clusters resemble each other in the high-dimensional space of their photospheric element abundances, and contrast this with pairs of field stars. Our analysis is based on abundances for 20 elements, homogeneously derived from APOGEE spectra (with carefully quantified uncertainties, with a median value of $\sim 0.03$ dex). We consider 90 red giant stars in seven open clusters and find that most stars within a cluster have abundances in most elements that are indistinguishable (in a $\chi^2$-sense) from those of the other members, as expected for stellar birth siblings. An analogous analysis among pairs of $>1000$ field stars shows that highly significant abundance differences in the 20-dimensional space can be established for the vast majority of these pairs, and that the APOGEE-based abundance measurements have high discriminating power. However, pairs of field stars whose abundances are indistinguishable even at 0.03 dex precision exist: $\sim 0.3$ percent of all field star pairs, and $\sim 1.0$ percent of field star pairs at the same (solar) metallicity [Fe/H] = $0 \pm 0.02$. Most of these pairs are presumably not birth siblings from the same cluster, but rather doppelgangers. Our analysis implies that 'chemical tagging' in the strict sense, identifying birth siblings for typical disk stars through their abundance similarity alone, will not work with such data. However, our approach shows that abundances carry extremely valuable information for probabilistic chemo-orbital modeling, and, combined with velocities, they have allowed us to identify new cluster members from the field.
• ### Label Transfer from APOGEE to LAMOST: Precise Stellar Parameters for 450,000 LAMOST Giants(1602.00303)

Jan. 14, 2017 astro-ph.GA, astro-ph.SR
In this era of large-scale stellar spectroscopic surveys, measurements of stellar attributes ("labels," i.e. parameters and abundances) must be made precise and consistent across surveys. Here, we demonstrate that this can be achieved by a data-driven approach to spectral modeling. With The Cannon, we transfer information from the APOGEE survey to determine precise Teff, log g, [Fe/H], and [$\alpha$/M] from the spectra of 450,000 LAMOST giants. The Cannon fits a predictive model for LAMOST spectra using 9952 stars observed in common between the two surveys, taking five labels from APOGEE DR12 as ground truth: Teff, log g, [Fe/H], [$\alpha$/M], and K-band extinction $A_k$. The model is then used to infer Teff, log g, [Fe/H], and [$\alpha$/M] for 454,180 giants, 20% of the LAMOST DR2 stellar sample. These are the first [$\alpha$/M] values for the full set of LAMOST giants, and the largest catalog of [$\alpha$/M] for giant stars to date. Furthermore, these labels are by construction on the APOGEE label scale; for spectra with S/N > 50, cross-validation of the model yields typical uncertainties of 70K in Teff, 0.1 in log g, 0.1 in [Fe/H], and 0.04 in [$\alpha$/M], values comparable to the broadly stated, conservative APOGEE DR12 uncertainties. Thus, by using "label transfer" to tie low-resolution (LAMOST R $\sim$ 1800) spectra to the label scale of a much higher-resolution (APOGEE R $\sim$ 22,500) survey, we substantially reduce the inconsistencies between labels measured by the individual survey pipelines. This demonstrates that label transfer with The Cannon can successfully bring different surveys onto the same physical scale.
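
The two-step idea of label transfer (fit a predictive model of spectra from reference labels, then invert it for new spectra) can be sketched as below. This is a deliberately simplified stand-in and not The Cannon itself: it assumes a purely linear dependence of each pixel's flux on standardized labels, no noise model, and synthetic data. The function names `train_linear_model` and `infer_labels` are hypothetical.

```python
import numpy as np


def train_linear_model(labels, fluxes):
    """Training step: per-pixel least-squares fit of flux on labels.

    labels: (n_stars, n_labels), assumed standardized.
    fluxes: (n_stars, n_pixels).
    Returns theta with shape (n_labels + 1, n_pixels); row 0 is the offset.
    """
    design = np.hstack([np.ones((labels.shape[0], 1)), labels])
    theta, *_ = np.linalg.lstsq(design, fluxes, rcond=None)
    return theta


def infer_labels(theta, flux):
    """Test step: solve for the labels that best reproduce one spectrum."""
    offset, slopes = theta[0], theta[1:]  # slopes: (n_labels, n_pixels)
    labels, *_ = np.linalg.lstsq(slopes.T, flux - offset, rcond=None)
    return labels


# Synthetic, noise-free demonstration with 3 labels and 50 pixels.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=(4, 50))
train_labels = rng.normal(size=(200, 3))
train_fluxes = np.hstack([np.ones((200, 1)), train_labels]) @ true_theta

theta_fit = train_linear_model(train_labels, train_fluxes)

new_labels = np.array([0.3, -1.2, 0.7])
new_flux = np.concatenate([[1.0], new_labels]) @ true_theta
recovered = infer_labels(theta_fit, new_flux)
print(recovered)
```

Because the labels used for training define the scale of the recovered labels, anything inferred this way is by construction on the reference survey's label scale, which is the property the abstract emphasizes.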
• ### Data-driven, interpretable photometric redshifts trained on heterogeneous and unrepresentative data(1612.00847)

Dec. 2, 2016 astro-ph.CO
We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine-learning and template-fitting methods by building template SEDs directly from the training data. This is made computationally tractable with Gaussian Processes operating in flux--redshift space, encoding the physics of redshift and the projection of galaxy SEDs onto photometric band passes. This method removes the need to acquire representative training data or construct detailed galaxy SED models; it requires only that the photometric band passes and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the $i$-magnitude-selected, spectroscopically-confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes, or to simulate populations of galaxies with realistic fluxes and redshifts, for example. This method opens a new era in which photometric redshifts for large photometric surveys are derived using a flexible yet physical model of the data trained on all available surveys (spectroscopic and photometric).
• ### A 14 $h^{-3}$ Gpc$^3$ study of cosmic homogeneity using BOSS DR12 quasar sample(1602.09010)

Nov. 21, 2016 astro-ph.CO
The BOSS quasar sample is used to study cosmic homogeneity with a 3D survey in the redshift range $2.2<z<2.8$. We measure the count-in-sphere, $N(<\! r)$, i.e. the average number of objects around a given object, and its logarithmic derivative, the fractal correlation dimension, $D_2(r)$. For a homogeneous distribution $N(<\! r) \propto r^3$ and $D_2(r)=3$. Due to the uncertainty on tracer density evolution, 3D surveys can only probe homogeneity up to a redshift dependence, i.e. they probe so-called "spatial isotropy". Our data demonstrate spatial isotropy of the quasar distribution in the redshift range $2.2<z<2.8$ in a model-independent way, independent of any FLRW fiducial cosmology, resulting in $3-\langle D_2 \rangle < 1.7 \times 10^{-3}$ (2 $\sigma$) over the range $250<r<1200 \, h^{-1}$Mpc for the quasar distribution. If we assume that quasars do not have a bias much less than unity, this implies spatial isotropy of the matter distribution on large scales. Then, combining with the Copernican principle, we finally get homogeneity of the matter distribution on large scales. Alternatively, using a flat $\Lambda$CDM fiducial cosmology with CMB-derived parameters, and measuring the quasar bias relative to this $\Lambda$CDM model, our data provide a consistency check of the model, in terms of how homogeneous the Universe is on different scales. $D_2(r)$ is found to be compatible with our $\Lambda$CDM model on the whole $10<r<1200 \, h^{-1}$Mpc range. For the matter distribution we obtain $3-\langle D_2 \rangle < 5 \times 10^{-5}$ (2 $\sigma$) over the range $250<r<1200 \, h^{-1}$Mpc, consistent with homogeneity on large scales.
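
The fractal correlation dimension defined above is the logarithmic derivative of the count-in-sphere, $D_2(r) = \mathrm{d}\ln N(<\! r) / \mathrm{d}\ln r$. The sketch below evaluates it by finite differences on tabulated counts. The input counts are synthetic toy curves, not BOSS measurements, and the function name `correlation_dimension` and the clustering parameters are hypothetical.

```python
import numpy as np


def correlation_dimension(r, counts):
    """Finite-difference estimate of D_2(r) = d ln N(<r) / d ln r."""
    return np.gradient(np.log(counts), np.log(r))


r = np.logspace(1, 3, 50)  # radii, nominally in h^-1 Mpc

# Homogeneous toy distribution: N(<r) proportional to r^3.
counts_homogeneous = 0.01 * r ** 3
d2_homogeneous = correlation_dimension(r, counts_homogeneous)

# Toy clustered distribution: excess pairs on small scales that fade
# with radius (functional form and parameters chosen for illustration).
counts_clustered = 0.01 * r ** 3 * (1.0 + (50.0 / r) ** 1.8)
d2_clustered = correlation_dimension(r, counts_clustered)

print(d2_homogeneous[[0, -1]])
print(d2_clustered[[0, -1]])
```

In the homogeneous case the estimate equals 3 at every radius, while the clustered toy curve starts well below 3 and approaches 3 on large scales, which is the behavior the quantity $3-\langle D_2 \rangle$ is designed to bound.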
• ### Do fast stellar centroiding methods saturate the Cramér-Rao lower bound?(1610.05873)

Oct. 19, 2016 astro-ph.IM
One of the most demanding tasks in astronomical image processing, in terms of precision, is the centroiding of stars. Upcoming large surveys are going to take images of billions of point sources, including many faint stars, with short exposure times. Real-time estimation of the centroids of stars is crucial for real-time PSF estimation, and maximal precision is required for measurements of proper motion. The fundamental Cramér-Rao lower bound sets a limit on the root-mean-squared-error achievable by optimal estimators. In this work, we aim to compare the performance of various centroiding methods, in terms of saturating the bound, when they are applied to relatively low signal-to-noise ratio unsaturated stars assuming zero-mean constant Gaussian noise. In order to make this comparison, we present the ratio of the root-mean-squared-errors of these estimators to their corresponding Cramér-Rao bound as a function of the signal-to-noise ratio and the full-width at half-maximum of faint stars. We discuss two general circumstances in centroiding of faint stars: (i) when we have a good estimate of the PSF, (ii) when we do not know the PSF. In the case that we know the PSF, we show that a fast polynomial centroiding after smoothing the image by the PSF can be as efficient as the maximum-likelihood estimator at saturating the bound. In the case that we do not know the PSF, we demonstrate that although polynomial centroiding is not as optimal as PSF profile fitting, it comes very close to saturating the Cramér-Rao lower bound in a wide range of conditions. We also show that the moment-based method of center-of-light never comes close to saturating the bound, and thus it does not deliver reliable estimates of centroids.
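
For readers unfamiliar with the two families of fast estimators compared here, the sketch below shows one simple variant of each: a polynomial centroid (a parabola through the brightest pixel and its two neighbors along each axis) and the moment-based center-of-light. It is an illustration under stated assumptions and not the paper's implementation: there is no PSF smoothing step, the test image is a noiseless Gaussian, and the names `polynomial_centroid` and `center_of_light` are hypothetical.

```python
import numpy as np


def parabolic_peak_offset(left, center, right):
    """Sub-pixel offset of the vertex of a parabola through three samples."""
    denom = left - 2.0 * center + right
    if denom == 0.0:
        return 0.0  # flat triplet: no sub-pixel information
    return 0.5 * (left - right) / denom


def polynomial_centroid(image):
    """Centroid (x, y) from 1D parabolas through the brightest pixel.

    Assumes the brightest pixel is not on the image border.
    """
    iy, ix = np.unravel_index(np.argmax(image), image.shape)
    dx = parabolic_peak_offset(image[iy, ix - 1], image[iy, ix], image[iy, ix + 1])
    dy = parabolic_peak_offset(image[iy - 1, ix], image[iy, ix], image[iy + 1, ix])
    return ix + dx, iy + dy


def center_of_light(image):
    """Flux-weighted first moment (x, y) over the whole image."""
    yy, xx = np.indices(image.shape)
    total = image.sum()
    return (xx * image).sum() / total, (yy * image).sum() / total


# Noiseless Gaussian star with sigma = 2 pixels at a sub-pixel position.
true_x, true_y, sigma = 10.3, 9.6, 2.0
yy, xx = np.indices((21, 21))
star = np.exp(-((xx - true_x) ** 2 + (yy - true_y) ** 2) / (2.0 * sigma ** 2))

poly_x, poly_y = polynomial_centroid(star)
col_x, col_y = center_of_light(star)
print(poly_x, poly_y)
print(col_x, col_y)
```

Both estimators recover the position of this noiseless star to a small fraction of a pixel. The differences the abstract describes only emerge once background noise is added, because the center-of-light sums noise from every pixel in the image while the polynomial fit uses only the pixels around the peak.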
• ### The population of long-period transiting exoplanets(1607.08237)

Oct. 6, 2016 astro-ph.EP, astro-ph.IM
The Kepler Mission has discovered thousands of exoplanets and revolutionized our understanding of their population. This large, homogeneous catalog of discoveries has enabled rigorous studies of the occurrence rate of exoplanets and planetary systems as a function of their physical properties. However, transit surveys like Kepler are most sensitive to planets with orbital periods much shorter than the orbital periods of Jupiter and Saturn, the most massive planets in our Solar System. To address this deficiency, we perform a fully automated search for long-period exoplanets with only one or two transits in the archival Kepler light curves. When applied to the $\sim 40,000$ brightest Sun-like target stars, this search produces 16 long-period exoplanet candidates. Of these candidates, 6 are novel discoveries and 5 are in systems with inner short-period transiting planets. Since our method involves no human intervention, we empirically characterize the detection efficiency of our search. Based on these results, we measure the average occurrence rate of exoplanets smaller than Jupiter with orbital periods in the range 2-25 years to be $2.0\pm0.7$ planets per Sun-like star.