
Although core helium-burning red clump (RC) stars are faint at ultraviolet
wavelengths, their ultraviolet-optical color is a unique and accessible probe
of their physical properties. Using data from the GALEX All Sky Imaging Survey,
Gaia Data Release 2 and the SDSS APOGEE DR14 survey, we find that spectroscopic
metallicity is strongly correlated with the location of an RC star in the
UV-optical color-magnitude diagram. The RC has a wide spread in (NUV - G)$_0$
color, over 4 magnitudes, compared to a 0.7-magnitude range in (G$_{BP}$ -
G$_{RP}$)$_0$. We propose a photometric, dust-corrected, ultraviolet-optical
(NUV - G)$_0$ color-metallicity [Fe/H] relation using a sample of 5,175 RC
stars from APOGEE. We show that this relation has a scatter of 0.28 dex and is
easier to obtain for large, wide-field samples than spectroscopic
metallicities. Importantly, the effect may be comparable to the spread in RC
color attributed to extinction in other studies.

The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in
operation since July 2014. This paper describes the second data release from
this phase, and the fourteenth from SDSS overall (making this Data Release
Fourteen, or DR14). This release makes public data taken by SDSS-IV in its first
two years of operation (July 2014-2016). Like all previous SDSS releases, DR14
is cumulative, including the most recent reductions and calibrations of all
data taken by SDSS since the first phase began operations in 2000. New in DR14
is the first public release of data from the extended Baryon Oscillation
Spectroscopic Survey (eBOSS); the first data from the second phase of the
Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2),
including stellar parameter estimates from an innovative data-driven machine
learning algorithm known as "The Cannon"; and almost twice as many data cubes
from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous
release (N = 2812 in total). This paper describes the location and format of
the publicly available data from SDSS-IV surveys. We provide references to the
important technical papers describing how these data have been taken (both
targeting and observation details) and processed for scientific use. The SDSS
website (www.sdss.org) has been updated for this release, and provides links to
data downloads, as well as tutorials and examples of data use. SDSS-IV is
planning to continue to collect astronomical data until 2020, and will be
followed by SDSS-V.

Cold stellar streams, produced by tidal disruptions of clusters, are
long-lived, coherent dynamical features in the halo of the Milky Way. Due to
their different ages and different positions in phase space, different streams
tell us different things about the Galaxy. Here we employ a Cramér-Rao lower
bound (CRLB) or Fisher-matrix approach to understand the quantitative
information content in eleven known streams (ATLAS, GD-1, Hermus, Kwando,
Orinoco, PS1A, PS1C, PS1D, PS1E, Sangarius and Triangulum). This approach
depends on a generative model,
which we have developed previously, and which permits calculation of
derivatives of predicted stream properties with respect to Galaxy and stream
parameters. We find that in simple analytic models of the Milky Way, streams on
eccentric orbits contain the most information about the halo shape. For each
stream, there are near-degeneracies between dark-matter-halo properties and
parameters of the bulge, the disk, and the stream progenitor, but simultaneous
fitting of multiple streams will constrain all parameters at the percent level.
At this precision, simulated dark matter halos deviate from simple analytic
parametrizations, so we add an expansion of basis functions to give the
gravitational potential more freedom. As freedom increases, the information
about the halo reduces overall, and it becomes more localized to the current
position of the stream. In the limit of high model freedom, a stellar stream
appears to measure the local acceleration at its current position; this
motivates thinking about future nonparametric approaches. The CRLB formalism
also permits us to assess the value of future measurements of stellar
velocities, distances, and proper motions. We show that kinematic measurements
of stream stars are essential for producing competitive constraints on the
distribution of dark matter, which bodes well for stream studies in the age of
Gaia.
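
The Fisher-matrix calculation described above can be illustrated with a minimal sketch. It assumes independent Gaussian noise and uses a straight line as a toy stand-in for the stream model; the function names, the finite-difference step, and the toy data are illustrative assumptions, not the authors' code. The last lines show the property the abstract relies on: independent data sets add their Fisher matrices, so joint fits tighten the bounds.

```python
import numpy as np

def fisher_matrix(model, theta, sigma, eps=1e-6):
    """Fisher matrix for data with independent Gaussian noise.

    Derivatives of the model predictions with respect to each parameter
    are taken by central finite differences (an assumption of this sketch).
    """
    theta = np.asarray(theta, dtype=float)
    jac = np.empty((theta.size, sigma.size))
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        jac[i] = (model(theta + step) - model(theta - step)) / (2.0 * eps)
    # F_ij = sum_k (dm_k/dtheta_i) (dm_k/dtheta_j) / sigma_k^2
    return (jac / sigma**2) @ jac.T

def cramer_rao_bounds(fisher):
    """Lower bounds on parameter uncertainties: sqrt(diag(F^-1))."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

# Toy stand-in for a generative stream model: y = a + b * x.
x = np.linspace(0.0, 1.0, 50)
sigma = np.full_like(x, 0.1)

def line(theta):
    return theta[0] + theta[1] * x

fisher_one = fisher_matrix(line, [1.0, 2.0], sigma)
bounds_one = cramer_rao_bounds(fisher_one)

# Two independent, identical data sets: Fisher matrices add.
bounds_two = cramer_rao_bounds(fisher_one + fisher_one)
```

In this toy case the combined bounds shrink by a factor of the square root of two; with real streams the gain can be larger because different streams break different degeneracies.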

Multi-epoch radial velocity measurements of stars can be used to identify
stellar, sub-stellar, and planetary-mass companions. Even a small number of
observation epochs can be informative about companions, though there can be
multiple qualitatively different orbital solutions that fit the data. We have
custom-built a Monte Carlo sampler (The Joker) that delivers reliable (and
often highly multimodal) posterior samplings for companion orbital parameters
given sparse radial-velocity data. Here we use The Joker to perform a search
for companions to 96,231 red-giant stars observed in the APOGEE survey (DR14)
with $\geq 3$ spectroscopic epochs. We select stars with probable companions by
making a cut on our posterior belief about the amplitude of the stellar
radial-velocity variation induced by the orbit. We provide (1) a catalog of 320
companions for which the stellar companion properties can be confidently
determined, (2) a catalog of 4,898 stars that likely have companions, but would
require more observations to uniquely determine the orbital properties, and (3)
posterior samplings for the full orbital parameters for all stars in the parent
sample. We show the characteristics of systems with confidently determined
companion properties and highlight interesting systems with candidate compact
object companions.

Multiple stellar systems are ubiquitous in the Milky Way, but are often
unresolved and seen as single objects in spectroscopic, photometric, and
astrometric surveys. Yet, modeling them is essential for developing a full
understanding of large surveys such as Gaia, and connecting them to stellar and
Galactic models. In this paper we address this problem by jointly fitting the
Gaia and 2MASS photometric and astrometric data using a data-driven Bayesian
hierarchical model that includes populations of binary and trinary systems.
This allows us to classify observations into singles, binaries, and trinaries,
in a robust and efficient manner, without resorting to external models. We are
able to identify multiple systems and, in some cases, make strong predictions
for the properties of their unresolved stars. We will be able to compare such
predictions with Gaia Data Release 4, which will contain astrometric
identification and analysis of binary systems.

Standard present-day large-scale structure (LSS) analyses make a major
assumption in their Bayesian parameter inference: that the likelihood has a
Gaussian form. For summary statistics currently used in LSS, this assumption,
even if the underlying density field is Gaussian, cannot be correct in detail.
We investigate the impact of this assumption on two recent LSS analyses: the
Beutler et al. (2017) power spectrum multipole ($P_\ell$) analysis and the
Sinha et al. (2017) group multiplicity function ($\zeta$) analysis. Using
nonparametric divergence estimators on mock catalogs originally constructed
for covariance matrix estimation, we identify significant non-Gaussianity in
both the $P_\ell$ and $\zeta$ likelihoods. We then use Gaussian mixture density
estimation and Independent Component Analysis on the same mocks to construct
likelihood estimates that approximate the true likelihood better than the
Gaussian pseudo-likelihood. Using these likelihood estimates, we accurately
estimate the true posterior probability distribution of the Beutler et al.
(2017) and Sinha et al. (2017) parameters. Likelihood non-Gaussianity shifts
the $f\sigma_8$ constraint by $-0.44\sigma$, but otherwise does not
significantly impact the overall parameter constraints of Beutler et al.
(2017). For the $\zeta$ analysis, using the pseudo-likelihood significantly
underestimates the uncertainties and biases the constraints of Sinha et al.
(2017) halo occupation parameters. For $\log M_1$ and $\alpha$, the posteriors
are shifted by $+0.43\sigma$ and $-0.51\sigma$ and broadened by $42\%$ and
$66\%$, respectively. The divergence and likelihood estimation methods we
present provide a straightforward framework for quantifying the impact of
likelihood non-Gaussianity and deriving more accurate parameter constraints.

We develop a data-driven spectral model for identifying and characterizing
spatially unresolved multiple-star systems and apply it to APOGEE DR13 spectra
of main-sequence stars. Binaries and triples are identified as targets whose
spectra can be significantly better fit by a superposition of two or three
model spectra, drawn from the same isochrone, than any single-star model. From
an initial sample of $\sim$20,000 main-sequence targets, we identify
$\sim$2,500 binaries in which both the primary and secondary star contribute
detectably to the spectrum, simultaneously fitting for the velocities and
stellar parameters of both components. We additionally identify and fit
$\sim$200 triple systems, as well as $\sim$700 velocity-variable systems in
which the secondary does not contribute detectably to the spectrum. Our model
simplifies the process of simultaneously fitting single- or multi-epoch spectra
with composite models and does not depend on a velocity offset between the two
components of a binary, making it sensitive to traditionally undetectable
systems with periods of hundreds or thousands of years. In agreement with
conventional expectations, almost all the spectrally identified binaries with
measured parallaxes fall above the main sequence in the color-magnitude
diagram. We find excellent agreement between spectrally and dynamically
inferred mass ratios for the $\sim$600 binaries in which a dynamical mass ratio
can be measured from multi-epoch radial velocities. We obtain full orbital
solutions for 64 systems, including 14 close binaries within hierarchical
triples. We make available catalogs of stellar parameters, abundances, mass
ratios, and orbital parameters.

Markov Chain Monte Carlo (MCMC) methods for sampling probability density
functions (combined with abundant computational resources) have transformed the
sciences, especially in performing probabilistic inferences, or fitting models
to data. In this primarily pedagogical contribution, we give a brief overview
of the most basic MCMC method and some practical advice for the use of MCMC in
real inference problems. We give advice on method choice, tuning for
performance, methods for initialization, tests of convergence, troubleshooting,
and use of the chain output to produce or report parameter estimates with
associated uncertainties. We argue that autocorrelation time is the most
important test for convergence, as it directly connects to the uncertainty on
the sampling estimate of any quantity of interest. We emphasize that sampling
is a method for doing integrals; this guides our thinking about how MCMC output
is best used.
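
As a concrete illustration of the most basic method and of the autocorrelation-time diagnostic, the following is a minimal sketch of a one-dimensional Metropolis sampler with a crude integrated autocorrelation time estimate. The function names, the step size, the standard-normal target, and the truncation rule (sum the autocorrelation function up to its first negative value) are assumptions made for this sketch; production analyses would use a more careful windowing scheme.

```python
import numpy as np

def metropolis(log_prob, x0, n_steps, step_size, rng):
    """One-dimensional Metropolis sampler with a Gaussian proposal."""
    x = float(x0)
    lp = log_prob(x)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step_size * rng.normal()
        lp_new = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = proposal, lp_new
        chain[i] = x
    return chain

def autocorr_time(chain):
    """Crude integrated autocorrelation time, 1 + 2 * sum(rho_k),
    truncated at the first negative autocorrelation value."""
    x = chain - chain.mean()
    n = x.size
    f = np.fft.rfft(x, n=2 * n)            # zero-padded FFT
    acf = np.fft.irfft(f * np.conjugate(f))[:n]
    acf /= acf[0]
    negative = np.flatnonzero(acf < 0)
    cut = negative[0] if negative.size else n
    return 1.0 + 2.0 * acf[1:cut].sum()

def log_standard_normal(x):
    return -0.5 * x * x

rng = np.random.default_rng(0)
chain = metropolis(log_standard_normal, x0=0.0, n_steps=20000,
                   step_size=1.0, rng=rng)
tau = autocorr_time(chain)

# Sampling is a method for doing integrals: the posterior mean is a chain
# average, and its uncertainty is set by the effective sample size n / tau.
mean_estimate = chain.mean()
mean_error = chain.std() * np.sqrt(tau / chain.size)
```

The last two lines show why the autocorrelation time matters: the uncertainty on any chain average scales with the square root of tau divided by the chain length, not with the raw number of steps.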

Difference imaging or image subtraction is a method that measures
differential photometry by matching the pointing and point-spread function
(PSF) between image frames. It is used for the detection of time-variable
phenomena. Here we present a new category of method, CPM Difference Imaging,
in which differences are not measured between matched images but instead
between image frames and a data-driven predictive model that has been designed
only to predict the pointing, PSF, and detector effects but not astronomical
variability. In CPM Difference Imaging each pixel is modelled by the Causal
Pixel Model (CPM) originally built for modeling Kepler data, in which pixel
values are predicted by a linear combination of other pixels at the same epoch
but far enough away such that these pixels are causally disconnected,
astrophysically. It does not require that the user have any explicit model or
description of the pointing or point-spread function of any of the images. Its
principal drawback is that, in its current form, it requires an imaging
campaign with many epochs and fairly stable telescope pointing. The method is
applied to simulated data and also the K2 Campaign 9 microlensing data. We show
that CPM Difference Imaging can detect variable objects and produce precise
differential photometry in a crowded field. CPM Difference Imaging is capable
of producing image differences at nearly photon-noise precision.

The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) began
observations in July 2014. It pursues three core programs: APOGEE-2, MaNGA, and
eBOSS. In addition, eBOSS contains two major subprograms: TDSS and SPIDERS.
This paper describes the first data release from SDSS-IV, Data Release 13
(DR13), which contains new data, reanalysis of existing data sets and, like all
SDSS data releases, is inclusive of previously released data. DR13 makes
publicly available 1390 spatially resolved integral field unit observations of
nearby galaxies from MaNGA, the first data released from this survey. It
includes new observations from eBOSS, completing SEQUELS. In addition to
targeting galaxies and quasars, SEQUELS also targeted variability-selected
objects from TDSS and X-ray selected objects from SPIDERS. DR13 includes new
reductions of the SDSS-III BOSS data, improving the spectrophotometric
calibration and redshift classification. DR13 releases new reductions of the
APOGEE-1 data from SDSS-III, with abundances of elements not previously
included and improved stellar parameters for dwarf stars and cooler stars. For
the SDSS imaging data, DR13 provides new, more robust and precise photometric
calibrations. Several value-added catalogs are being released in tandem with
DR13, in particular target catalogs relevant for eBOSS, TDSS, and SPIDERS, and
an updated red-clump catalog for APOGEE. This paper describes the location and
format of the data now publicly available, as well as providing references to
the important technical papers that describe the targeting, observing, and data
reduction. The SDSS website, http://www.sdss.org, provides links to the data,
tutorials and examples of data access, and extensive documentation of the
reduction and analysis procedures. DR13 is the first of a scheduled set that
will contain new data and analyses from the planned ~6-year operations of
SDSS-IV.

We report and discuss the discovery of a comoving pair of bright solar-type
stars, HD 240430 and HD 240429, with a significant difference in their chemical
abundances. The two stars have an estimated 3D separation of $\approx 0.6$ pc
($\approx 0.01$ pc projected) at a distance of $r\approx 100$ pc with nearly
identical three-dimensional velocities, as inferred from Gaia TGAS parallaxes
and proper motions, and high-precision radial velocity measurements. Stellar
parameters determined from high-resolution Keck HIRES spectra indicate that
both stars are $\sim 4$ Gyr old. The more metal-rich of the two, HD 240430,
shows an enhancement of refractory ($T_C>1200$ K) elements by $\approx 0.2$ dex
and a marginal enhancement of (moderately) volatile elements ($T_C<1200$ K, C,
N, O, Na, and Mn). This is the largest metallicity difference found in a wide
binary pair yet. Additionally, HD 240430 shows an anomalously high surface
lithium abundance ($A(\mathrm{Li})=2.75$), higher than its companion by $0.5$
dex. The proximity in phase space and ages between the two stars suggests that
they formed together with the same composition, at odds with the observed
differences in metallicity and abundance patterns. We therefore suggest that
the star HD~240430, "Kronos", accreted 15 $M_\oplus$ of rocky material after
birth, selectively enhancing the refractory elements as well as lithium in its
surface and convective envelope.

Distances to individual stars in our own Galaxy are critical in order to
piece together the nature of its velocity and spatial structure. Core helium
burning red clump (RC) stars have similar luminosities, are abundant throughout
the Galaxy, and thus constitute good standard candles. We build a hierarchical
probabilistic model to quantify the quality of RC stars as standard candles
using parallax measurements from the first Gaia data release. A unique aspect
of our methodology is to fully account for (and marginalize over) parallax,
photometry, and dust-correction uncertainties, which leads to more robust
results than standard approaches. We determine the absolute magnitude and
intrinsic dispersion of the RC in 2MASS bands J, H, Ks, Gaia G band, and WISE
bands W1, W2, W3, and W4. We find that the absolute magnitude of the RC is
$-1.61 \pm 0.01$ (in Ks), $+0.44 \pm 0.01$ (in G), $-0.93 \pm 0.01$ (in J),
$-1.46 \pm 0.01$ (in H), $-1.68 \pm 0.02$ (in W1), $-1.69 \pm 0.02$ (in W2),
$-1.67 \pm 0.02$ (in W3), $-1.76 \pm 0.01$ mag (in W4). The mean intrinsic
dispersion is $\sim 0.17 \pm$ 0.03 mag across all bands (yielding a typical
distance precision of $\sim$ 8%). Thus RC stars are reliable and precise
standard candles. In addition, we have recalibrated the zero point of the
absolute magnitude of the RC in each band, which provides a benchmark for future
studies to estimate distances to RC stars. Finally, the parallax error
shrinkage in the hierarchical model outlined in this work can be used to obtain
more precise parallaxes than Gaia for the most distant RC stars across the
Galaxy.
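
The link between the quoted intrinsic dispersion of about 0.17 mag and the quoted distance precision of about 8% follows from propagating a magnitude scatter through the distance modulus. The short sketch below works through that arithmetic; the function names are illustrative, and the distance modulus of 10 mag is an arbitrary example value, not a measurement from the paper.

```python
import numpy as np

def distance_from_modulus(mu):
    """Distance in parsecs from a distance modulus mu = m - M - A."""
    return 10.0 ** ((mu + 5.0) / 5.0)

def fractional_distance_error(mag_dispersion):
    """First-order error propagation through d = 10^((mu + 5) / 5):
    sigma_d / d = (ln 10 / 5) * sigma_mu."""
    return np.log(10.0) / 5.0 * mag_dispersion

# Intrinsic dispersion quoted in the text (mean across bands).
frac = fractional_distance_error(0.17)

# Arbitrary example: a star with distance modulus 10 mag.
d_pc = distance_from_modulus(10.0)
d_err_pc = frac * d_pc
```

With a dispersion of 0.17 mag the fractional error evaluates to roughly 0.078, consistent with the precision of about 8% stated above.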

We describe the Sloan Digital Sky Survey IV (SDSS-IV), a project encompassing
three major spectroscopic programs. The Apache Point Observatory Galactic
Evolution Experiment 2 (APOGEE-2) is observing hundreds of thousands of Milky
Way stars at high resolution and high signal-to-noise ratio in the
near-infrared. The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA)
survey is obtaining spatially resolved spectroscopy for thousands of nearby
galaxies (median redshift of z = 0.03). The extended Baryon Oscillation
Spectroscopic Survey (eBOSS) is mapping the galaxy, quasar, and neutral gas
distributions between redshifts z = 0.6 and 3.5 to constrain cosmology using
baryon acoustic oscillations, redshift space distortions, and the shape of the
power spectrum. Within eBOSS, we are conducting two major subprograms: the
SPectroscopic IDentification of eROSITA Sources (SPIDERS), investigating X-ray
AGN and galaxies in X-ray clusters, and the Time Domain Spectroscopic Survey
(TDSS), obtaining spectra of variable sources. All programs use the 2.5-meter
Sloan Foundation Telescope at Apache Point Observatory; observations there
began in Summer 2014. APOGEE-2 also operates a second near-infrared
spectrograph at the 2.5-meter du Pont Telescope at Las Campanas Observatory,
with observations beginning in early 2017. Observations at both facilities are
scheduled to continue through 2020. In keeping with previous SDSS policy,
SDSS-IV provides regularly scheduled public data releases; the first one, Data
Release 13, was made available in July 2016.

Converting a noisy parallax measurement into a posterior belief over distance
requires inference with a prior. Usually this prior represents beliefs about
the stellar density distribution of the Milky Way. However, multiband
photometry exists for a large fraction of the \textsl{\small{Gaia}}
\textsl{\small{TGAS}} Catalog and is incredibly informative about stellar
distances. Here we use \textsl{\small{2MASS}} colors for 1.4 million
\textsl{\small{TGAS}} stars to build a noise-deconvolved empirical prior
distribution for stars in color-magnitude space. This model contains no
knowledge of stellar astrophysics or the Milky Way, but is precise because it
accurately generates a large number of noisy parallax measurements under an
assumption of stationarity; that is, it is capable of combining the information
from many stars. We use the Extreme Deconvolution (\textsl{\small{XD}})
algorithm, an Empirical Bayes approximation to a full hierarchical model of
the true parallax and photometry of every star, to construct this prior. The
prior is combined with a \textsl{\small{TGAS}} likelihood to infer a precise
photometric parallax estimate and uncertainty (and full posterior) for every
star. Our parallax estimates are more precise than the \textsl{\small{TGAS}}
catalog entries by a median factor of 1.2 (14% are more precise by a factor >2)
and are more precise than previous Bayesian distance estimates that use spatial
priors. We validate our parallax inferences using members of the Milky Way star
cluster M67, which is not visible as a cluster in the \textsl{\small{TGAS}}
parallax estimates, but appears as a cluster in our posterior parallax
estimates. Our results, including a parallax posterior pdf for each of 1.4
million \textsl{\small{TGAS}} stars, are available in companion electronic
tables.

In this study, we probe the transition to cosmic homogeneity in the Large
Scale Structure (LSS) of the Universe using the CMASS galaxy sample of the BOSS
spectroscopic survey, which covers the largest effective volume to date, $3\
h^{-3}\ \mathrm{Gpc}^3$ at $0.43 \leq z \leq 0.7$. We study the scaled
counts-in-spheres, $\mathcal{N}(<r)$, and the fractal correlation dimension,
$\mathcal{D}_2(r)$, to assess the homogeneity scale of the universe using a
Landy \& Szalay inspired estimator.
Defining the scale of transition to homogeneity as the scale at which
$\mathcal{D}_2(r)$ reaches 3 within $1\%$, i.e. $\mathcal{D}_2(r)>2.97$ for
$r>\mathcal{R}_H$, we find $\mathcal{R}_H = (63.3\pm0.7) \ h^{-1}\
\mathrm{Mpc}$, in agreement at the percentage level with the predictions of the
$\Lambda$CDM model $\mathcal{R}_H=62.0\ h^{-1}\ \mathrm{Mpc}$. Thanks to the
large cosmic depth of the survey, we investigate the redshift evolution of the
transition to homogeneity scale and find agreement with the $\Lambda$CDM
prediction. Finally, we find that $\mathcal{D}_2$ is compatible with $3$ at
scales larger than $300\ h^{-1}\ $Mpc in all redshift bins.
These results consolidate the Cosmological Principle and represent a precise
consistency test of the $\Lambda$CDM model.
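
The definition of the homogeneity scale used above can be sketched numerically. The example assumes that the scaled counts-in-spheres are normalized by the homogeneous expectation, so that the fractal correlation dimension is the logarithmic derivative of the scaled counts plus 3. The clustering profile below is a made-up toy curve chosen only to exercise the code; it is not BOSS data, and the function names are illustrative.

```python
import numpy as np

def correlation_dimension(r, scaled_counts):
    """D2(r) = d ln N / d ln r + 3 for counts scaled by the homogeneous
    expectation (the r^3 growth has already been divided out)."""
    return np.gradient(np.log(scaled_counts), np.log(r)) + 3.0

def homogeneity_scale(r, d2, threshold=2.97):
    """Smallest grid radius above which D2 stays above the threshold.

    Returns None if D2 is still below the threshold at the largest radius.
    """
    below = np.flatnonzero(d2 <= threshold)
    if below.size == 0:
        return r[0]
    if below[-1] == r.size - 1:
        return None
    return r[below[-1] + 1]

# Toy profile (not data): excess clustering that decays as (r0 / r)^2.
r = np.logspace(0.0, 3.0, 300)
scaled_counts = 1.0 + (10.0 / r) ** 2

d2 = correlation_dimension(r, scaled_counts)
r_h = homogeneity_scale(r, d2)
```

For this toy profile the threshold is crossed near 81 in the units of r; a real analysis would interpolate between grid points and propagate the covariance of the measured counts.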

The primary sample of the {\it Gaia} Data Release 1 is the Tycho-Gaia
Astrometric Solution (TGAS): $\approx$ 2 million Tycho-2 sources with improved
parallaxes and proper motions relative to the initial catalog. This increased
astrometric precision presents an opportunity to find new binary stars and
moving groups. We search for high-confidence comoving pairs of stars in TGAS by
identifying pairs of stars consistent with having the same 3D velocity using a
marginalized likelihood ratio test to discriminate candidate comoving pairs
from the field population. Although we perform some visualizations using
(bias-corrected) inverse parallax as a point estimate of distance, the likelihood
ratio is computed with a probabilistic model that includes the covariances of
parallax and proper motions and marginalizes the (unknown) true distances and
3D velocities of the stars. We find 13,085 comoving star pairs among 10,606
unique stars with separations as large as 10 pc (our search limit). Some of
these pairs form larger groups through mutual comoving neighbors: many of these
pair networks correspond to known open clusters and OB associations, but we
also report the discovery of several new comoving groups. Most surprisingly, we
find a large number of very wide ($>1$ pc) separation comoving star pairs, the
number of which increases with increasing separation and cannot be explained
purely by false-positive contamination. Our key result is a catalog of
high-confidence comoving pairs of stars in TGAS. We discuss the utility of this
catalog for making dynamical inferences about the Galaxy, testing stellar
atmosphere models, and validating chemical abundance measurements.

Standard approaches to Bayesian parameter inference in large-scale structure
assume a Gaussian functional form (chi-squared form) for the likelihood. This
assumption, in detail, cannot be correct. Likelihood-free inference methods such as
Approximate Bayesian Computation (ABC) relax these restrictions and make
inference possible without making any assumptions on the likelihood. Instead
ABC relies on a forward generative model of the data and a metric for measuring
the distance between the model and data. In this work, we demonstrate that ABC
is feasible for LSS parameter inference by using it to constrain parameters of
the halo occupation distribution (HOD) model for populating dark matter halos
with galaxies.
Using a specific implementation of ABC supplemented with Population Monte Carlo
importance sampling, a generative forward model using HOD, and a distance
metric based on galaxy number density, two-point correlation function, and
galaxy group multiplicity function, we constrain the HOD parameters of mock
observations generated from selected "true" HOD parameters. The parameter
constraints we obtain from ABC are consistent with the "true" HOD parameters,
demonstrating that ABC can be reliably used for parameter inference in LSS.
Furthermore, we compare our ABC constraints to constraints we obtain using a
pseudo-likelihood function of Gaussian form with MCMC and find consistent HOD
parameter constraints. Ultimately our results suggest that ABC can and should
be applied in parameter inference for LSS analyses.
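
The core idea of ABC, a forward model plus a distance metric in place of a likelihood, can be shown with a minimal sketch. Note that this uses plain rejection ABC rather than the Population Monte Carlo scheme named above, and that the forward model here is a toy (inferring the mean of a Gaussian from its sample mean), not an HOD model. All function names, the prior range, and the tolerance are assumptions of the sketch.

```python
import numpy as np

def abc_rejection(observed, simulate, sample_prior, distance,
                  epsilon, n_draws, rng):
    """Plain rejection ABC: keep prior draws whose simulated summary
    statistic lies within epsilon of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if distance(simulate(theta, rng), observed) < epsilon:
            accepted.append(theta)
    return np.array(accepted)

def sample_prior(rng):
    # Flat prior on the unknown mean (toy choice).
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng):
    # Forward model: draw 100 points and reduce them to a summary statistic.
    return rng.normal(theta, 1.0, size=100).mean()

def distance(simulated, observed):
    return abs(simulated - observed)

rng = np.random.default_rng(1)
observed_mean = simulate(1.5, rng)   # mock "observation" with true mean 1.5
posterior = abc_rejection(observed_mean, simulate, sample_prior, distance,
                          epsilon=0.05, n_draws=20000, rng=rng)
```

Plain rejection wastes most prior draws when the tolerance is small, which is the inefficiency that sequential schemes such as Population Monte Carlo are designed to reduce by shrinking the tolerance over successive iterations.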

Given sparse or low-quality radial-velocity measurements of a star, there are
often many qualitatively different stellar or exoplanet companion orbit models
that are consistent with the data. The consequent multimodality of the
likelihood function leads to extremely challenging search, optimization, and
MCMC posterior sampling over the orbital parameters. Here we create a custom
Monte Carlo sampler for sparse or noisy radial-velocity measurements of
two-body systems that can produce posterior samples for orbital parameters even
when the likelihood function is poorly behaved. The six standard orbital
parameters for a binary system can be split into four nonlinear parameters
(period, eccentricity, argument of pericenter, phase) and two linear parameters
(velocity amplitude, barycenter velocity). We capitalize on this by building a
sampling method in which we densely sample the prior pdf in the nonlinear
parameters and perform rejection sampling using a likelihood function
marginalized over the linear parameters. With sparse or uninformative data, the
sampling obtained by this rejection sampling is generally multimodal and dense.
With informative data, the sampling becomes effectively unimodal but too
sparse: in these cases we follow the rejection sampling with standard MCMC. The
method produces correct samplings in orbital parameters for data that include
as few as three epochs. The sampler, called The Joker, can therefore be used to produce proper
samplings of multimodal pdfs, which are still informative and can be used in
hierarchical (population) modeling. We give some examples that show how the
posterior pdf depends sensitively on the number and time coverage of the
observations and their uncertainties.
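
The two-stage idea described above (dense prior sampling in the nonlinear parameters, then rejection sampling with a likelihood marginalized over the linear parameters) can be sketched in a simplified form. This sketch is not The Joker itself: it assumes circular orbits, so only period and phase are nonlinear, and it places broad zero-mean Gaussian priors on the velocity amplitude and barycenter velocity so that the marginalization is analytic. The epochs, noise level, prior ranges, and function names are all hypothetical.

```python
import numpy as np

def marginal_ln_likelihood(t, rv, rv_err, period, phase, linear_prior_var):
    """Gaussian likelihood with the linear parameters (K, v0) integrated out.

    With rv = A @ [K, v0] + noise and a zero-mean Gaussian prior on [K, v0],
    the data are Gaussian with covariance C + A @ Lambda @ A.T.
    """
    design = np.column_stack([np.cos(2.0 * np.pi * t / period + phase),
                              np.ones_like(t)])
    cov = np.diag(rv_err**2) + design @ np.diag(linear_prior_var) @ design.T
    _, ln_det = np.linalg.slogdet(cov)
    chi2 = rv @ np.linalg.solve(cov, rv)
    return -0.5 * (chi2 + ln_det + t.size * np.log(2.0 * np.pi))

def rejection_sample(t, rv, rv_err, n_prior, rng):
    """Draw (period, phase) from the prior and keep each draw with
    probability L / L_max."""
    ln_period = rng.uniform(np.log(2.0), np.log(1000.0), n_prior)
    phase = rng.uniform(0.0, 2.0 * np.pi, n_prior)
    prior_var = np.array([30.0**2, 100.0**2])   # (km/s)^2, assumed priors
    ln_like = np.array([
        marginal_ln_likelihood(t, rv, rv_err, np.exp(lp), ph, prior_var)
        for lp, ph in zip(ln_period, phase)
    ])
    keep = np.log(rng.uniform(size=n_prior)) < ln_like - ln_like.max()
    return np.column_stack([np.exp(ln_period[keep]), phase[keep]])

# Hypothetical sparse data: four epochs of a 30-day circular orbit.
rng = np.random.default_rng(42)
t = np.array([0.0, 12.3, 40.1, 77.7])            # days
rv_err = np.full_like(t, 0.5)                    # km/s
rv = (10.0 + 5.0 * np.cos(2.0 * np.pi * t / 30.0 + 1.0)
      + rv_err * rng.normal(size=t.size))
samples = rejection_sample(t, rv, rv_err, n_prior=5000, rng=rng)
```

With only four epochs the surviving samples typically spread over several distinct periods, which is the multimodality the text describes; when the data are informative, very few prior draws survive and a follow-up MCMC stage becomes necessary.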

We present a hierarchical probabilistic model for improving geometric stellar
distance estimates using color-magnitude information. This is achieved with a
data-driven model of the color-magnitude diagram, not relying on stellar
models but instead on the relative abundances of stars in color-magnitude
cells, which are inferred from very noisy magnitudes and parallaxes. While the
resulting noise-deconvolved color-magnitude diagram can be useful for a range
of applications, we focus on deriving improved stellar distance estimates
relying on both parallax and photometric information. We demonstrate the
efficiency of this approach on the 1.4 million stars of the Gaia TGAS sample
that also have APASS magnitudes. Our hierarchical model has 4~million
parameters in total, most of which are marginalized out numerically or
analytically. We find that distance estimates are significantly improved for
the noisiest parallaxes and densest regions of the color-magnitude diagram. In
particular, the average distance signal-to-noise ratio and uncertainty improve
by 19~percent and 36~percent, respectively, with 8~percent of the objects
improving in SNR by a factor greater than 2. This computationally efficient
approach fully accounts for both parallax and photometric noise, and is a first
step towards a full hierarchical probabilistic model of the Gaia data.

We explore to what extent stars within Galactic disk open clusters resemble
each other in the high-dimensional space of their photospheric element
abundances, and contrast this with pairs of field stars. Our analysis is based
on abundances for 20 elements, homogeneously derived from APOGEE spectra (with
carefully quantified uncertainties, with a median value of $\sim 0.03$ dex). We
consider 90 red giant stars in seven open clusters and find that most stars
within a cluster have abundances in most elements that are indistinguishable
(in a $\chi^2$ sense) from those of the other members, as expected for stellar
birth siblings. An analogous analysis among pairs of $>1000$ field stars shows
that highly significant abundance differences in the 20-dimensional space can
be established for the vast majority of these pairs, and that the APOGEE-based
abundance measurements have high discriminating power. However, pairs of field
stars whose abundances are indistinguishable even at 0.03~dex precision exist:
$\sim 0.3$ percent of all field star pairs, and $\sim 1.0$ percent of field
star pairs at the same (solar) metallicity [Fe/H] = $0 \pm 0.02$. Most of these
pairs are presumably not birth siblings from the same cluster, but rather
doppelganger. Our analysis implies that 'chemical tagging' in the strict sense,
identifying birth siblings for typical disk stars through their abundance
similarity alone, will not work with such data. However, our approach shows
that abundances carry extremely valuable information for probabilistic
chemo-orbital modeling; by combining them with velocities, we have identified
new cluster members from the field.

In this era of large-scale stellar spectroscopic surveys, measurements of
stellar attributes ("labels," i.e. parameters and abundances) must be made
precise and consistent across surveys. Here, we demonstrate that this can be
achieved by a data-driven approach to spectral modeling. With The Cannon, we
transfer information from the APOGEE survey to determine precise Teff, log g,
[Fe/H], and [$\alpha$/M] from the spectra of 450,000 LAMOST giants. The Cannon
fits a predictive model for LAMOST spectra using 9952 stars observed in common
between the two surveys, taking five labels from APOGEE DR12 as ground truth:
Teff, log g, [Fe/H], [$\alpha$/M], and K-band extinction $A_k$. The model is then
used to infer Teff, log g, [Fe/H], and [$\alpha$/M] for 454,180 giants, 20% of
the LAMOST DR2 stellar sample. These are the first [$\alpha$/M] values for the
full set of LAMOST giants, and the largest catalog of [$\alpha$/M] for giant
stars to date. Furthermore, these labels are by construction on the APOGEE
label scale; for spectra with S/N > 50, cross-validation of the model yields
typical uncertainties of 70 K in Teff, 0.1 in log g, 0.1 in [Fe/H], and 0.04 in
[$\alpha$/M], values comparable to the broadly stated, conservative APOGEE DR12
uncertainties. Thus, by using "label transfer" to tie low-resolution (LAMOST R
$\sim$ 1800) spectra to the label scale of a much higher-resolution (APOGEE R
$\sim$ 22,500) survey, we substantially reduce the inconsistencies between
labels measured by the individual survey pipelines. This demonstrates that
label transfer with The Cannon can successfully bring different surveys onto
the same physical scale.

We present a new method for inferring photometric redshifts in deep galaxy
and quasar surveys, based on a data driven model of latent spectral energy
distributions (SEDs) and a physical model of photometric fluxes as a function
of redshift. This conceptually novel approach combines the advantages of both
machinelearning and templatefitting methods by building template SEDs
directly from the training data. This is made computationally tractable with
Gaussian Processes operating in flux-redshift space, encoding the physics of
redshift and the projection of galaxy SEDs onto photometric band passes. This
method alleviates the need to acquire representative training data or
constructing detailed galaxy SED models; it requires only that the photometric
band passes and calibrations be known or have parameterized unknowns. The
training data can consist of a combination of spectroscopic and deep many-band
photometric data, which do not need to entirely spatially overlap with the
target survey of interest or even involve the same photometric bands. We
showcase the method on the $i$-magnitude-selected, spectroscopically confirmed
galaxies in the COSMOS field. The model is trained on the deepest bands (from
SUBARU and HST) and photometric redshifts are derived using the shallower SDSS
optical bands only. We demonstrate that we obtain accurate redshift point
estimates and probability distributions despite the training and target sets
having very different redshift distributions, noise properties, and even
photometric bands. Our model can also be used to predict missing photometric
fluxes, or to simulate populations of galaxies with realistic fluxes and
redshifts, for example. This method opens a new era in which photometric
redshifts for large photometric surveys are derived using a flexible yet
physical model of the data trained on all available surveys (spectroscopic and
photometric).

The BOSS quasar sample is used to study cosmic homogeneity with a 3D survey
in the redshift range $2.2<z<2.8$. We measure the count-in-sphere, $N(<\! r)$,
i.e. the average number of objects around a given object, and its logarithmic
derivative, the fractal correlation dimension, $D_2(r)$. For a homogeneous
distribution $N(<\! r) \propto r^3$ and $D_2(r)=3$. Due to the uncertainty on
tracer density evolution, 3D surveys can only probe homogeneity up to a
redshift dependence, i.e. they probe so-called "spatial isotropy". Our data
demonstrate spatial isotropy of the quasar distribution in the redshift range
$2.2<z<2.8$ in a model-independent way, independent of any FLRW fiducial
cosmology, resulting in $3-\langle D_2 \rangle < 1.7 \times 10^{-3}$ (2
$\sigma$) over the range $250<r<1200 \, h^{-1}$Mpc for the quasar distribution.
If we assume that quasars do not have a bias much less than unity, this implies
spatial isotropy of the matter distribution on large scales. Then, combining
with the Copernican principle, we finally get homogeneity of the matter
distribution on large scales. Alternatively, using a flat $\Lambda$CDM fiducial
cosmology with CMB-derived parameters, and measuring the quasar bias relative
to this $\Lambda$CDM model, our data provide a consistency check of the model,
in terms of how homogeneous the Universe is on different scales. $D_2(r)$ is
found to be compatible with our $\Lambda$CDM model on the whole $10<r<1200 \,
h^{-1}$Mpc range. For the matter distribution we obtain $3-\langle D_2 \rangle
< 5 \times 10^{-5}$ (2 $\sigma$) over the range $250<r<1200 \, h^{-1}$Mpc,
consistent with homogeneity on large scales.
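
As an illustration of the two statistics, the sketch below estimates $N(<\! r)$ and $D_2(r) = d\ln N(<\! r)/d\ln r$ for a toy catalogue. It assumes uniformly random points in a unit box and uses only centres whose spheres lie fully inside the box; the point count, radii, and function name are hypothetical, and none of the survey-geometry or redshift-evolution corrections needed for real data are included.

```python
import math
import random

rng = random.Random(1)
# Toy homogeneous catalogue: uniform random points in a unit box (assumption).
points = [(rng.random(), rng.random(), rng.random()) for _ in range(3000)]

radii = [0.10, 0.15, 0.20, 0.30]
r_max = max(radii)
# Keep only centres whose largest sphere is entirely inside the box,
# so that no sphere is truncated by the boundary.
centres = [p for p in points if all(r_max <= c <= 1.0 - r_max for c in p)]

def counts_in_spheres(points, centres, radii):
    """Mean number of neighbours within each radius, N(<r), over all centres."""
    totals = [0] * len(radii)
    for c in centres:
        for p in points:
            if p is c:
                continue  # do not count the centre itself
            dist_sq = sum((a - b) ** 2 for a, b in zip(p, c))
            for i, r in enumerate(radii):
                if dist_sq < r * r:
                    totals[i] += 1
    return [t / len(centres) for t in totals]

counts = counts_in_spheres(points, centres, radii)

# Finite-difference estimate of D_2 between consecutive radii.
d2_values = [math.log(counts[i + 1] / counts[i]) / math.log(radii[i + 1] / radii[i])
             for i in range(len(radii) - 1)]
mean_d2 = sum(d2_values) / len(d2_values)
print(counts, d2_values, mean_d2)
```

For this homogeneous toy catalogue the estimated $D_2$ should scatter around 3; a clustered distribution would give values below 3 on the scales where clustering matters.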

One of the most demanding tasks in astronomical image processing, in terms
of precision, is the centroiding of stars. Upcoming large surveys are going to
take images of billions of point sources, including many faint stars, with
short exposure times. Real-time estimation of the centroids of stars is crucial
for real-time PSF estimation, and maximal precision is required for
measurements of proper motion.
The fundamental Cram\'{e}r-Rao lower bound sets a limit on the
root-mean-squared error achievable by optimal estimators. In this work, we aim
to compare the performance of various centroiding methods, in terms of
saturating the bound, when they are applied to relatively low signal-to-noise
ratio unsaturated stars assuming zero-mean constant Gaussian noise. In order to
make this comparison, we present the ratio of the root-mean-squared errors of
these estimators to their corresponding Cram\'{e}r-Rao bound as a function of
the signal-to-noise ratio and the full width at half maximum of faint stars.
We discuss two general circumstances in centroiding of faint stars: (i) when
we have a good estimate of the PSF, (ii) when we do not know the PSF. In the
case that we know the PSF, we show that a fast polynomial centroiding after
smoothing the image by the PSF can be as efficient as the maximum-likelihood
estimator at saturating the bound. In the case that we do not know the PSF, we
demonstrate that although polynomial centroiding is less efficient than PSF
profile fitting, it comes very close to saturating the Cram\'{e}r-Rao lower
bound in a wide range of conditions. We also show that the moment-based method
of center-of-light never comes close to saturating the bound, and thus it does
not deliver reliable estimates of centroids.
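
One simple form of polynomial centroiding can be sketched as follows: locate the brightest pixel, then fit a parabola through it and its two neighbours along each axis and take the vertex as the sub-pixel position. This is an illustrative variant under stated assumptions (a noise-free Gaussian star, no prior smoothing by the PSF, separable one-dimensional fits) and not necessarily the exact estimator evaluated above; all function names and parameter values are hypothetical.

```python
import math

def make_star(nx, ny, x0, y0, sigma):
    """Noise-free Gaussian star sampled at integer pixel centres."""
    return [[math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))
             for x in range(nx)]
            for y in range(ny)]

def parabola_offset(left, mid, right):
    """Vertex position of the parabola through samples at -1, 0 and +1."""
    denom = left - 2.0 * mid + right
    return 0.0 if denom == 0 else 0.5 * (left - right) / denom

def polynomial_centroid(img):
    """Sub-pixel centroid from parabolic fits around the brightest pixel."""
    ny, nx = len(img), len(img[0])
    # Brightest pixel, restricted to the interior so both neighbours exist.
    yb, xb = max(((y, x) for y in range(1, ny - 1) for x in range(1, nx - 1)),
                 key=lambda yx: img[yx[0]][yx[1]])
    dx = parabola_offset(img[yb][xb - 1], img[yb][xb], img[yb][xb + 1])
    dy = parabola_offset(img[yb - 1][xb], img[yb][xb], img[yb + 1][xb])
    return xb + dx, yb + dy

# Assumed test star: true centre (7.3, 6.8), Gaussian sigma of 1.5 pixels.
x_hat, y_hat = polynomial_centroid(make_star(15, 15, 7.3, 6.8, 1.5))
print(x_hat, y_hat)
```

The estimate needs only a handful of arithmetic operations per star, which is why this family of methods is attractive for real-time use; a parabola is not an exact description of a Gaussian peak, so a small systematic offset remains even without noise.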

The Kepler Mission has discovered thousands of exoplanets and revolutionized
our understanding of their population. This large, homogeneous catalog of
discoveries has enabled rigorous studies of the occurrence rate of exoplanets
and planetary systems as a function of their physical properties. However,
transit surveys like Kepler are most sensitive to planets with orbital periods
much shorter than the orbital periods of Jupiter and Saturn, the most massive
planets in our Solar System. To address this deficiency, we perform a fully
automated search for long-period exoplanets with only one or two transits in
the archival Kepler light curves. When applied to the $\sim 40,000$ brightest
Sun-like target stars, this search produces 16 long-period exoplanet
candidates. Of these candidates, 6 are novel discoveries and 5 are in systems
with inner short-period transiting planets. Since our method involves no human
intervention, we empirically characterize the detection efficiency of our
search. Based on these results, we measure the average occurrence rate of
exoplanets smaller than Jupiter with orbital periods in the range 2-25 years to
be $2.0\pm0.7$ planets per Sun-like star.