
Due to the difficulty of repairing defects, many research efforts have been
devoted to automatic defect repair. Given a buggy program that fails some
test cases, a typical automatic repair technique tries to modify the program to
make all tests pass. However, since the test suites in real world projects are
usually insufficient, aiming at passing the test suites often leads to
incorrect patches.
In this paper we aim to produce precise patches, that is, any patch we
produce has a relatively high probability to be correct. More concretely, we
focus on condition synthesis, which was shown to be able to repair more than
half of the defects in existing approaches. Our key insight is threefold.
First, it is important to know what variables in a local context should be used
in an "if" condition, and we propose a sorting method based on the dependency
relations between variables. Second, we observe that the API document can be
used to guide the repair process, and propose a document analysis technique to
further filter the variables. Third, it is important to know what predicates
should be performed on the set of variables, and we propose to mine a set of
frequently used predicates in similar contexts from existing projects.
We develop a novel program repair system, ACS, that can generate precise
conditions at faulty locations. Furthermore, given that the generated
conditions are very precise, we can perform a repair operation that was
previously deemed too prone to overfitting: directly returning the test oracle
to repair the defect. Using our approach, we successfully repaired 17 defects
on four projects of Defects4J, which is the largest number of fully
automatically repaired defects reported on the dataset so far. More
importantly, the precision of our approach in the evaluation is 73.9%, which is
significantly higher than that of previous approaches, which is usually less
than 40%.

The most common statistic used to analyze large-scale structure surveys is
the correlation function, or power spectrum. Here, we show how `slicing' the
correlation function on local density brings sensitivity to interesting
non-Gaussian features in the large-scale structure, such as the expansion or
contraction of baryon acoustic oscillations (BAO) according to the local
density. The sliced correlation function measures the large-scale flows that
smear out the BAO, instead of just correcting them as reconstruction algorithms
do. Thus, we expect the sliced correlation function to be useful in
constraining the growth factor, and modified gravity theories that involve the
local density. Out of the studied cases, we find that the run of the BAO peak
location with density is best revealed when slicing on a $\sim 40$ Mpc/$h$
filtered density. But slicing on a $\sim100$ Mpc/$h$ filtered density may be
most useful in distinguishing between underdense and overdense regions, whose
BAO peaks are separated by a substantial $\sim 5$ Mpc/$h$ at $z=0$. We also
introduce `curtain plots' showing how local densities drive particle motions
toward or away from each other over the course of an $N$-body simulation.

Schwinn et al. (2017) have recently compared the abundance and distribution
of massive substructures identified in a gravitational lensing analysis of
Abell 2744 by Jauzac et al. (2016) with those in an N-body simulation, and
found no cluster in the {\Lambda}CDM simulation that is similar to Abell 2744.
Schwinn et al. (2017) identified the measured projected aperture masses with
the actual masses associated with subhaloes in the MXXL N-body simulation. We
have used the high-resolution Phoenix cluster simulations to show that such an
identification is incorrect: the aperture mass is dominated by mass in the body
of the cluster that happens to be projected along the line-of-sight to the
subhalo. This
enhancement varies from factors of a few to factors of more than 100,
particularly for subhaloes projected near the centre of the cluster. We
calculate aperture masses for subhaloes in our simulation and compare them to
the measurements for Abell 2744. We find that the data for Abell 2744 are in
excellent agreement with the matched predictions from {\Lambda}CDM. We provide
further predictions for aperture mass functions of subhaloes in idealized
surveys with varying mass detection thresholds.
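
The distinction between a projected aperture mass and the mass actually bound to a subhalo can be illustrated with a toy particle set. The following is only a minimal sketch: the particle data are made up, the line of sight is assumed to be the z axis, and the function names are hypothetical rather than taken from any simulation pipeline.

```python
import math

def aperture_mass(particles, centre, radius):
    """Projected mass: sum every particle whose (x, y) offset from the
    centre lies within `radius`, regardless of its depth z along the
    line of sight (assumed here to be the z axis)."""
    cx, cy, _ = centre
    return sum(m for x, y, _z, m in particles
               if math.hypot(x - cx, y - cy) <= radius)

def spherical_mass(particles, centre, radius):
    """Three-dimensional mass: sum particles within `radius` of the centre."""
    return sum(m for x, y, z, m in particles
               if math.dist((x, y, z), centre) <= radius)

# Toy data (made up): one subhalo particle plus cluster material that lies
# far in front of and behind it along the line of sight.
particles = [
    (0.0, 0.0, 0.0, 1.0),    # subhalo
    (0.1, 0.0, 5.0, 10.0),   # cluster body, projected onto the subhalo
    (0.0, 0.1, -8.0, 10.0),  # cluster body, projected onto the subhalo
    (3.0, 0.0, 0.0, 10.0),   # cluster body, outside the aperture
]
centre = (0.0, 0.0, 0.0)
m_ap = aperture_mass(particles, centre, radius=0.5)
m_3d = spherical_mass(particles, centre, radius=0.5)
print(m_ap, m_3d, m_ap / m_3d)
```

In this toy configuration the aperture mass exceeds the three-dimensional mass purely because of projected material, which is the kind of enhancement the abstract describes.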

Circuit polynomials are polynomials supported on circuits. The nonnegativity
of circuit polynomials is easy to check. Representing polynomials as sums of
nonnegative circuit polynomials is a certificate of the nonnegativity of
polynomials. For a polynomial with a simplex Newton polytope satisfying certain
conditions, it is nonnegative if and only if it is a sum of nonnegative circuit
polynomials. In this paper, we generalize this conclusion to polynomials with
general Newton polytopes. Moreover, we reduce the problem of deciding whether a
polynomial can be written as a sum of nonnegative circuit polynomials to
the feasibility of a relative entropy program. Since relative entropy programs
are convex, their feasibility can be checked efficiently.
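
To make the claim that nonnegativity of a circuit polynomial is easy to check more concrete, here is a minimal sketch based on the standard circuit-number criterion, which the abstract does not spell out. It assumes a circuit polynomial of the form $f = \sum_j c_j x^{\alpha(j)} + d\,x^{\beta}$ with $c_j > 0$, even exponent vectors $\alpha(j)$ forming the vertices of a simplex, and $\beta = \sum_j \lambda_j \alpha(j)$ with barycentric coordinates $\lambda_j > 0$. The function names and the tolerance are illustrative assumptions.

```python
import math

def circuit_number(coeffs, lambdas):
    """Theta = prod_j (c_j / lambda_j) ** lambda_j for c_j > 0, lambda_j > 0."""
    return math.prod((c / lam) ** lam for c, lam in zip(coeffs, lambdas))

def is_nonnegative_circuit(coeffs, lambdas, d, beta, tol=1e-9):
    """Circuit-number test for f = sum_j c_j x^alpha(j) + d x^beta.

    If every entry of beta is even, f is nonnegative iff d >= -Theta;
    otherwise f is nonnegative iff |d| <= Theta.  `tol` absorbs
    floating-point rounding (an implementation assumption).
    """
    theta = circuit_number(coeffs, lambdas)
    if all(b % 2 == 0 for b in beta):
        return d >= -theta - tol
    return abs(d) <= theta + tol

# Motzkin polynomial x^4 y^2 + x^2 y^4 + 1 - 3 x^2 y^2:
# beta = (2, 2) is the barycentre of (4, 2), (2, 4), (0, 0).
motzkin = is_nonnegative_circuit([1, 1, 1], [1/3, 1/3, 1/3], -3, (2, 2))

# 1 + x^2 - 2x = (1 - x)^2: beta = (1,) is the midpoint of (0,) and (2,).
square = is_nonnegative_circuit([1, 1], [0.5, 0.5], -2, (1,))
print(motzkin, square)
```

Deciding whether a general polynomial is a sum of such nonnegative circuit polynomials is the harder problem that the abstract reduces to a relative entropy program; the sketch covers only the single-circuit check.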

We use a volume-limited galaxy sample from the SDSS Data Release 7 to explore
the dependence of galactic conformity on the large-scale environment, measured
on $\sim$ 4 Mpc scales. We find that the star formation activity of neighbour
galaxies depends more strongly on the environment than on the activity of their
primary galaxies. In underdense regions most neighbour galaxies tend to be
active, while in overdense regions neighbour galaxies are mostly passive,
regardless of the activity of their primary galaxies. At a given stellar mass,
passive primary galaxies reside in higher density regions than active primary
galaxies, leading to the apparently strong conformity signal. The dependence of
the activity of neighbour galaxies on environment can be explained by the
corresponding dependence of the fraction of satellite galaxies. Similar results
are found for galaxies in a semi-analytical model, suggesting that no new
physics is required to explain the observed large-scale conformity.

The primal cohomology $\mathbb{K}_\mathbb{Q}$ of the theta divisor $\Theta$
of a principally polarized abelian fivefold (ppav) is the direct sum of its
invariant and anti-invariant parts $\mathbb{K}_\mathbb{Q}^{+1}$, resp.
$\mathbb{K}_\mathbb{Q}^{-1}$, under the action of $-1$. For smooth $\Theta$,
these have dimension $6$ and $72$ respectively. We show that
$\mathbb{K}_\mathbb{Q}^{+1}$ consists of Hodge classes and, for a very general
ppav, $\mathbb{K}_\mathbb{Q}^{-1}$ is a simple Hodge structure of level $2$.

We study the radial acceleration relation (RAR) for early-type galaxies
(ETGs) in the SDSS MaNGA MPL-5 dataset. The complete ETG sample shows a
slightly offset RAR from the relation reported by McGaugh et al. (2016) at the
low-acceleration end; we find that the deviation arises because the slow
rotators show a systematically higher acceleration relation than McGaugh's
RAR, while the fast rotators show an acceleration relation consistent with
McGaugh's RAR. There is a $1\sigma$ significant difference between the
acceleration relations of the fast and slow rotators, suggesting that the
acceleration relation correlates with the galactic spins, and that the slow
rotators may have a different mass distribution compared with fast rotators and
late-type galaxies. We suspect that the acceleration relation deviation of slow
rotators may be attributed to more galaxy merger events, which would disrupt
the original spins and correlated distributions of baryons and dark matter
orbits in galaxies.

We introduce a new code for cosmological simulations, PHoToNs, which is
designed for massive cosmological simulations on heterogeneous high-performance
computer (HPC) systems and for thread-oriented programming. PHoToNs adopts a
hybrid scheme to compute the gravitational force, with the conventional PM
method for the long-range force, the Tree algorithm for the short-range force,
and direct summation (PP) for the gravity from very close particles. A
self-similar space-filling Peano-Hilbert curve is used to decompose the
computing domain. Thread programming is used extensively to manage domain
communication, PM calculation and synchronization more flexibly, as well as
Dual Tree Traversal on the CPU+MIC platform. PHoToNs scales well, and the PP
kernel achieves 68.6% of peak performance on MIC and 74.4% on CPU platforms. We
also test the accuracy of the code against the widely used Gadget-2 code and
find excellent agreement.
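
The direct-summation (PP) part of such a hybrid scheme can be sketched in a few lines. This is a generic illustration of pairwise Newtonian summation, not the PHoToNs kernel: the function name, the optional Plummer-style softening `eps`, and the unit choice `G = 1` are all assumptions made for the example.

```python
def direct_sum_acceleration(pos, mass, G=1.0, eps=0.0):
    """Pairwise (PP) gravitational accelerations by direct summation.

    pos  : list of (x, y, z) positions
    mass : list of particle masses
    eps  : softening length (assumed; use eps > 0 if particles may coincide)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector pointing from particle i to particle j.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                # Equal and opposite forces: visit each pair only once.
                acc[i][k] += G * mass[j] * d[k] * inv_r3
                acc[j][k] -= G * mass[i] * d[k] * inv_r3
    return acc

acc = direct_sum_acceleration([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 2.0])
print(acc)
```

The cost grows as the square of the particle number, which is why a hybrid code would restrict this kernel to very close particles and leave larger separations to the Tree and PM parts.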

A shape filter is presented to repair segmentation results obtained in
calcium imaging of neurons in vivo. This post-segmentation algorithm can
automatically smooth the preliminary segmentations, while excluding the
incomplete segmentations where two neurons are counted as one combined
component. The shape filter is realized using a square-root velocity to project
the shapes on a shape manifold where distances between shapes are based on
elastic changes. Two data-driven weighting methods are proposed to achieve a
trade-off between shape smoothness and consistency with the data. Intuitive
comparisons of the proposed methods demonstrate the effectiveness of the shape
filter by projecting the shape evolution path on the Riemannian manifold to
Cartesian maps. Quantitative measures also demonstrate the superiority of our
methods over models that do not employ any weighting criterion.

The abundance of neutral hydrogen (HI) in satellite galaxies in the Local
Group is important for studying the formation history of our Local Group. In
this work, we generated mock HI satellite galaxies in the Local Group using the
high mass resolution hydrodynamic \textsc{apostle} simulation. The simulated HI
mass function agrees with the ALFALFA survey very well above $10^6M_{\odot}$,
although there is a discrepancy below this scale because of the observed flux
limit. After carefully checking various systematic elements in the
observations, including fitting of line width, sky coverage, integration time,
and frequency drift due to uncertainty in a galaxy's distance, we predicted the
abundance of HI in galaxies in a future survey that will be conducted by FAST.
FAST has a larger aperture and higher sensitivity than the Arecibo telescope.
We found that the HI mass function could be estimated well around $10^5
M_{\odot}$ if the integration time is 40 minutes. Our results indicate that
there are 61 HI satellites in the Local Group, and 36 in the FAST field above
$10^5 M_{\odot}$. This estimation is one order of magnitude better than the
current data, and will put a strong constraint on the formation history of the
Local Group. More high-resolution simulated samples are also needed to achieve
this target.

Our task is to generate an effective summary for a given document with
specific real-time requirements. We use the softplus function to enhance
keyword rankings to favor important sentences, based on which we present a
number of summarization algorithms using various keyword extraction and topic
clustering methods. We show that our algorithms meet the real-time requirements
and yield the best ROUGE recall scores on DUC-02 over all previously known
algorithms. To evaluate the quality of summaries without human-generated
benchmarks, we define a measure called WESM based on word embeddings using Word
Mover's Distance. We
show that the orderings of the ROUGE and WESM scores of our algorithms are
highly comparable, suggesting that WESM may serve as a viable alternative for
measuring the quality of a summary.
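
As a rough illustration of how a softplus transform might enter sentence scoring, the sketch below applies softplus, $\log(1 + e^{x})$, to keyword scores and sums them per sentence. The aggregation rule, the function names, and the keyword scores are assumptions made for the example; the abstract does not specify how the enhanced rankings are combined.

```python
import math

def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def score_sentences(sentences, keyword_rank):
    """Score each sentence by summing softplus-transformed keyword scores.

    `keyword_rank` maps a lowercase word to a raw keyword score; summing
    over the words of a sentence is an illustrative assumption.
    """
    scores = []
    for sentence in sentences:
        words = sentence.lower().split()
        scores.append(sum(softplus(keyword_rank[w])
                          for w in words if w in keyword_rank))
    return scores

# Made-up keyword scores and sentences.
keyword_rank = {"solar": 2.0, "panels": 1.0, "light": 0.5}
sentences = ["Solar panels convert light", "The cat sat"]
scores = score_sentences(sentences, keyword_rank)
print(scores)
```

Because softplus is smooth, positive, and increasing, every matched keyword contributes a positive amount while higher-ranked keywords still contribute more.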

We study automatic title generation for a given block of text and present a
method called DTATG to generate titles. DTATG first extracts a small number of
central sentences that convey the main meanings of the text and are in a
suitable structure for conversion into a title. DTATG then constructs a
dependency tree for each of these sentences and removes certain branches using
a Dependency Tree Compression Model we devise. We also devise a title test to
determine if a sentence can be used as a title. If a trimmed sentence passes
the title test, then it becomes a title candidate. DTATG selects the title
candidate with the highest ranking score as the final title. Our experiments
show that DTATG can generate adequate titles. We also show that
DTATG-generated titles have higher F1 scores than those generated by
previous methods.

We derive tight and computable bounds on the bias of statistical estimators,
or more generally of quantities of interest, when evaluated on a baseline model
P rather than on the typically unknown true model Q. Our proposed method
combines the scalable information inequality derived by P. Dupuis, K. Chowdhary,
the authors and their collaborators together with classical concentration
inequalities (such as Bennett's and Hoeffding-Azuma inequalities). Our bounds
are expressed in terms of the Kullback-Leibler divergence R(Q||P) of model Q
with respect to P and the moment generating function for the statistical
estimator under P. Furthermore, concentration inequalities, i.e. bounds on
moment generating functions, provide tight and computationally inexpensive
model bias bounds for quantities of interest. Finally, they allow us to derive
rigorous confidence bands for statistical estimators that account for model
bias and are valid for an arbitrary amount of data.
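
One simple instance of this recipe can be worked out by hand. The sketch assumes the information inequality takes the form $E_Q[f] - E_P[f] \le \inf_{c>0} \{ \frac{1}{c}\log E_P[e^{c(f - E_P f)}] + \frac{1}{c} R(Q\|P) \}$. For a bounded observable $a \le f \le b$, Hoeffding's lemma bounds the log moment generating function by $c^2 (b-a)^2/8$, and minimising over $c$ gives $(b-a)\sqrt{R(Q\|P)/2}$. The Bernoulli models and function names below are illustrative assumptions.

```python
import math

def kl_bernoulli(q, p):
    """Kullback-Leibler divergence R(Q||P) for Bernoulli(q) vs Bernoulli(p),
    with 0 < p, q < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def hoeffding_bias_bound(kl, a, b):
    """Bound on |E_Q[f] - E_P[f]| for an observable with a <= f <= b.

    Obtained by bounding the log-MGF with Hoeffding's lemma,
    c^2 (b - a)^2 / 8, and minimising c (b - a)^2 / 8 + kl / c over c > 0.
    """
    return (b - a) * math.sqrt(kl / 2.0)

# Baseline model P = Bernoulli(0.5), "true" model Q = Bernoulli(0.6),
# quantity of interest f(x) = x, so the actual bias is |0.6 - 0.5|.
p, q = 0.5, 0.6
kl = kl_bernoulli(q, p)
bound = hoeffding_bias_bound(kl, 0.0, 1.0)
bias = abs(q - p)
print(bias, bound)
```

In this example the bound only needs the divergence and the range of the observable, and it stays close to the true bias.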

The sparse support vector machine (SVM) is a popular classification technique
that can simultaneously learn a small set of the most interpretable features
and identify the support vectors. It has achieved great success in many
real-world applications. However, for large-scale problems involving a huge
number of samples and ultra-high-dimensional features, solving sparse SVMs
remains challenging. By noting that sparse SVMs induce sparsities in both
feature and sample spaces, we propose a novel approach, which is based on
accurate estimations of the primal and dual optima of sparse SVMs, to
simultaneously identify the inactive features and samples that are guaranteed
to be irrelevant to the outputs. Thus, we can remove the identified inactive
samples and features from the training phase, leading to substantial savings in
the computational cost without sacrificing the accuracy. Moreover, we show that
our method can be extended to multiclass sparse support vector machines. To
the best of our knowledge, the proposed method is the \emph{first}
\emph{static} feature and sample reduction method for sparse SVMs and
multiclass sparse SVMs. Experiments on both synthetic and real data sets
demonstrate that our approach significantly outperforms state-of-the-art
methods and that the speedup gained by our approach can be orders of magnitude.

We take advantage of the statistical power of the large-volume
dark-matter-only Millennium simulation, combined with a sophisticated
semi-analytic galaxy formation model, to explore whether the recently reported
$z=3.7$ quiescent galaxy ZF-COSMOS-20115 (ZF; Glazebrook et al. 2017) can be
accommodated in current galaxy formation models. In our model, a population of
quiescent galaxies (QGs) with stellar masses and star formation rates
comparable to those of ZF naturally emerges at redshifts $z<4$. There are two
and five ZF analogues at redshifts $3.86$ and $3.58$ in the Millennium
simulation volume, respectively. We demonstrate that, while the $z>3.5$ massive
QGs are rare (about 2\% of the galaxies with similar stellar masses), the
existing AGN feedback model implemented in the semi-analytic galaxy formation
model can successfully explain the formation of the high-redshift QGs as it
does for their lower-redshift counterparts.

The standard galaxy formation theory assumes that baryons and dark matter are
initially well mixed before becoming segregated due to radiative cooling. We
use non-radiative hydrodynamical simulations to explicitly examine this
assumption and find that baryons and dark matter can also be segregated because
of different physics obeyed by gas and dark matter during the buildup of the
halo. As a result, baryons in many haloes do not originate from the same
Lagrangian region as the dark matter. When using the fraction of corresponding
dark matter and gas particles in the initial conditions (the "paired fraction")
as a proxy of the dark matter and gas segregation strength of a halo, on
average about $25$ percent of the baryonic and dark matter of the final halo
are segregated in the initial conditions. This is at odds with the assumption
of the standard galaxy formation model. A consequence of this effect is that
the baryons and dark matter of the same halo initially experience different
tidal torques and thus their angular momentum vectors are often misaligned. The
degree of the misalignment is largely preserved during later halo assembly and
can be understood with the tidal torque theory. The result challenges the
precision of some semi-analytical approaches which utilize dark matter halo
merger trees to infer properties of gas associated with dark matter haloes.

Genome-wide association studies (GWAS) have achieved great success in the
genetic study of Alzheimer's disease (AD). Collaborative imaging genetics
studies across different research institutions show the effectiveness of
detecting genetic risk factors. However, the high dimensionality of GWAS data
poses significant challenges in detecting risk SNPs for AD. Selecting relevant
features is crucial in predicting the response variable. In this study, we
propose a novel Distributed Feature Selection Framework (DFSF) to conduct the
large-scale imaging genetics studies across multiple institutions. To speed up
the learning process, we propose a family of distributed group Lasso screening
rules to identify irrelevant features and remove them from the optimization.
Then we select the relevant group features by performing the group Lasso
feature selection process over a sequence of parameters. Finally, we employ
stability selection to rank the top risk SNPs that might help detect the early
stage of AD. To the best of our knowledge, this is the first distributed
feature selection model that integrates group Lasso feature selection with
the detection of genetic risk factors across a system of multiple research
institutions. Empirical studies are conducted on 809 subjects with 5.9 million SNPs
which are distributed across several individual institutions, demonstrating the
efficiency and effectiveness of the proposed method.

Version information plays an important role in understanding, maintaining,
and improving the quality of spreadsheets. However, end users rarely use version
control tools to document spreadsheet version information. Thus, the
spreadsheet version information is missing, and different versions of a
spreadsheet coexist as individual and similar spreadsheets. Existing approaches
try to recover spreadsheet version information by clustering these similar
spreadsheets based on spreadsheet filenames or related email conversations.
However, the applicability and accuracy of existing clustering approaches are
limited because the necessary information (e.g., filenames and email
conversations) is usually missing. We inspected the versioned spreadsheets in
VEnron, which is extracted from the Enron Corporation. In VEnron, the different
versions of a spreadsheet are clustered into an evolution group. We observed
that the versioned spreadsheets in each evolution group exhibit certain common
features (e.g., similar table headers and worksheet names). Based on this
observation, we proposed an automatic clustering algorithm, SpreadCluster.
SpreadCluster learns the criteria of features from the versioned spreadsheets
in VEnron, and then automatically clusters spreadsheets with similar
features into the same evolution group. We applied SpreadCluster to all
spreadsheets in the Enron corpus. The evaluation result shows that
SpreadCluster can cluster spreadsheets with higher precision and recall
than the filename-based approach used by VEnron. Based on the clustering result
by SpreadCluster, we further created a new versioned spreadsheet corpus,
VEnron2, which is much bigger than VEnron. We also applied SpreadCluster to
two other spreadsheet corpora, FUSE and EUSES. The results show that
SpreadCluster can cluster the versioned spreadsheets in these two corpora with
high precision.

Offering high-brilliance X-ray beams on micrometer length scales, the
microfocus-SAXS at SSRF BL16B1 was established with a KB mirror system for
studying small sample volumes or probing microscopic morphologies. The SAXS
minimum q value was 0.1 nm$^{-1}$ with a flux of $1.5 \times 10^{10}$
photons/s. Two position-resolved scanning experimental methods, STXM and CT,
were combined with microfocus-SAXS. To reduce the significant smearing
effect in the horizontal direction, an effective and easy-to-use desmearing
procedure for two-dimensional SAXS patterns based on blind deconvolution was
developed, and the deblurring results demonstrated good restoration of the
defocused image. Finally, a bamboo sample was selected for a SAXS-CT
experiment, which illustrated the performance of the microfocus-SAXS method.
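
To give a feel for what the quoted minimum q means, the sketch below uses the standard relations $q = (4\pi/\lambda)\sin\theta$, where $2\theta$ is the scattering angle, and $d = 2\pi/q$ for the corresponding real-space length. The function names are hypothetical, and the abstract does not state the wavelength, so it appears only as a parameter.

```python
import math

def q_from_angle(two_theta_rad, wavelength_nm):
    """Scattering vector magnitude q = 4*pi*sin(theta)/lambda, in nm^-1,
    where two_theta_rad is the full scattering angle in radians."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength_nm

def real_space_length(q_inv_nm):
    """Real-space length d = 2*pi/q, in nm, probed at scattering vector q."""
    return 2.0 * math.pi / q_inv_nm

# Largest length scale accessible at the quoted minimum q of 0.1 nm^-1.
d_max = real_space_length(0.1)
print(round(d_max, 1))
```

Under these relations, a minimum q of 0.1 nm$^{-1}$ corresponds to structures of roughly 63 nm.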

The objective of this paper is to evaluate the potential of Gaofen-2 (GF-2)
high-resolution multispectral sensor (MS) and panchromatic (PAN) imagery for
water mapping. Difficulties of water mapping on high-resolution data include:
1) misclassification between water and shadows or other low-reflectance ground
objects, which is mostly caused by the spectral similarity within the given
band range; 2) small water bodies with size smaller than the spatial resolution
of the MS image. To resolve the confusion between water and low-reflectance
objects, the Landsat 8 time series with two shortwave infrared (SWIR) bands is
added because water has extremely strong absorption in SWIR. In order to
integrate the three multi-sensor, multi-resolution data sets, the probabilistic
graphical
model (PGM) is utilized here with conditional probability distribution defined
mainly based on the size of each object. For comparison, results from the SVM
classifier on the PCA fused and MS data, thresholding method on the PAN image,
and water index method on the Landsat data are computed. The confusion matrices
are calculated for all the methods. The results demonstrate that the PGM method
can achieve the best performance with the highest overall accuracy. Moreover,
small rivers can also be extracted by adding weight on the PAN result in PGM.
Finally, the post-classification procedure is applied to the PGM result to
further exclude misclassification in shadow and water-land boundary regions.
Accordingly, the producer's, user's and overall accuracy are all increased,
indicating the effectiveness of our method.
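
The three accuracy measures mentioned above are all read off a confusion matrix. The following minimal sketch shows the computation with made-up counts. The row/column convention (rows are reference classes, columns are mapped classes) and the function name are assumptions for the example.

```python
def accuracies(confusion):
    """Producer's, user's and overall accuracy from a confusion matrix.

    confusion[i][j] = number of samples whose reference class is i and
    whose mapped class is j (this orientation is an assumption).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = [confusion[i][i] for i in range(n)]
    ref_totals = [sum(confusion[i]) for i in range(n)]
    map_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Producer's accuracy: fraction of each reference class mapped correctly.
    producers = [correct[i] / ref_totals[i] for i in range(n)]
    # User's accuracy: fraction of each mapped class that is truly that class.
    users = [correct[j] / map_totals[j] for j in range(n)]
    overall = sum(correct) / total
    return producers, users, overall

# Made-up counts for two classes: water, non-water.
confusion = [[90, 10],
             [5, 95]]
producers, users, overall = accuracies(confusion)
print(producers, users, overall)
```

Producer's accuracy penalises omission errors and user's accuracy penalises commission errors, so a post-classification step that removes false water detections in shadows would mainly raise the user's accuracy of the water class.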

There is a trend to acquire high-accuracy land-cover maps using multi-source
classification methods, most of which are based on data fusion, especially
pixel- or feature-level fusion. A probabilistic graphical model (PGM) approach
is proposed in this research for 30 m resolution land-cover mapping with
multi-temporal Landsat and MODerate Resolution Imaging Spectroradiometer
(MODIS) data. Independent classifiers were applied to two single-date Landsat 8
scenes and the MODIS time-series data, respectively, for probability
estimation. A PGM was created for each pixel in Landsat 8 data. Conditional
probability distributions were computed based on data quality and reliability
by using information selectively. Using the administrative territory of Beijing
City (Area1) and a coastal region of Shandong province, China (Area2) as
study areas, multiple land-cover maps were generated for comparison.
Quantitative results show the effectiveness of the proposed method. Overall
accuracy improved from 74.0% (maps acquired from single-temporal Landsat
images) to 81.8% (output of the PGM) for Area1. Improvements can also be seen
when using MODIS data and only a single-temporal Landsat image as input
(overall accuracy: 78.4% versus 74.0% for Area1, and 86.8% versus 83.0% for
Area2). Information from MODIS data did not help much when the PGM was applied
to cloud-free regions of. One of the advantages of the proposed method is that
it can be applied where multi-temporal data cannot be simply stacked as a
multi-layered image.

A graph $\Gamma$ is called $(G, s)$-arc-transitive if $G \le {\rm
Aut}(\Gamma)$ is transitive on $V\Gamma$ and transitive on the set of $s$-arcs
of $\Gamma$, where for an integer $s \ge 1$ an $s$-arc of $\Gamma$ is a
sequence of $s+1$ vertices $(v_0,v_1,\ldots,v_s)$ of $\Gamma$ such that
$v_{i-1}$ and $v_i$ are adjacent for $1 \le i \le s$ and $v_{i-1}\ne v_{i+1}$
for $1 \le i \le s-1$. $\Gamma$ is called 2-transitive if it is $({\rm
Aut}(\Gamma), 2)$-arc-transitive but not $({\rm Aut}(\Gamma),
3)$-arc-transitive. A Cayley graph $\Gamma$ of a group $G$ is called normal if
$G$ is normal in ${\rm Aut}(\Gamma)$ and non-normal otherwise. It was proved by
X. G. Fang, C. H. Li and M. Y. Xu that if $\Gamma$ is a tetravalent
2-transitive Cayley graph of a finite simple group $G$, then either $\Gamma$ is
normal or $G$ is one of the groups ${\rm PSL}_2(11)$, $M_{11}$, $M_{23}$ and
$A_{11}$. In the present paper we prove further that among these four groups
only $M_{11}$ produces connected tetravalent 2-transitive non-normal Cayley
graphs, and there are exactly two such graphs which are non-isomorphic and both
determined in the paper. As a consequence, the automorphism group of any
connected tetravalent 2-transitive Cayley graph of any finite simple group is
determined.
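
The definition of an $s$-arc can be checked on small graphs by brute-force enumeration. The sketch below is only an illustration of the definition, with a hypothetical function name and a graph given as an adjacency dictionary. It does not test arc-transitivity.

```python
def s_arcs(adj, s):
    """Enumerate all s-arcs (v_0, ..., v_s) of a simple undirected graph.

    adj maps each vertex to a list of its neighbours.  Consecutive
    vertices must be adjacent, and v_{i-1} != v_{i+1} (no immediate
    backtracking); vertices may otherwise repeat.
    """
    arcs = [(u, v) for u in adj for v in adj[u]]  # the 1-arcs
    for _ in range(s - 1):
        arcs = [arc + (w,)
                for arc in arcs
                for w in adj[arc[-1]]
                if w != arc[-2]]
    return arcs

# Complete graph K4 (trivalent) and the 5-cycle (valency 2).
k4 = {v: [w for w in range(4) if w != v] for v in range(4)}
c5 = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}
counts_k4 = [len(s_arcs(k4, s)) for s in (1, 2, 3)]
counts_c5 = [len(s_arcs(c5, s)) for s in (1, 2, 3)]
print(counts_k4, counts_c5)
```

For a $k$-regular graph on $n$ vertices the enumeration yields $n\,k\,(k-1)^{s-1}$ $s$-arcs, since each step after the first has $k-1$ non-backtracking choices.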

In this paper, we prove a finite basis theorem for radical well-mixed
difference ideals generated by binomials. As a consequence, every strictly
ascending chain of radical well-mixed difference ideals generated by binomials
in a difference polynomial ring is finite, which answers a question raised by
E. Hrushovski in the binomial case.

In this paper, we introduce the concept of P-difference varieties and study
the properties of toric P-difference varieties. Toric P-difference varieties
are analogues of toric varieties in difference algebraic geometry. The category
of affine toric P-difference varieties with toric morphisms is shown to be
antiequivalent to the category of affine P[x]-semimodules with P[x]-semimodule
morphisms. Moreover, there is a one-to-one correspondence between the
irreducible invariant P-difference subvarieties of an affine toric P-difference
variety and the faces of the corresponding affine P[x]-semimodule. We also
define abstract toric P-difference varieties associated with fans by gluing
affine toric P-difference varieties. The irreducible invariant P-difference
subvarieties-faces correspondence is generalized to abstract toric P-difference
varieties. By virtue of this correspondence, a divisor theory for abstract
toric P-difference varieties is developed.

Genome-wide association studies (GWAS) offer new opportunities to identify
genetic risk factors for Alzheimer's disease (AD). Recently, collaborative
efforts across different institutions emerged that enhance the power of many
existing techniques on individual institution data. However, a major barrier to
collaborative studies of GWAS is that many institutions need to preserve
individual data privacy. To address this challenge, we propose a novel
distributed framework, termed Local Query Model (LQM), to detect risk SNPs for
AD across multiple research institutions. To accelerate the learning process,
we propose a Distributed Enhanced Dual Polytope Projection (DEDPP) screening
rule to identify irrelevant features and remove them from the optimization. To
the best of our knowledge, this is the first successful run of the
computationally intensive model selection procedure to learn a consistent model
across different institutions without compromising their privacy while ranking
the SNPs that may collectively affect AD. Empirical studies are conducted on
809 subjects with 5.9 million SNP features which are distributed across three
individual institutions. DEDPP achieved a 66-fold speedup by effectively
identifying irrelevant features.