
Machine learning (ML) techniques are increasingly common in security
applications, such as malware and intrusion detection. However, ML models are
often susceptible to evasion attacks, in which an adversary makes changes to
the input (such as malware) in order to avoid being detected. A conventional
approach to evaluating ML robustness to such attacks, as well as to designing
robust ML, is to consider simplified feature-space models of attacks, where the
attacker changes ML features directly to effect evasion, while minimizing or
constraining the magnitude of this change. We investigate the effectiveness of
this approach to designing robust ML in the face of attacks that can be
realized in actual malware (realizable attacks). We demonstrate that in the
context of structure-based PDF malware detection, such techniques appear to
have limited effectiveness, but they are effective with content-based
detectors. In either case, we show that augmenting the feature-space models
with conserved features (those that cannot be unilaterally modified without
compromising malicious functionality) significantly improves performance.
Finally, we show that feature-space models enable generalized robustness when
faced with a variety of realizable attacks, as compared to classifiers which
are tuned to be robust to a specific realizable attack.

For spectrally negative L\'evy processes, adapting an approach from
\cite{BoLi:sub1} we identify joint
Laplace transforms involving local times evaluated at either the first
passage times, or independent exponential times, or inverse local times. The
Laplace transforms are expressed in terms of the associated scale functions.
Connections are made with the permanental process and the Markovian loop soup
measure.

We establish a correspondence on a Riemann surface between hyperbolic metrics
with isolated singularities and bounded projective functions whose Schwarzian
derivatives have at most double poles and whose monodromies lie in ${\rm
PSU}(1,\,1)$. As an application, we construct explicitly a new class of
hyperbolic metrics with countably many singularities on the unit disc.

Most previous machine learning algorithms are based on the i.i.d.
hypothesis. However, this ideal assumption is often violated in real
applications, where selection bias may arise between the training and testing
processes. Moreover, in many scenarios, the testing data is not even available
during training, which makes traditional methods such as transfer learning
infeasible because they require prior knowledge of the test distribution. Therefore,
how to address the agnostic selection bias for robust model learning is of
paramount importance for both academic research and real applications. In this
paper, under the assumption that causal relationships among variables are
robust across domains, we incorporate causal techniques into predictive modeling
and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm
that jointly optimizes global confounder balancing and weighted logistic
regression. Global confounder balancing helps to identify causal features,
whose causal effects on the outcome are stable across domains; performing
logistic regression on those causal features then yields a predictive
model that is robust to the agnostic bias. To validate the effectiveness of our
CRLR algorithm, we conduct comprehensive experiments on both synthetic and
real-world datasets. Experimental results demonstrate that our CRLR
algorithm outperforms the state-of-the-art methods, and that the
interpretability of our method can be illustrated by feature visualization.

The function space of deep-learning machines is investigated by studying
growth in the entropy of functions of a given error with respect to a reference
function, realized by a deep-learning machine. Using physics-inspired methods
we study both sparsely and densely-connected architectures to discover a
layerwise convergence of candidate functions, marked by a corresponding
reduction in entropy when approaching the reference function, gain insight into
the importance of having a large number of layers, and observe phase
transitions as the error increases.

The existence of kinetic ballooning mode (KBM) high order (non-ground)
eigenstates for tokamak plasmas with steep gradients is demonstrated via
gyrokinetic electromagnetic eigenvalue solutions, which reveals that eigenmode
parity transition is an intrinsic property of electromagnetic plasmas. The
eigenstates with quantum number $l=0$ for ground state and $l=1,2,3\ldots$ for
non-ground states are found to coexist, and the most unstable one can be a
high order state ($l\neq0$). The conventional KBM is the $l=0$ state. It is
shown that the $l=1$ KBM has the same mode structure parity as the
microtearing mode (MTM). In contrast to the MTM, the $l=1$ KBM can be driven
by pressure gradient even without collisions and electron temperature gradient.
The relevance of the various KBM eigenstates under steep gradients to edge
plasma physics is discussed.

In this paper, we focus on the COM-type negative binomial distribution with
three parameters, which belongs to the COM-type $(a,b,0)$ class of distributions
and to the family of equilibrium distributions of an arbitrary birth-death
process. We also show abundant distributional properties such as overdispersion
and underdispersion, log-concavity, log-convexity (infinite divisibility), pseudo
compound Poisson, stochastic ordering and asymptotic approximation. Some
characterizations including sum of equicorrelated geometrically distributed
random variables, conditional distribution, limit distribution of the
COM-negative hypergeometric distribution, and Stein's identity are given for
theoretical properties. The COM-negative binomial distribution is applied to
overdispersed and ultrahigh zero-inflated data sets. With the aid of ratio
regression, we employ the maximum likelihood method to estimate the parameters,
and the goodness-of-fit is evaluated by the discrete Kolmogorov-Smirnov test.

In this paper we design information elicitation mechanisms for Bayesian
auctions. While in Bayesian mechanism design the distributions of the players'
private types are often assumed to be common knowledge, information elicitation
considers the situation where the players know the distributions better than
the decision maker. To weaken the information assumption in Bayesian auctions,
we consider an information structure where the knowledge about the
distributions is arbitrarily scattered among the players. In such an
unstructured information setting, we design mechanisms for unit-demand auctions
and additive auctions that aggregate the players' knowledge, generating revenue
that is a constant approximation to that of the optimal Bayesian mechanisms with
a common prior. Our mechanisms are 2-step dominant-strategy truthful and the
revenue increases gracefully with the amount of knowledge the players
collectively have.

With huge amounts of training data, deep learning has made great
breakthroughs in many artificial intelligence (AI) applications. However, such
large-scale data sets present computational challenges, requiring training to
be distributed on a cluster equipped with accelerators like GPUs. With the fast
increase of GPU computing power, the data communications among GPUs have become
a potential bottleneck on the overall training performance. In this paper, we
first propose a general directed acyclic graph (DAG) model to describe the
distributed synchronous stochastic gradient descent (SSGD) algorithm, which
has been widely used in distributed deep learning frameworks. To understand the
practical impact of data communications on training performance, we conduct
extensive empirical studies on four state-of-the-art distributed deep learning
frameworks (i.e., Caffe-MPI, CNTK, MXNet and TensorFlow) over multi-GPU and
multi-node environments with different data communication techniques, including
PCIe, NVLink, 10GbE, and InfiniBand. Through both analytical and experimental
studies, we identify the potential bottlenecks and overheads that could be
further optimized. Finally, we make the data set of our experimental traces
publicly available, which could be used to support simulation-based studies.
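
To make the synchronous SGD pattern concrete, the following is a minimal single-process sketch, not the frameworks' actual implementation. It assumes a toy one-parameter model and simulates the workers with a plain list; the names `all_reduce_mean`, `ssgd_step`, and `mse_grad` are hypothetical. Each iteration has the three phases a DAG model of the algorithm would separate: local gradient computation, gradient aggregation (the communication step), and the model update.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Shard = List[Tuple[float, float]]


def all_reduce_mean(local_grads: Sequence[Vector]) -> Vector:
    """Stand-in for the communication step: element-wise mean over workers."""
    n_workers = len(local_grads)
    return [sum(column) / n_workers for column in zip(*local_grads)]


def ssgd_step(weights: Vector, shards: Sequence[Shard],
              grad_fn: Callable[[Vector, Shard], Vector], lr: float) -> Vector:
    """One synchronous iteration: compute locally, aggregate, then update."""
    # Compute phase: every (simulated) worker evaluates a gradient on its shard.
    local_grads = [grad_fn(weights, shard) for shard in shards]
    # Communication phase: a barrier, since all local gradients are required.
    mean_grad = all_reduce_mean(local_grads)
    # Update phase: every worker applies the same averaged gradient.
    return [w - lr * g for w, g in zip(weights, mean_grad)]


def mse_grad(weights: Vector, shard: Shard) -> Vector:
    """Gradient of the mean squared error for the toy model y = w * x."""
    w = weights[0]
    return [sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)]


# Toy data generated from y = 3x, split across two simulated workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
weights = [0.0]
for _ in range(50):
    weights = ssgd_step(weights, shards, mse_grad, lr=0.05)
```

In a real cluster the aggregation step runs over an interconnect such as PCIe, NVLink, 10GbE, or InfiniBand, which is where communication cost enters the iteration time.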

Two main classes of models have been developed to explain the mechanisms of
release, heating and acceleration of the nascent solar wind: the
wave-turbulence-driven (WTD) models and the reconnection-loop-opening (RLO)
models, in which the plasma
release processes are fundamentally different. Given that the statistical
observational properties of helium ions produced in magnetically diverse solar
regions could provide valuable information for the solar wind modelling, we
examine the statistical properties of the helium abundance (A_He) and the speed
difference between helium ions and protons (v_alpha,p) for coronal holes (CHs),
active regions (ARs) and the quiet Sun (QS). We find bimodal distributions in
the space of A_He and v_alpha,p/v_A (where v_A is the local Alfvén speed) for
the solar wind as a whole. The CH wind measurements are concentrated at higher
A_He and v_alpha,p/v_A values with a smaller A_He distribution range, while the
AR and QS wind is associated with lower A_He and v_alpha,p/v_A, and a larger
A_He distribution range. The magnetic diversity of the source regions and the
physical processes related to it are possibly responsible for the different
properties of A_He and v_alpha,p/v_A. The statistical results suggest that the
two solar wind generation mechanisms, WTD and RLO, work in parallel in all
solar wind source regions. In CH regions WTD plays a major role, whereas the
RLO mechanism is more important in AR and QS.

Statistical agent-based models for crime have shown that repeat victimization
can lead to predictable crime hotspots (see e.g. Short et al., Math. Models
Methods Appl., 2008). A recent study in one space dimension (Chaturapruek
et al., SIAM J. Appl. Math, 2013) shows that the hotspot dynamics change when
movement patterns of the criminals involve long-tailed L\'evy distributions for
the jump length as opposed to classical random walks. In reality, criminals
move in confined areas with a maximum jump length. In this paper we develop a
mean-field continuum model with truncated L\'evy flights for residential
burglary in one space dimension. The continuum model yields local Laplace
diffusion, rather than fractional diffusion. We present an asymptotic theory to
derive the continuum equations and show excellent agreement between the
continuum model and the agent-based simulations. This suggests that local
diffusion models are universal for continuum limits of this problem, the
important quantity being the diffusion coefficient. Law enforcement agents are
also incorporated into the model, and the relative effectiveness of their
deployment strategies is compared quantitatively.

We investigate the quantum phase transitions for the $XXZ$ spin-1/2 chains
via the quantum correlations between the nearest and next-to-nearest neighbor
spins characterized by negativity, information deficit, trace distance discord
and local quantum uncertainty. It is shown that all these correlations exhibit
the quantum phase transitions at $\Delta=-1$. However, only information deficit
and local quantum uncertainty can demonstrate quantum phase transitions at
$\Delta=1$. The analytical and numerical behaviors of the quantum correlations
for the $XXZ$ system are presented. We also consider quantum correlations in
the Hartree-Fock ground state of the Lipkin-Meshkov-Glick (LMG) model.

Generating good revenue is one of the most important problems in Bayesian
auction design, and many (approximately) optimal dominant-strategy incentive
compatible (DSIC) Bayesian mechanisms have been constructed for various auction
settings. However, most existing studies do not consider the complexity for the
seller to carry out the mechanism. It is assumed that the seller knows "each
single bit" of the distributions and is able to optimize perfectly based on the
entire distributions. Unfortunately, this is a strong assumption and may not
hold in reality: for example, when the value distributions have exponentially
large supports or do not have succinct representations.
In this work we consider, for the first time, the query complexity of
Bayesian mechanisms. We only allow the seller to have limited oracle access
to the players' value distributions, via quantile queries and value queries.
For a large class of auction settings, we prove logarithmic lower bounds for
the query complexity for any DSIC Bayesian mechanism to achieve any constant
approximation to the optimal revenue. For single-item auctions and multi-item
auctions with unit-demand or additive valuation functions, we prove tight
upper bounds via efficient query schemes, without requiring the distributions
to be regular or have monotone hazard rate. Thus, in those auction settings the
seller needs to access much less than the full distributions in order to
achieve approximately optimal revenue.

In this paper, we focus on how to dynamically allocate a divisible resource
fairly among n players who arrive and depart over time. The players may have
general heterogeneous valuations over the resource. It is known that exact
envy-free and proportional allocations may not exist in the dynamic setting
[Walsh, 2011]. Thus, we study to what extent fairness can be guaranteed
in the dynamic setting. We first design two algorithms which are O(log
n)-proportional and O(n)-envy-free for the setting with general valuations, and
by constructing adversarial instances such that all dynamic algorithms must
be at least Omega(1)-proportional and Omega(n/log n)-envy-free, we show that
the bounds are tight up to a logarithmic factor. Moreover, we introduce the
setting where the players' valuations are uniform on the resource but with
different demands, which generalizes the setting of [Friedman et al., 2015]. We
prove an O(log n) upper bound and a tight lower bound for this case.

Recent studies show that the state-of-the-art deep neural networks (DNNs) are
vulnerable to adversarial examples, resulting from small-magnitude
perturbations added to the input. Given that emerging physical systems are
using DNNs in safety-critical situations, adversarial examples could mislead
these systems and cause dangerous situations. Therefore, understanding
adversarial examples in the physical world is an important step towards
developing resilient learning algorithms. We propose a general attack
algorithm, Robust Physical Perturbations (RP2), to generate robust visual
adversarial perturbations under different physical conditions. Using the
real-world case of road sign classification, we show that adversarial examples
generated using RP2 achieve high targeted misclassification rates against
standard-architecture road sign classifiers in the physical world under various
environmental conditions, including viewpoints. Due to the current lack of a
standardized testing method, we propose a two-stage evaluation methodology for
robust physical adversarial examples consisting of lab and field tests. Using
this methodology, we evaluate the efficacy of physical adversarial
manipulations on real objects. With a perturbation in the form of only black and
white stickers, we attack a real stop sign, causing targeted misclassification
in 100% of the images obtained in lab settings, and in 84.8% of the captured
video frames obtained on a moving vehicle (field test) for the target
classifier.

We formulate a microscopic linear response theory of nonequilibrium magnonic
torques and magnon pumping applicable to multiple-magnonic-band uniform
ferromagnets with Dzyaloshinskii-Moriya interactions. From the linear response
theory, we identify the extrinsic and intrinsic contributions, where the latter
is expressed via the Berry curvature of magnonic bands. We observe that in the
presence of a time-dependent magnetization, Dzyaloshinskii-Moriya interactions
can act as fictitious electric fields acting on magnons. We study various
current responses to this fictitious field and analyze the role of Berry
curvature. After identifying the magnon-mediated contribution to the
equilibrium Dzyaloshinskii-Moriya interaction, we also establish the Onsager
reciprocity between the magnon-mediated torques and heat pumping. We apply our
theory to the magnonic heat pumping and torque responses in honeycomb and
kagome lattice ferromagnets.

As machine learning becomes widely used for automated decisions, attackers
have strong incentives to manipulate the results and models generated by
machine learning algorithms. In this paper, we perform the first systematic
study of poisoning attacks and their countermeasures for linear regression
models. In poisoning attacks, attackers deliberately influence the training
data to manipulate the results of a predictive model. We propose a
theoretically grounded optimization framework specifically designed for linear
regression and demonstrate its effectiveness on a range of datasets and models.
We also introduce a fast statistical attack that requires limited knowledge of
the training process. Finally, we design a new principled defense method that
is highly resilient against all poisoning attacks. We provide formal guarantees
about its convergence and an upper bound on the effect of poisoning attacks
when the defense is deployed. We extensively evaluate our attacks and defenses
on three realistic datasets from health care, loan assessment, and real estate
domains.

Object proposal generation methods have been widely applied to many computer
vision tasks. However, existing object proposal generation methods often suffer
from the problems of motion blur, low contrast, deformation, etc., when they
are applied to video-related tasks. In this paper, we propose an effective and
highly accurate target-specific object proposal generation (TOPG) method, which
takes full advantage of the context information of a video to alleviate these
problems. Specifically, we propose to generate target-specific object proposals
by integrating the information of two important objectness cues: colors and
edges, which are complementary to each other for different challenging
environments in the process of generating object proposals. As a result, the
recall of the proposed TOPG method is significantly increased. Furthermore, we
propose an object proposal ranking strategy to increase the rank accuracy of
the generated object proposals. The proposed TOPG method has yielded
significant recall gain (about 20%-60% higher) compared with several
state-of-the-art object proposal methods on several challenging visual tracking
datasets. Then, we apply the proposed TOPG method to the task of visual
tracking and propose a TOPG-based tracker (called TOPGT), where TOPG is used
as a sample selection strategy to select a small number of high-quality target
candidates from the generated object proposals. Since the object proposals
generated by the proposed TOPG cover many hard negative samples and positive
samples, these object proposals can not only be used for training an effective
classifier, but also be used as target candidates for visual tracking.
Experimental results show the superior performance of TOPGT for visual tracking
compared with several other state-of-the-art visual trackers (about 3%-11%
higher than the winner of the VOT2015 challenge in terms of distance precision).

Current face or object detection methods based on convolutional neural networks
(such as OverFeat, R-CNN and DenseNet) explicitly extract multi-scale features
based on an image pyramid. However, such a strategy increases the computational
burden for face detection. In this paper, we propose a fast face detection
method based on discriminative complete features (DCFs) extracted by an
elaborately designed convolutional neural network, where face detection is
directly performed on the complete feature maps. DCFs exhibit
scale invariance, which is beneficial for face detection with high speed and
promising performance. Therefore, extracting multi-scale features on an image
pyramid, as employed in conventional methods, is not required in the proposed
method, which can greatly improve its efficiency for face detection.
Experimental results on several popular face detection datasets show the
efficiency and the effectiveness of the proposed method for face detection.

We use high spatial and temporal resolution observations, simultaneously
obtained with the New Vacuum Solar Telescope and Atmospheric Imaging Assembly
(AIA) on board the Solar Dynamics Observatory, to investigate the
high-frequency oscillations above a sunspot umbra. A novel time-frequency
analysis method, namely the synchrosqueezing transform (SST), is employed to
represent their power spectra and to reconstruct the high-frequency signals at
different solar atmospheric layers. A validation study with synthetic signals
demonstrates that SST is capable of resolving weak signals even when their
strength is comparable with the high-frequency noise. The power spectra,
obtained from both SST and the Fourier transform, of the entire umbral region
indicate that there are significant enhancements between 10 and 14 mHz (labeled
as 12 mHz) at different atmospheric layers. Analyzing the spectrum of a
photospheric region far away from the umbra demonstrates that this 12~mHz
component exists only inside the umbra. The animation based on the
reconstructed 12 mHz component in AIA 171 \AA\ illustrates that an
intermittently propagating wave first emerges near the footpoints of coronal
fan structures, and then propagates outward along the structures. A
time-distance diagram, coupled with a subsonic wave speed ($\sim$ 49 km
s$^{-1}$), highlights the fact that these coronal perturbations are best
described as upwardly propagating magnetoacoustic slow waves. Thus, we
reveal for the first time high-frequency oscillations with a period around one
minute in imaging observations at different heights above an umbra, and these oscillations
seem to be related to the umbral perturbations in the photosphere.

The Interface Region Imaging Spectrograph (IRIS) reveals numerous small-scale
(sub-arcsecond) brightenings that appear as bright dots sparkling the solar
transition region in active regions. Here, we report a statistical study on
these transition region bright dots. We use an automatic approach to identify
2742 dots in a Si IV raster image. We find that the average spatial size of the
dots is 0.8 arcsec$^2$ and most of them are located in the faculae area. Their
Doppler velocities obtained from the Si IV 1394 {\AA} line range from -20 to 20
km/s. Among these 2742 dots, 1224 are predominantly blueshifted and 1518 are
redshifted. Their nonthermal velocities range from 4 to 50 km/s with an
average of 24 km/s. We speculate that the bright dots studied here are
small-scale impulsive energetic events that can heat the active region corona.
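
For readers less familiar with how a Doppler velocity follows from a measured line position, here is a small sketch of the standard non-relativistic conversion v = c (lambda_obs - lambda_0) / lambda_0. It is not the authors' pipeline: the rest wavelength of 1393.755 angstroms for the Si IV line and the 0.093 angstrom shift are assumed values chosen for illustration, and the function name `doppler_velocity` is hypothetical. Positive values denote redshift, negative values blueshift.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def doppler_velocity(observed_angstrom: float, rest_angstrom: float) -> float:
    """Line-of-sight velocity in km/s; positive means redshifted."""
    return C_KM_S * (observed_angstrom - rest_angstrom) / rest_angstrom


# Assumed rest wavelength for the Si IV line (illustrative, in angstroms).
SI_IV_REST = 1393.755

# A shift of about 0.093 angstroms corresponds to roughly 20 km/s.
v_red = doppler_velocity(SI_IV_REST + 0.093, SI_IV_REST)
v_blue = doppler_velocity(SI_IV_REST - 0.093, SI_IV_REST)
```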

Cone spherical metrics are conformal metrics with constant curvature one and
finitely many conical singularities on compact Riemann surfaces. By using
Strebel differentials as a bridge, we construct a new class of cone spherical
metrics on compact Riemann surfaces by drawing on the surfaces some class of
connected metric ribbon graphs.

Deep learning (DL) defines a new data-driven programming paradigm that constructs
the internal system logic of a crafted neural network through a set of training
data. DL has been widely adopted in many safety-critical
scenarios. However, a plethora of studies have shown that the state-of-the-art
DL systems suffer from various vulnerabilities which can lead to severe
consequences when applied to real-world applications. Currently, the robustness
of a DL system against adversarial attacks is usually measured by the accuracy
on test data. Considering the limitation of accessible test data, good
performance on test data can hardly guarantee the robustness and generality of
DL systems. Unlike traditional software systems, which have clear and
controllable logic and functionality, a DL system is trained with data and
is not thoroughly understood. This makes system analysis and
defect detection difficult, which could potentially hinder its real-world
deployment without safety guarantees. In this paper, we propose DeepGauge, a set
of comprehensive multi-granularity testing criteria for DL systems, which renders
a complete and multi-faceted portrayal of the testbed. The in-depth evaluation of our
proposed testing criteria is demonstrated on two well-known datasets, five DL
systems, with four state-of-the-art adversarial data generation techniques. The
effectiveness of DeepGauge sheds light on the construction of robust DL
systems.

Deep Neural Networks (DNNs) have recently been shown to be vulnerable to
adversarial examples, which are carefully crafted instances that can mislead
DNNs to make errors during prediction. To better understand such attacks, a
characterization is needed of the properties of regions (the so-called
'adversarial subspaces') in which adversarial examples lie. We tackle this
challenge by characterizing the dimensional properties of adversarial regions,
via the use of Local Intrinsic Dimensionality (LID). LID assesses the
space-filling capability of the region surrounding a reference example, based
on the distance distribution of the example to its neighbors. We first provide
explanations about how adversarial perturbation can affect the LID
characteristic of adversarial regions, and then show empirically that LID
characteristics can facilitate the distinction of adversarial examples
generated using state-of-the-art attacks. As a proof-of-concept, we show that a
potential application of LID is to distinguish adversarial examples, and the
preliminary results show that it can outperform several state-of-the-art
detection measures by large margins for five attack strategies considered in
this paper across three benchmark datasets. Our analysis of the LID
characteristic for adversarial regions not only motivates new directions of
effective adversarial defense, but also opens up more challenges for developing
new attacks to better understand the vulnerabilities of DNNs.
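
As a rough illustration of how an LID value can be computed from neighbor distances, the sketch below uses a commonly cited maximum-likelihood estimator, LID(x) = -1 / mean_i log(r_i / r_k), where r_1 <= ... <= r_k are the distances from x to its k nearest neighbors. Whether this matches the exact estimator and batching used in the paper is an assumption; the function name `lid_mle` and the toy point sets are hypothetical. Points on a line should give an estimate near 1, and points on a planar grid an estimate near 2.

```python
import math
from typing import Sequence

Point = Sequence[float]


def lid_mle(x: Point, reference: Sequence[Point], k: int) -> float:
    """Maximum-likelihood LID estimate at x from its k nearest neighbours."""
    dists = sorted(math.dist(x, p) for p in reference)
    # Drop zero distances (copies of x itself) and keep the k smallest.
    dists = [d for d in dists if d > 0.0][:k]
    if len(dists) < 2:
        raise ValueError("need at least two neighbours at non-zero distance")
    r_max = dists[-1]
    mean_log_ratio = sum(math.log(d / r_max) for d in dists) / len(dists)
    if mean_log_ratio == 0.0:
        raise ValueError("all neighbours are equidistant; estimate undefined")
    return -1.0 / mean_log_ratio


# Toy data embedded in 3-D: a 1-D line and a 2-D grid through the origin.
origin = (0.0, 0.0, 0.0)
line = [(float(i), 0.0, 0.0) for i in range(1, 201)]
grid = [(float(i), float(j), 0.0)
        for i in range(-10, 11) for j in range(-10, 11)]

lid_line = lid_mle(origin, line, k=100)
lid_grid = lid_mle(origin, grid, k=100)
```

The estimate depends only on distance ratios, so it is unchanged by rescaling the data, which is one reason such estimators are convenient as a detection feature.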

Attention-based encoder-decoder architectures, such as Listen, Attend, and
Spell (LAS), subsume the acoustic, pronunciation and language model components
of a traditional automatic speech recognition (ASR) system into a single neural
network. In previous work, we have shown that such architectures are comparable
to state-of-the-art ASR systems on dictation tasks, but it was not clear if such
architectures would be practical for more challenging tasks such as voice
search. In this work, we explore a variety of structural and optimization
improvements to our LAS model which significantly improve performance. On the
structural side, we show that word piece models can be used instead of
graphemes. We also introduce a multi-head attention architecture, which offers
improvements over the commonly-used single-head attention. On the optimization
side, we explore synchronous training, scheduled sampling, label smoothing, and
minimum word error rate optimization, which are all shown to improve accuracy.
We present results with a unidirectional LSTM encoder for streaming
recognition. On a 12,500 hour voice search task, we find that the proposed
changes improve the WER from 9.2% to 5.6%, while the best conventional system
achieves 6.7%; on a dictation task our model achieves a WER of 4.1% compared to
5% for the conventional system.
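
Since the results above are reported as word error rate (WER), a short sketch of the usual definition may help: the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. This is a generic illustration, not the evaluation tooling used in the paper; the function names and the example utterance are hypothetical. The helper `relative_reduction` simply does arithmetic on the WER figures quoted in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fractional WER reduction of new_wer relative to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One substitution ("some" -> "sum") and one deletion ("music"): 2 / 4 words.
example_wer = word_error_rate("play some jazz music", "play sum jazz")

# Arithmetic on the voice search figures quoted above (9.2% -> 5.6%).
las_gain = relative_reduction(9.2, 5.6)
```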