• Machine learning (ML) techniques are increasingly common in security applications, such as malware and intrusion detection. However, ML models are often susceptible to evasion attacks, in which an adversary makes changes to the input (such as malware) in order to avoid being detected. A conventional approach to evaluate ML robustness to such attacks, as well as to design robust ML, is by considering simplified feature-space models of attacks, where the attacker changes ML features directly to effect evasion, while minimizing or constraining the magnitude of this change. We investigate the effectiveness of this approach to designing robust ML in the face of attacks that can be realized in actual malware (realizable attacks). We demonstrate that in the context of structure-based PDF malware detection, such techniques appear to have limited effectiveness, but they are effective with content-based detectors. In either case, we show that augmenting the feature space models with conserved features (those that cannot be unilaterally modified without compromising malicious functionality) significantly improves performance. Finally, we show that feature space models enable generalized robustness when faced with a variety of realizable attacks, as compared to classifiers which are tuned to be robust to a specific realizable attack.
  • For spectrally negative L\'evy processes, adapting an approach from \cite{BoLi:sub1} we identify joint Laplace transforms involving local times evaluated at either the first passage times, or independent exponential times, or inverse local times. The Laplace transforms are expressed in terms of the associated scale functions. Connections are made with the permanental process and the Markovian loop soup measure.
  • We establish a correspondence on a Riemann surface between hyperbolic metrics with isolated singularities and bounded projective functions whose Schwarzian derivatives have at most double poles and whose monodromies lie in ${\rm PSU}(1,\,1)$. As an application, we construct explicitly a new class of hyperbolic metrics with countably many singularities on the unit disc.
  • Most previous machine learning algorithms are based on the i.i.d. hypothesis. However, this ideal assumption is often violated in real applications, where selection bias may arise between the training and testing processes. Moreover, in many scenarios, the testing data is not even available during training, which makes traditional methods like transfer learning infeasible because they require prior knowledge of the test distribution. Therefore, how to address agnostic selection bias for robust model learning is of paramount importance for both academic research and real applications. In this paper, under the assumption that causal relationships among variables are robust across domains, we incorporate causal techniques into predictive modeling and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm that jointly optimizes global confounder balancing and weighted logistic regression. Global confounder balancing helps to identify causal features, whose causal effects on the outcome are stable across domains; performing logistic regression on those causal features then yields a predictive model that is robust against the agnostic bias. To validate the effectiveness of our CRLR algorithm, we conduct comprehensive experiments on both synthetic and real-world datasets. Experimental results clearly demonstrate that our CRLR algorithm outperforms the state-of-the-art methods, and the interpretability of our method can be fully depicted by the feature visualization.
  • The function space of deep-learning machines is investigated by studying growth in the entropy of functions of a given error with respect to a reference function, realized by a deep-learning machine. Using physics-inspired methods we study both sparsely and densely-connected architectures to discover a layer-wise convergence of candidate functions, marked by a corresponding reduction in entropy when approaching the reference function, gain insight into the importance of having a large number of layers, and observe phase transitions as the error increases.
  • The existence of kinetic ballooning mode (KBM) high order (non-ground) eigenstates for tokamak plasmas with steep gradient is demonstrated via gyrokinetic electromagnetic eigenvalue solutions, which reveals that eigenmode parity transition is an intrinsic property of electromagnetic plasmas. The eigenstates with quantum number $l=0$ for ground state and $l=1,2,3\ldots$ for non-ground states are found to coexist and the most unstable one can be the high order states ($l\neq0$). The conventional KBM is the $l=0$ state. It is shown that the $l=1$ KBM has the same mode structure parity as the micro-tearing mode (MTM). In contrast to the MTM, the $l=1$ KBM can be driven by pressure gradient even without collisions and electron temperature gradient. The relevance between various eigenstates of KBM under steep gradient and edge plasma physics is discussed.
  • In this paper, we focus on the COM-type negative binomial distribution with three parameters, which belongs to the COM-type $(a,b,0)$ class of distributions and to the family of equilibrium distributions of an arbitrary birth-death process. We show abundant distributional properties such as overdispersion and underdispersion, log-concavity, log-convexity (infinite divisibility), pseudo compound Poisson, stochastic ordering and asymptotic approximation. Some characterizations, including the sum of equicorrelated geometrically distributed random variables, the conditional distribution, the limit distribution of the COM-negative hypergeometric distribution, and Stein's identity, are given for theoretical properties. The COM-negative binomial distribution is applied to overdispersed and ultrahigh zero-inflated data sets. With the aid of ratio regression, we employ the maximum likelihood method to estimate the parameters, and the goodness-of-fit is evaluated by the discrete Kolmogorov-Smirnov test.
  • In this paper we design information elicitation mechanisms for Bayesian auctions. While in Bayesian mechanism design the distributions of the players' private types are often assumed to be common knowledge, information elicitation considers the situation where the players know the distributions better than the decision maker. To weaken the information assumption in Bayesian auctions, we consider an information structure where the knowledge about the distributions is arbitrarily scattered among the players. In such an unstructured information setting, we design mechanisms for unit-demand auctions and additive auctions that aggregate the players' knowledge, generating revenue that is a constant approximation to that of the optimal Bayesian mechanisms with a common prior. Our mechanisms are 2-step dominant-strategy truthful, and the revenue increases gracefully with the amount of knowledge the players collectively have.
  • With huge amounts of training data, deep learning has made great breakthroughs in many artificial intelligence (AI) applications. However, such large-scale data sets present computational challenges, requiring training to be distributed on a cluster equipped with accelerators like GPUs. With the fast increase of GPU computing power, data communication among GPUs has become a potential bottleneck for overall training performance. In this paper, we first propose a general directed acyclic graph (DAG) model to describe the distributed synchronous stochastic gradient descent (S-SGD) algorithm, which has been widely used in distributed deep learning frameworks. To understand the practical impact of data communication on training performance, we conduct extensive empirical studies on four state-of-the-art distributed deep learning frameworks (i.e., Caffe-MPI, CNTK, MXNet and TensorFlow) over multi-GPU and multi-node environments with different data communication techniques, including PCIe, NVLink, 10GbE, and InfiniBand. Through both analytical and experimental studies, we identify the potential bottlenecks and overheads that could be further optimized. Finally, we make the data set of our experimental traces publicly available, which could be used to support simulation-based studies.
  • Two main models have been developed to explain the mechanisms of release, heating and acceleration of the nascent solar wind, the wave-turbulence-driven (WTD) models and reconnection-loop-opening (RLO) models, in which the plasma release processes are fundamentally different. Given that the statistical observational properties of helium ions produced in magnetically diverse solar regions could provide valuable information for the solar wind modelling, we examine the statistical properties of the helium abundance (A_He) and the speed difference between helium ions and protons (v_alpha,p) for coronal holes (CHs), active regions (ARs) and the quiet Sun (QS). We find bimodal distributions in the space of A_He and v_alpha,p/v_A (where v_A is the local Alfven speed) for the solar wind as a whole. The CH wind measurements are concentrated at higher A_He and v_alpha,p/v_A values with a smaller A_He distribution range, while the AR and QS wind is associated with lower A_He and v_alpha,p/v_A, and a larger A_He distribution range. The magnetic diversity of the source regions and the physical processes related to it are possibly responsible for the different properties of A_He and v_alpha,p/v_A. The statistical results suggest that the two solar wind generation mechanisms, WTD and RLO, work in parallel in all solar wind source regions. In CH regions WTD plays a major role, whereas the RLO mechanism is more important in AR and QS.
  • Statistical agent-based models for crime have shown that repeat victimization can lead to predictable crime hotspots (see e.g. Short et al., Math. Models Methods Appl., 2008). A recent study in one space dimension (Chaturapruek et al., SIAM J. Appl. Math, 2013) shows that the hotspot dynamics change when the movement patterns of the criminals involve long-tailed L\'evy distributions for the jump length as opposed to classical random walks. In reality, criminals move in confined areas with a maximum jump length. In this paper we develop a mean-field continuum model with truncated L\'evy flights for residential burglary in one space dimension. The continuum model yields local Laplace diffusion, rather than fractional diffusion. We present an asymptotic theory to derive the continuum equations and show excellent agreement between the continuum model and the agent-based simulations. This suggests that local diffusion models are universal for continuum limits of this problem, the important quantity being the diffusion coefficient. Law enforcement agents are also incorporated into the model, and the relative effectiveness of their deployment strategies is compared quantitatively.
  • We investigate the quantum phase transitions for the $XXZ$ spin-1/2 chains via the quantum correlations between the nearest and next to nearest neighbor spins characterized by negativity, information deficit, trace distance discord and local quantum uncertainty. It is shown that all these correlations exhibit the quantum phase transitions at $\Delta=-1$. However, only information deficit and local quantum uncertainty can demonstrate quantum phase transitions at $\Delta=1$. The analytical and numerical behaviors of the quantum correlations for the $XXZ$ system are presented. We also consider quantum correlations in the Hartree-Fock ground state of the Lipkin-Meshkov-Glick (LMG) model.
  • Generating good revenue is one of the most important problems in Bayesian auction design, and many (approximately) optimal dominant-strategy incentive compatible (DSIC) Bayesian mechanisms have been constructed for various auction settings. However, most existing studies do not consider the complexity for the seller to carry out the mechanism. It is assumed that the seller knows "each single bit" of the distributions and is able to optimize perfectly based on the entire distributions. Unfortunately, this is a strong assumption and may not hold in reality: for example, when the value distributions have exponentially large supports or do not have succinct representations. In this work we consider, for the first time, the query complexity of Bayesian mechanisms. We only allow the seller to have limited oracle accesses to the players' value distributions, via quantile queries and value queries. For a large class of auction settings, we prove logarithmic lower-bounds for the query complexity for any DSIC Bayesian mechanism to be of any constant approximation to the optimal revenue. For single-item auctions and multi-item auctions with unit-demand or additive valuation functions, we prove tight upper-bounds via efficient query schemes, without requiring the distributions to be regular or have monotone hazard rate. Thus, in those auction settings the seller needs to access much less than the full distributions in order to achieve approximately optimal revenue.
  • In this paper, we focus on how to dynamically allocate a divisible resource fairly among n players who arrive and depart over time. The players may have general heterogeneous valuations over the resource. It is known that exact envy-free and proportional allocations may not exist in the dynamic setting [Walsh, 2011]. Thus, we study to what extent fairness can be guaranteed in the dynamic setting. We first design two algorithms which are O(log n)-proportional and O(n)-envy-free for the setting with general valuations, and by constructing adversarial instances such that all dynamic algorithms must be at least Omega(1)-proportional and Omega(n/log n)-envy-free, we show that the bounds are tight up to a logarithmic factor. Moreover, we introduce the setting where the players' valuations are uniform on the resource but with different demands, which generalizes the setting of [Friedman et al., 2015]. We prove an O(log n) upper bound and a tight lower bound for this case.
  • Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.
  • We formulate a microscopic linear response theory of nonequilibrium magnonic torques and magnon pumping applicable to multiple-magnonic-band uniform ferromagnets with Dzyaloshinskii-Moriya interactions. From the linear response theory, we identify the extrinsic and intrinsic contributions where the latter is expressed via the Berry curvature of magnonic bands. We observe that in the presence of a time-dependent magnetization Dzyaloshinskii-Moriya interactions can act as fictitious electric fields acting on magnons. We study various current responses to this fictitious field and analyze the role of Berry curvature. After identifying the magnon-mediated contribution to the equilibrium Dzyaloshinskii-Moriya interaction, we also establish the Onsager reciprocity between the magnon-mediated torques and heat pumping. We apply our theory to the magnonic heat pumping and torque responses in honeycomb and kagome lattice ferromagnets.
  • As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We extensively evaluate our attacks and defenses on three realistic datasets from the health care, loan assessment, and real estate domains.
  • Object proposal generation methods have been widely applied to many computer vision tasks. However, existing object proposal generation methods often suffer from the problems of motion blur, low contrast, deformation, etc., when they are applied to video-related tasks. In this paper, we propose an effective and highly accurate target-specific object proposal generation (TOPG) method, which takes full advantage of the context information of a video to alleviate these problems. Specifically, we propose to generate target-specific object proposals by integrating the information of two important objectness cues: colors and edges, which are complementary to each other for different challenging environments in the process of generating object proposals. As a result, the recall of the proposed TOPG method is significantly increased. Furthermore, we propose an object proposal ranking strategy to increase the rank accuracy of the generated object proposals. The proposed TOPG method yields a significant recall gain (about 20%-60% higher) compared with several state-of-the-art object proposal methods on several challenging visual tracking datasets. Then, we apply the proposed TOPG method to the task of visual tracking and propose a TOPG-based tracker (called TOPGT), where TOPG is used as a sample selection strategy to select a small number of high-quality target candidates from the generated object proposals. Since the object proposals generated by the proposed TOPG cover many hard negative samples and positive samples, these object proposals can be used not only for training an effective classifier, but also as target candidates for visual tracking. Experimental results show the superior performance of TOPGT for visual tracking compared with several other state-of-the-art visual trackers (about 3%-11% higher than the winner of the VOT2015 challenge in terms of distance precision).
  • Current face or object detection methods via convolutional neural network (such as OverFeat, R-CNN and DenseNet) explicitly extract multi-scale features based on an image pyramid. However, such a strategy increases the computational burden for face detection. In this paper, we propose a fast face detection method based on discriminative complete features (DCFs) extracted by an elaborately designed convolutional neural network, where face detection is directly performed on the complete feature maps. DCFs have shown the ability of scale invariance, which is beneficial for face detection with high speed and promising performance. Therefore, extracting multi-scale features on an image pyramid employed in the conventional methods is not required in the proposed method, which can greatly improve its efficiency for face detection. Experimental results on several popular face detection datasets show the efficiency and the effectiveness of the proposed method for face detection.
  • We use high spatial and temporal resolution observations, simultaneously obtained with the New Vacuum Solar Telescope and Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory, to investigate the high-frequency oscillations above a sunspot umbra. A novel time--frequency analysis method, namely the synchrosqueezing transform (SST), is employed to represent their power spectra and to reconstruct the high-frequency signals at different solar atmospheric layers. A validation study with synthetic signals demonstrates that SST is capable of resolving weak signals even when their strength is comparable with the high-frequency noise. The power spectra, obtained from both SST and the Fourier transform, of the entire umbral region indicate that there are significant enhancements between 10 and 14 mHz (labeled as 12 mHz) at different atmospheric layers. Analyzing the spectrum of a photospheric region far away from the umbra demonstrates that this 12 mHz component exists only inside the umbra. The animation based on the reconstructed 12 mHz component in AIA 171 \AA\ illustrates that an intermittently propagating wave first emerges near the footpoints of coronal fan structures, and then propagates outward along the structures. A time--distance diagram, coupled with a subsonic wave speed ($\sim$ 49 km s$^{-1}$), highlights the fact that these coronal perturbations are best described as upwardly propagating magnetoacoustic slow waves. Thus, we first reveal the high-frequency oscillations with a period around one minute in imaging observations at different heights above an umbra, and these oscillations seem to be related to the umbral perturbations in the photosphere.
  • The Interface Region Imaging Spectrograph (IRIS) reveals numerous small-scale (sub-arcsecond) brightenings that appear as bright dots sparkling the solar transition region in active regions. Here, we report a statistical study on these transition region bright dots. We use an automatic approach to identify 2742 dots in a Si IV raster image. We find that the average spatial size of the dots is 0.8 arcsec$^2$ and most of them are located in the faculae area. Their Doppler velocities obtained from the Si IV 1394 {\AA} line range from -20 to 20 km/s. Among these 2742 dots, 1224 are predominantly blue-shifted and 1518 are red-shifted. Their nonthermal velocities range from 4 to 50 km/s with an average of 24 km/s. We speculate that the bright dots studied here are small-scale impulsive energetic events that can heat the active region corona.
  • Cone spherical metrics are conformal metrics with constant curvature one and finitely many conical singularities on compact Riemann surfaces. By using Strebel differentials as a bridge, we construct a new class of cone spherical metrics on compact Riemann surfaces by drawing on the surfaces some class of connected metric ribbon graphs.
  • Deep learning (DL) defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neural network through a set of training data. DL has been widely adopted in many safety-critical scenarios. However, a plethora of studies have shown that the state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the robustness of a DL system against adversarial attacks is usually measured by its accuracy on test data. Considering the limitation of accessible test data, good performance on test data can hardly guarantee the robustness and generality of DL systems. Unlike traditional software systems, which have clear and controllable logic and functionality, a DL system is trained with data and its behavior is not thoroughly understood. This makes system analysis and defect detection difficult, which could potentially hinder its real-world deployment without safety guarantees. In this paper, we propose DeepGauge, a set of comprehensive and multi-granularity testing criteria for DL systems, which renders a complete and multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, and four state-of-the-art adversarial data generation techniques. The effectiveness of DeepGauge sheds light on the construction of robust DL systems.
  • Deep Neural Networks (DNNs) have recently been shown to be vulnerable to adversarial examples, which are carefully crafted instances that can mislead DNNs to make errors during prediction. To better understand such attacks, a characterization is needed of the properties of regions (the so-called 'adversarial subspaces') in which adversarial examples lie. We tackle this challenge by characterizing the dimensional properties of adversarial regions, via the use of Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the region surrounding a reference example, based on the distance distribution of the example to its neighbors. We first explain how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the distinction of adversarial examples generated using state-of-the-art attacks. As a proof-of-concept, we show that a potential application of LID is to distinguish adversarial examples, and the preliminary results show that it can outperform several state-of-the-art detection measures by large margins for five attack strategies considered in this paper across three benchmark datasets. Our analysis of the LID characteristic for adversarial regions not only motivates new directions of effective adversarial defense, but also opens up more challenges for developing new attacks to better understand the vulnerabilities of DNNs.
  • Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In previous work, we have shown that such architectures are comparable to state-of-the-art ASR systems on dictation tasks, but it was not clear if such architectures would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word piece models can be used instead of graphemes. We also introduce a multi-head attention architecture, which offers improvements over the commonly-used single-head attention. On the optimization side, we explore synchronous training, scheduled sampling, label smoothing, and minimum word error rate (WER) optimization, which are all shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12,500-hour voice search task, we find that the proposed changes improve the WER from 9.2% to 5.6%, while the best conventional system achieves 6.7%; on a dictation task our model achieves a WER of 4.1% compared to 5% for the conventional system.
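
The word error rate (WER) figures quoted in the last abstract can be made concrete with a small sketch. The code below is not taken from that paper. It assumes the common definition of WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are hypothetical helpers introduced here for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words.

    Assumes whitespace tokenization; real ASR scoring usually also
    normalizes case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the cat sit"))
    # WER values quoted in the abstract above (voice search task).
    print(round(relative_reduction(9.2, 5.6), 3))
    print(round(relative_reduction(6.7, 5.6), 3))
```

Under this definition, going from 9.2% to 5.6% WER is a relative reduction of about 39%, and 5.6% against the conventional system's 6.7% is about 16%. These percentages are plain arithmetic on the quoted numbers, not additional results.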