• ### Global and Local Multiple SLEs for $\kappa \leq 4$ and Connection Probabilities for Level Lines of GFF(1703.00898)

March 3, 2019 math-ph, math.MP, math.PR
This article pertains to the classification of multiple Schramm-Loewner evolutions (SLE). We construct the pure partition functions of multiple SLE$(\kappa)$ with $\kappa \in (0,4]$ and relate them to certain extremal multiple SLE measures, thus verifying a conjecture from [BBK05, KP16]. We prove that the two approaches to construct multiple SLEs - the global, configurational construction of [KL07, Law09a] and the local, growth process construction of [BBK05, Dub07, Gra07, KP16] - agree. The pure partition functions are closely related to crossing probabilities in critical statistical mechanics models. With explicit formulas in the special case of $\kappa = 4$, we show that these functions give the connection probabilities for the level lines of the Gaussian free field (GFF) with alternating boundary data. We also show that certain functions, known as conformal blocks, give rise to multiple SLE(4) that can be naturally coupled with the GFF with appropriate boundary data.
• ### Soft Label Memorization-Generalization for Natural Language Inference(1702.08563)

Jan. 25, 2019 cs.CL
Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03% of training set size) we can improve generalization performance over several baselines.
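As a rough illustration of what training against a soft label means, the sketch below turns several annotator votes into a label distribution and scores a model's logits against it with cross-entropy. It is not the SLMG procedure itself; the function names and the five example votes are hypothetical.

```python
import math

def soft_label(votes, num_classes):
    """Turn a list of annotator votes (class indices) into a label distribution."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and the model's softmax output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Hypothetical example: five annotators label one NLI pair
# (0 = entailment, 1 = neutral, 2 = contradiction).
target = soft_label([0, 0, 1, 0, 2], num_classes=3)
uniform_loss = cross_entropy(target, [0.0, 0.0, 0.0])
```

A hard label is the special case in which `target` is one-hot, so the same loss covers both kinds of training example.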
• ### Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study(1702.04811)

Sept. 7, 2018 cs.CL
Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.
• ### Getting in shape and swimming: the role of cortical forces and membrane heterogeneity in eukaryotic cells(1710.01618)

Recent research has shown that motile cells can adapt their mode of propulsion to the mechanical properties of the environment in which they find themselves--crawling in some environments while swimming in others. The latter can involve movement by blebbing or other cyclic shape changes, and both highly simplified and more realistic models of these modes have been studied previously. Herein we study swimming that is driven by membrane tension gradients that arise from flows in the actin cortex underlying the membrane, and does not involve imposed cyclic shape changes. Such gradients can lead to a number of different characteristic cell shapes, and our first objective is to understand how different distributions of membrane tension influence the shape of cells in an inviscid quiescent fluid. We then analyze the effects of spatial variation in other membrane properties, and how they interact with tension gradients to determine the shape. We also study the effect of fluid--cell interactions and show how tension leads to cell movement, how the balance between tension gradients and a variable bending modulus determines the shape and direction of movement, and how the efficiency of movement depends on the properties of the fluid and the distribution of tension and bending modulus in the membrane.
• ### HOMFLYPT Homology over $\mathbb{Z}_2$ Detects Unlinks(1708.07139)

May 22, 2018 math.GT
We apply the Rasmussen spectral sequence to prove that the $\mathbb{Z}^3$-graded vector space structure of the HOMFLYPT homology over $\mathbb{Z}_2$ detects unlinks. Our proof relies on a theorem of Batson and Seed stating that the $\mathbb{Z}^2$-graded vector space structure of the Khovanov homology over $\mathbb{Z}_2$ detects unlinks.
• ### Edge States and Broken Symmetry Phases of Laterally Confined $^3$He Films(1805.00936)

May 2, 2018 cond-mat.supr-con
Broken symmetries in topological condensed matter systems have implications for the spectrum of Fermionic excitations confined on surfaces or topological defects. The Fermionic spectrum of confined (quasi-2D) $^3$He-A consists of branches of chiral edge states. The negative energy states are related to the ground-state angular momentum, $L_z = (N/2) \hbar$, for $N/2$ Cooper pairs. The power law suppression of the angular momentum, $L_z(T) \simeq (N/2)\,\hbar\,[1 - \frac{2}{3}(\pi T/\Delta)^2 ]$ for $0 \le T \ll T_c$, in the fully gapped 2D chiral A-phase reflects the thermal excitation of the chiral edge Fermions. We discuss the effects of wave function overlap, and hybridization between edge states confined near opposing surfaces on the edge currents, ground-state angular momentum and ground-state order parameter. Under strong lateral confinement, the chiral A phase undergoes a sequence of phase transitions, first to a pair density wave (PDW) phase with broken translational symmetry at $D_{c2} \approx 16 \xi_0$. The PDW phase is described by a periodic array of chiral domains with alternating chirality, separated by domain walls. The period of the PDW phase diverges as the confinement length $D\rightarrow D_{c2}$. The PDW phase breaks time-reversal symmetry and translation invariance, but is invariant under the combination of time-reversal and translation by a one-half period of the PDW. The mass current distribution of the PDW phase reflects this combined symmetry, and originates from the spectra of edge Fermions and the chiral branches bound to the domain walls. Under sufficiently strong confinement a second-order transition occurs to the non-chiral "polar phase" at $D_{c1} \approx 9\xi_0$, in which a single p-wave orbital state of Cooper pairs is aligned along the channel.
• ### Verifying Concurrent Stacks by Divergence-Sensitive Bisimulation(1701.06104)

April 22, 2018 cs.PL
The verification of linearizability -- a key correctness criterion for concurrent objects -- is based on trace refinement whose checking is PSPACE-complete. This paper proposes using \emph{branching} bisimulation instead. Our approach is based on comparing an abstract specification in which object methods are executed atomically to a real object program. Exploiting divergence sensitivity, this also applies to progress properties such as lock-freedom. These results enable the use of \emph{polynomial-time} divergence-sensitive branching bisimulation checking techniques for verifying linearizability and progress. We conducted experiments on concurrent lock-free stacks to validate the efficiency and effectiveness of our methods.
• ### A Multi-Axis Annotation Scheme for Event Temporal Relations(1804.07828)

April 20, 2018 cs.CL
Existing temporal relation (TempRel) annotation schemes often have low inter-annotator agreements (IAA) even between experts, suggesting that the current annotation task needs a better definition. This paper proposes a new multi-axis modeling to better capture the temporal structure of events. In addition, we identify that event end-points are a major source of confusion in annotation, so we also propose to annotate TempRels based on start-points only. A pilot expert annotation using the proposed scheme shows significant improvement in IAA from the conventional 60's to 80's (Cohen's Kappa). This better-defined annotation scheme further enables the use of crowdsourcing to alleviate the labor intensity for each annotator. We hope that this work can foster more interesting studies towards event understanding.
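The agreement figures above are reported as Cohen's Kappa, which corrects raw agreement for the agreement expected by chance: $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below computes it for two annotators; the label names and the four example annotations are made up for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[l] * count_b[l] for l in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of four event pairs.
annotator_1 = ["before", "before", "after", "after"]
annotator_2 = ["before", "after", "after", "after"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 3 of 4 items ($p_o = 0.75$) while chance agreement is $p_e = 0.5$, so $\kappa = 0.5$.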
• ### A deep convolutional encoder-decoder neural network in assisting seismic horizon tracking(1804.06814)

April 18, 2018 physics.geo-ph
Seismic horizons are geologically significant surfaces that can be used for building geological structure and stratigraphy models. However, horizon tracking in 3D seismic data is a time-consuming and challenging problem. Relieving human interpreters of tedious seismic interpretation is an active research topic. We propose a novel automatic seismic horizon tracking method based on a deep convolutional neural network. We employ a state-of-the-art end-to-end semantic segmentation method to track the seismic horizons automatically. Experimental results show that our proposed neural network can automatically track multiple horizons simultaneously. We validate the effectiveness and robustness of our proposed method by comparing automatically tracked horizons with manually picked horizons.
• ### Improving Temporal Relation Extraction with a Globally Acquired Statistical Resource(1804.06020)

April 17, 2018 cs.AI
Extracting temporal relations (before, after, overlapping, etc.) is a key aspect of understanding events described in natural language. We argue that this task would gain from the availability of a resource that provides prior knowledge in the form of the temporal order that events usually follow. This paper develops such a resource -- a probabilistic knowledge base acquired in the news domain -- by extracting temporal relations between events from the New York Times (NYT) articles over a 20-year span (1987--2007). We show that existing temporal extraction systems can be improved via this resource. As a byproduct, we also show that interesting statistics can be retrieved from this resource, which can potentially benefit other time-aware tasks. The proposed system and resource are both publicly available.
• ### MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks(1711.06798)

April 17, 2018 cs.LG, stat.ML
We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.
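The shrink-then-expand loop can be sketched on a toy chain of dense layers. Everything here is an assumption made for illustration: the FLOP model, the per-channel scale factors standing in for whatever the regularizer acts on, the pruning threshold, and the integer-percent search for the multiplier. It is not the authors' implementation.

```python
def flops(widths, input_dim):
    """Multiply-accumulate count of a chain of dense layers (toy resource model)."""
    dims = [input_dim] + list(widths)
    return sum(a * b for a, b in zip(dims, dims[1:]))

def resource_weighted_l1(scales, cost_per_channel):
    """Sparsifying penalty: each layer's L1 norm weighted by its per-channel cost."""
    return sum(cost * sum(abs(s) for s in layer)
               for layer, cost in zip(scales, cost_per_channel))

def shrink(scales, threshold):
    """Keep only channels whose scale stayed above the threshold (at least one)."""
    return [max(1, sum(abs(s) > threshold for s in layer)) for layer in scales]

def expand(widths, input_dim, budget):
    """Apply the largest uniform multiplier (in whole percent) that fits the budget."""
    def scaled(pct):
        return [max(1, w * pct // 100) for w in widths]
    pct = 100
    while flops(scaled(pct + 1), input_dim) <= budget:
        pct += 1
    return scaled(pct), pct / 100

# Hypothetical per-channel scales after training with the penalty.
scales = [[0.9, 0.0, 0.5, 0.001],
          [0.3, 0.2, 0.0, 0.7, 0.0, 0.4, 0.0, 0.0]]
penalty = resource_weighted_l1(scales, cost_per_channel=[12, 4])
shrunk = shrink(scales, threshold=0.01)
expanded, omega = expand(shrunk, input_dim=4, budget=64)
```

Because the penalty weights each layer by its cost, the shrink step removes expensive channels first, and the uniform expansion then spends the freed budget across all layers, which changes the relative layer widths from 4:8 to 4:9 in this toy case.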
• ### Hierarchical Disentangled Representations(1804.02086)

April 12, 2018 cs.LG, stat.ML
Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation, often by introducing suitable modifications of the objective function. We synthesize this growing body of literature by formulating a generalization of the evidence lower bound that explicitly represents the trade-offs between sparsity of the latent code, bijectivity of representations, and coverage of the support of the empirical data distribution. Our objective is also suitable to learning hierarchical representations that disentangle blocks of variables whilst allowing for some degree of correlations within blocks. Experiments on a range of datasets demonstrate that learned representations contain interpretable features, are able to learn discrete attributes, and generalize to unseen combinations of factors.
• ### Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors(1804.04606)

April 10, 2018 cs.CV
Modern object detectors usually suffer from low accuracy, as foregrounds are drowned out by large numbers of backgrounds and become hard examples during training. Compared with proposal-based detectors, real-time detectors are in far more serious trouble, since to achieve real-time rates they give up the region-proposal stage that filters out the majority of backgrounds. Although foregrounds, as hard examples, urgently need to be mined from the backgrounds, a considerable number of state-of-the-art real-time detectors, such as the YOLO series, have yet to profit from existing hard example mining methods, because these methods require detectors to satisfy a series of prerequisites. In this paper, we propose a general hard example mining method named Loss Rank Mining (LRM) to fill the gap. LRM is a general method for real-time detectors, as it uses the final feature map, which exists in all real-time detectors, to mine hard examples. With LRM, some elements representing easy examples in the final feature map are filtered out, and detectors are forced to concentrate on hard examples during training. Extensive experiments validate the effectiveness of our method. With our method, the improvements of the YOLOv2 detector on the autonomous-driving dataset KITTI and the more general dataset PASCAL VOC are over 5% and 2% mAP, respectively. In addition, LRM is the first hard example mining strategy that fits YOLOv2 perfectly and makes it better suited to real scenarios where both real-time rates and accurate detection are strongly demanded.
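The filtering step can be pictured as ranking the per-element losses of the final feature map and keeping only the highest-ranked ones. The sketch below assumes a plain top-K rule on a 2D grid of losses; the function names, the value of K, and the example losses are hypothetical, and the paper's exact ranking rule may differ.

```python
def loss_rank_mask(losses, keep_k):
    """Return a 0/1 mask that keeps the keep_k highest-loss cells of a 2D loss map."""
    ranked = sorted(
        ((loss, r, c) for r, row in enumerate(losses) for c, loss in enumerate(row)),
        key=lambda item: item[0],
        reverse=True,
    )
    keep = {(r, c) for _, r, c in ranked[:keep_k]}
    return [[1 if (r, c) in keep else 0 for c in range(len(row))]
            for r, row in enumerate(losses)]

def mined_loss(losses, mask):
    """Average the loss over the kept (hard) cells only."""
    kept = [l for row, mrow in zip(losses, mask) for l, m in zip(row, mrow) if m]
    return sum(kept) / len(kept)

# Hypothetical per-cell losses on a 2x2 final feature map.
cell_losses = [[0.10, 2.00],
               [0.05, 0.70]]
mask = loss_rank_mask(cell_losses, keep_k=2)
hard_loss = mined_loss(cell_losses, mask)
```

Only the masked cells would contribute gradients, so the easy background cells (0.10 and 0.05 here) no longer dilute the training signal.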
• ### The freezing Rényi quantum discord(1804.02791)

April 9, 2018 quant-ph
As a universal quantum character of quantum correlation, the freezing phenomenon has been studied using both geometric and quantum discord methods. In this paper, the properties of Rényi discord are studied for two independent dimer systems coupled to two correlated Fermi-spin environments under non-Markovian conditions. We further demonstrate that the freezing behaviors still exist for Rényi discord and study the effects of different parameters on these behaviors.
• ### Betti Numbers of the HOMFLYPT Homology(1703.07257)

April 4, 2018 math.GT
In arXiv:math/0508510, Rasmussen observed that the Khovanov-Rozansky homology of a link is a finitely generated module over the polynomial ring generated by the components of this link. In the current paper, we study the module structure of the middle HOMFLYPT homology, especially the Betti numbers of this module. For each link, these Betti numbers are supported on a finite subset of $\mathbb{Z}^4$. One can easily recover from these Betti numbers the Poincar\'e polynomial of the middle HOMFLYPT homology. We explain why the Betti numbers can be viewed as a generalization of the reduced HOMFLYPT homology of knots. As an application, we prove that the projective dimension of the middle HOMFLYPT homology is additive under split union of links and provides a new obstruction to split links.
• ### Energy and Delay Optimization for Cache-Enabled Dense Small Cell Networks(1803.03780)

March 10, 2018 cs.NI
Caching popular files in small base stations (SBSs) has been proved to be an effective way to reduce bandwidth pressure on the backhaul links of dense small cell networks (DSCNs). Many existing studies on cache-enabled DSCNs attempt to improve user experience by optimizing end-to-end file delivery delay. However, under practical scenarios where files (e.g., video files) have diverse quality of service requirements, energy consumption at SBSs should also be considered from the network perspective. In this paper, we attempt to optimize these two critical metrics in cache-enabled DSCNs. Firstly, we formulate the energy-delay optimization problem as a Mixed Integer Programming (MIP) problem, where file placement, user association and power control are jointly considered. To model the tradeoff between energy consumption and end-to-end file delivery delay, a utility function linearly combining these two metrics is used as the objective function of the optimization problem. Then, we solve the problem in two stages, i.e. caching stage and delivery stage, based on the observation that caching is performed during off-peak time. At the caching stage, a local popular file placement policy is proposed by estimating user preference at each SBS. At the delivery stage, with given caching status at SBSs, the MIP problem is further decomposed by Benders' decomposition method. An efficient algorithm is proposed to approach the optimal association and power solution by iteratively shrinking the gap between the upper and lower bounds. Finally, extensive simulations are performed to validate our analytical and algorithmic work. The results demonstrate that the proposed algorithms can achieve the optimal tradeoff between energy consumption and end-to-end file delivery delay.
• ### Generation of multiphoton entangled quantum states with a single silicon nanowire(1803.01641)

March 5, 2018 quant-ph
Multiphoton entanglement plays a critical role in quantum information processing, and greatly improves our fundamental understanding of the quantum world. Despite tremendous efforts in either bulk media or fiber-based devices, nonlinear interactions in integrated circuits show great promise as an excellent platform for photon pair generation with its high brightness, stability and scalability \cite{Caspani2017}. Here, we demonstrate the generation of bi- and multiphoton polarization entangled qubits in a single silicon nanowire waveguide, and these qubits are directly compatible with dense wavelength division multiplexing in telecommunication systems. Multiphoton interference and quantum state tomography were used to characterize the quality of the entangled states. Four-photon entanglement states among two frequency channels were ascertained with a fidelity of $0.78\pm0.02$. Our work realizes the integrated multiphoton source in a relatively simple pattern and paves a way for the revolution of multiphoton quantum science.
• ### Enhancing radical molecular beams by skimmer cooling(1802.10179)

A high-intensity supersonic beam source has been a key component in studies of molecular collisions, molecule-surface interaction, chemical reactions, and precision spectroscopy. However, the molecular density available for experiments in a downstream science chamber is limited by skimmer clogging, which constrains the separation between a valve and a skimmer to at least several hundred nozzle diameters. A recent experiment (Science Advances, 2017, 3, e1602258) has introduced a new strategy to address this challenge: when a skimmer is cooled to a temperature below the freezing point of the carrier gas, skimmer clogging can be effectively suppressed. We go beyond this proof-of-principle work in several key ways. Firstly, we apply the skimmer cooling approach to discharge-produced radical and metastable beams entrained in a carrier gas. We also identify two different processes for skimmer clogging mitigation: shockwave suppression at temperatures around the carrier gas freezing point and diffusive clogging at even lower temperatures. With the carrier clogging removed, we now fully optimize the production of entrained species such as hydroxyl radicals, resulting in a gain of 30 in density over the best commercial devices. The gain arises from both clogging mitigation and favorable geometry with a much shorter valve-skimmer distance.
• ### Hypergeometric SLE: Conformal Markov Characterization and Applications(1703.02022)

Feb. 24, 2018 math.PR
This article pertains to the classification of pairs of simple random curves with conformal Markov property and symmetry. We give the complete classification of such curves: conformal Markov property and symmetry single out a two-parameter family of random curves---Hypergeometric SLE---denoted by hSLE$_{\kappa}(\nu)$ for $\kappa\in (0,4]$ and $\nu<\kappa-6$. The proof relies crucially on Dub\'edat's commutation relation [Dub07] and a uniqueness result proved in [MS16b]. The classification indicates that hypergeometric SLE is the only possible scaling limit of the interfaces in critical lattice models (conjectured or proved to be conformally invariant) in topological rectangles with alternating boundary conditions. We also prove various properties of hSLE: continuity, reversibility, target-independence, and conditional law characterization. As by-products, we give two applications of these properties. The first one is about the critical Ising interfaces. We prove the convergence of the Ising interface in rectangles with alternating boundary conditions. This result was first proved by Izyurov in [Izy15], but our proof is new and is based on the properties of hSLE. The second application is the existence of the so-called pure partition functions of multiple SLEs. Such existence was proved for $\kappa\in (0,8)\setminus \mathbb{Q}$ in [KP16], and it was later proved for $\kappa\in (0,4]$ in [PW17]. We give a new proof of the existence for $\kappa\in (0,6]$ using the properties of hSLE.
• ### Mixed Precision Training(1710.03740)

Feb. 15, 2018 cs.AI, cs.LG, stat.ML
Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half-precision floating-point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating-point numbers have limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients after each optimizer step. This single-precision copy is rounded to half-precision format during training. Secondly, we propose scaling the loss appropriately to handle the loss of information with half-precision gradients. We demonstrate that this approach works for a wide variety of models including convolutional neural networks, recurrent neural networks and generative adversarial networks. This technique works for large scale models with more than 100 million parameters trained on large datasets. Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x. In future processors, we can also expect a significant computation speedup using half-precision hardware units.
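The two techniques can be demonstrated with scalar arithmetic alone. The sketch below emulates IEEE half and single precision by round-tripping Python floats through the standard `struct` module; the gradient value, the loss scale of 1024, and the update size are hypothetical, and multiplying the gradient directly stands in for scaling the loss before backpropagation.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 1) Loss scaling: a tiny gradient underflows to zero in half precision,
#    but survives if it is scaled up first and unscaled in single precision.
tiny_grad = 1e-8
LOSS_SCALE = 1024.0  # hypothetical scale factor
unscaled_fp16 = to_fp16(tiny_grad)
scaled_fp16 = to_fp16(tiny_grad * LOSS_SCALE)
recovered_grad = to_fp32(scaled_fp16) / LOSS_SCALE

# 2) Master weights: an update much smaller than the half-precision spacing
#    near 1.0 is lost if the weight itself is kept in half precision.
update = 1e-4
w_fp16_only = to_fp16(1.0)
w_master = to_fp32(1.0)
for _ in range(10):
    w_fp16_only = to_fp16(w_fp16_only + to_fp16(update))  # rounds back to 1.0
    w_master = to_fp32(w_master + update)                  # accumulates
w_fp16_view = to_fp16(w_master)  # half-precision copy used in forward/backward
```

After ten steps the half-precision-only weight has not moved, while the single-precision master copy has accumulated the updates and its rounded half-precision view has advanced to the next representable value above 1.0.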
• ### On the convergence of FK-Ising Percolation to SLE$(16/3, 16/3-6)$(1802.03939)

Feb. 12, 2018 math.PR
We give a simplified and complete proof of the convergence of the chordal exploration process in critical FK-Ising percolation to chordal SLE$_\kappa( \kappa-6)$ with $\kappa=16/3$. Our proof follows the classical excursion-construction of SLE$_\kappa(\kappa-6)$ processes in the continuum and we are thus led to introduce suitable cut-off stopping times in order to analyse the behaviour of the driving function of the discrete system when Dobrushin boundary conditions collapse to a single point. Our proof is very different from [KS15, KS16] as it only relies on the convergence to the chordal SLE$_{\kappa}$ process in Dobrushin boundary conditions and does not require the introduction of a new observable. Still, it relies crucially on several ingredients: a) the powerful topological framework developed in [KS17] as well as its follow-up paper [CDCH$^+$14], b) the strong RSW Theorem from [CDCH16], c) the proof is inspired by Appendix A in [BC16]. One important emphasis of this paper is to carefully write down some properties which are often considered {\em folklore} in the literature but which are only justified so far by hand-waving arguments. The main examples of these are: 1) the convergence of natural discrete stopping times to their continuous analogues. (The usual hand-waving argument destroys the spatial Markov property). 2) the fact that the discrete spatial Markov property is preserved in the scaling limit. (The enemy being that $\mathbb{E}[X_n \,|\, Y_n]$ does not necessarily converge to $\mathbb{E}[X \,|\, Y]$ when $(X_n,Y_n)\to (X,Y)$). We end the paper with a detailed sketch of the convergence to radial SLE$_\kappa( \kappa-6)$ when $\kappa=16/3$ as well as the derivation of Onsager's one-arm exponent $1/8$.
• ### DeepTravel: a Neural Network Based Travel Time Estimation Model with Auxiliary Supervision(1802.02147)

Feb. 6, 2018 cs.CV, cs.LG, stat.ML
Estimating the travel time of a path is of great importance to smart urban mobility. Existing approaches are either based on estimating the time cost of each road segment, which cannot capture many complex cross-segment factors, or designed heuristically in a non-learning-based way, which fails to utilize the abundant temporal labels of the data, i.e., the time stamp of each trajectory point. In this paper, we leverage recent developments in deep neural networks and propose a novel auxiliary supervision model, namely DeepTravel, that can automatically and effectively extract different features, as well as make full use of the temporal labels of the trajectory data. We have conducted comprehensive experiments on real datasets to demonstrate that DeepTravel outperforms existing approaches.
• ### Role of dimensional crossover on spin-orbit torque efficiency in magnetic insulator thin films(1708.07584)

Magnetic insulators (MIs) attract tremendous interest for spintronic applications due to low Gilbert damping and absence of Ohmic loss. Magnetic order of MIs can be manipulated and even switched by spin-orbit torques (SOTs) generated through spin Hall effect and Rashba-Edelstein effect in heavy metal/MI bilayers. SOTs on MIs are more intriguing than those on magnetic metals since SOTs cannot be transferred to MIs through direct injection of electron spins. Understanding of SOTs on MIs remains elusive, especially how SOTs scale with the film thickness. Here, we observe the critical role of dimensionality on the SOT efficiency by systematically studying the MI layer thickness dependent SOT efficiency in tungsten/thulium iron garnet (W/TmIG) bilayers. We first show that the TmIG thin film evolves from two-dimensional to three-dimensional magnetic phase transitions as the thickness increases, due to the suppression of long-wavelength thermal fluctuation. Then, we report the significant enhancement of the measured SOT efficiency as the thickness increases. We attribute this effect to the increase of the magnetic moment density in concert with the suppression of thermal fluctuations. Finally, we demonstrate the current-induced SOT switching in the W/TmIG bilayers with a TmIG thickness up to 15 nm. The switching current density is comparable with those of heavy metal/ferromagnetic metal cases. Our findings shed light on the understanding of SOTs in MIs, which is important for the future development of ultrathin MI-based low-power spintronics.
• ### On the Uniqueness of Global Multiple SLEs(1801.07699)

Jan. 23, 2018 math-ph, math.MP, math.PR
This article focuses on the characterization of global multiple Schramm-Loewner evolutions (SLE). The chordal SLE process describes the scaling limit of a single interface in various critical lattice models with Dobrushin boundary conditions, and similarly, global multiple SLEs describe scaling limits of collections of interfaces in critical lattice models with alternating boundary conditions. In this article, we give a minimal amount of characterizing properties for the global multiple SLEs: we prove that there exists a unique probability measure on collections of pairwise disjoint continuous simple curves with a certain conditional law property. As a consequence, we obtain the convergence of multiple interfaces in the critical Ising and FK-Ising models.
• ### Polychromatic Arm Exponents for the Critical Planar FK-Ising model(1604.06639)

Jan. 13, 2018 math-ph, math.MP, math.PR
Schramm-Loewner evolution (SLE) is a one-parameter family of random planar curves introduced by Oded Schramm in 1999 as candidates for the scaling limits of the interfaces in the planar critical lattice models. This is the only possible process with conformal invariance and a certain "domain Markov property". In 2010, Chelkak and Smirnov proved the conformal invariance of the scaling limits of the critical planar FK-Ising model, which gave the convergence of the interface to SLE$_{16/3}$. We derive the arm exponents of SLE$_{\kappa}$ for $\kappa\in (4,8)$. Combining this with the convergence of the interface, we derive the arm exponents of the critical FK-Ising model. We obtain six different patterns of boundary arm exponents and three different patterns of interior arm exponents of the critical planar FK-Ising model on the square lattice.