• For distributed computing environments, we consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, which is sent to the main driver. The main driver then averages all the ANT directions received from workers to form a {\it Globally Improved ANT} (GIANT) direction. GIANT is highly communication efficient and naturally exploits the trade-offs between local computations and global communications in that more local computations result in fewer overall rounds of communications. Theoretically, we show that GIANT enjoys an improved convergence rate as compared with first-order methods and existing distributed Newton-type methods. Further, and in sharp contrast with many existing distributed Newton-type methods, as well as popular first-order methods, a highly advantageous practical feature of GIANT is that it only involves one tuning parameter. We conduct large-scale experiments on a computer cluster and, empirically, demonstrate the superior performance of GIANT.
  • Let $F$ be a non-archimedean local field of odd residue characteristic $p$. Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ in three variables, and $K$ be a maximal compact open subgroup of $G$. For an irreducible smooth representation $\sigma$ of $K$ over $\overline{\mathbf{F}}_p$, we prove that the compactly induced representation $\text{ind}^G _K \sigma$ is free of infinite rank over the spherical Hecke algebra $\mathcal{H}(K, \sigma)$.
  • Canonical correlation analysis (CCA) is a state-of-the-art method for frequency recognition in steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) systems. Various extended methods have been developed, and among such methods, a combination method of CCA and individual-template-based CCA (IT-CCA) has achieved excellent performance. However, CCA requires the canonical vectors to be orthogonal, which may not be a reasonable assumption for EEG analysis. In the current study, we propose using the correlated component analysis (CORRCA) rather than CCA to implement frequency recognition. CORRCA can relax the constraint of canonical vectors in CCA, and generate the same projection vector for two multichannel EEG signals. Furthermore, we propose a two-stage method based on the basic CORRCA method (termed TSCORRCA). Evaluated on a benchmark dataset of thirty-five subjects, the experimental results demonstrate that CORRCA significantly outperformed CCA, and TSCORRCA obtained the best performance among the compared methods. This study demonstrates that CORRCA-based methods have great potential for implementing high-performance SSVEP-based BCI systems.
  • This paper describes the system we submitted to SemEval-2018 Task 1: Affect in Tweets (AIT) to solve five subtasks. We focus on modeling both sentence and word level representations of emotion inside texts through large distantly labeled corpora with emojis and hashtags. We transfer the emotional knowledge by exploiting neural network models as feature extractors and use these representations for traditional machine learning models such as support vector regression (SVR) and logistic regression to solve the competition tasks. Our system placed among the top three in all subtasks in which we participated.
  • We present topographic and spectroscopic scanning tunneling microscopy measurements taken on a 21 nm thick TiN film at a temperature of 4.2 K -- above the superconducting transition temperature (T_c = 3.8 K) of the sample. The film was polycrystalline with crystallite diameters of d~19 nm, consistent with other films prepared under similar conditions. The spectroscopic maps show on average a shallow V-shape around V_b = 0 V consistent with a sample near the Mott insulation transition. In selected regions on several samples we additionally observed signs of Coulomb blockade. The corresponding peak structures are typically asymmetric with respect to bias voltage indicating coupling to two very different tunneling barriers. Furthermore, the peak structures appear with constant peak-peak spacing which indicates quantum dot states within the Coulomb blockade island. In this paper we discuss one such Coulomb blockade area and its implications in detail.
  • The task of Fine-grained Entity Type Classification (FETC) consists of assigning types from a hierarchy to entity mentions in text. Existing methods rely on distant supervision and are thus susceptible to noisy labels that can be out-of-context or overly-specific for the training sentence. Previous methods that attempt to address these issues do so with heuristics or with the help of hand-crafted features. Instead, we propose an end-to-end solution with a neural network model that uses a variant of the cross-entropy loss function to handle out-of-context labels, and hierarchical loss normalization to cope with overly-specific ones. Also, previous work solves FETC as a multi-label classification followed by ad-hoc post-processing. In contrast, our solution is more elegant: we use public word embeddings to train a single-label model that jointly learns representations for entity mentions and their context. We show experimentally that our approach is robust against noise and consistently outperforms the state-of-the-art on established benchmarks for the task.
  • We propose a deep hashing framework for sketch retrieval that, for the first time, works on a multi-million scale human sketch dataset. Leveraging this large dataset, we explore a few sketch-specific traits that were otherwise under-studied in prior literature. Instead of following the conventional sketch recognition task, we introduce the novel problem of sketch hashing retrieval, which is not only more challenging, but also offers a better testbed for large-scale sketch analysis, since: (i) more fine-grained sketch feature learning is required to accommodate the large variations in style and abstraction, and (ii) a compact binary code needs to be learned at the same time to enable efficient retrieval. Key to our network design is the embedding of unique characteristics of human sketch, where (i) a two-branch CNN-RNN architecture is adapted to explore the temporal ordering of strokes, and (ii) a novel hashing loss is specifically designed to accommodate both the temporal and abstract traits of sketches. By working with a 3.8M sketch dataset, we show that state-of-the-art hashing models specifically engineered for static images fail to perform well on temporal sketch data. Our network, on the other hand, not only offers the best retrieval performance on various code sizes, but also yields the best generalization performance under a zero-shot setting and when re-purposed for sketch recognition. Such superior performance effectively demonstrates the benefit of our sketch-specific design.
  • Complex oxide interfaces are a promising platform for studying a wide array of correlated electron phenomena in low dimensions, including magnetism and superconductivity. The microscopic origin of these phenomena in complex oxide interfaces remains an open question. Here we investigate for the first time the magnetic properties of semi-insulating NdTiO$_3$/SrTiO$_3$ (NTO/STO) interfaces and present the first milli-Kelvin study of NTO/STO. The magnetoresistance (MR) reveals signatures of local ferromagnetic order and of spin-dependent thermally-activated transport, which are described quantitatively by a simple phenomenological model. We discuss possible origins of the interfacial ferromagnetism. In addition, the MR shows transient hysteretic features on a timescale of ~10-100 seconds. We demonstrate that these are consistent with an extrinsic magneto-thermal origin, which may have been misinterpreted in previous reports of magnetism in STO-based oxide interfaces. The existence of these two MR regimes (steady-state and transient) highlights the importance of time-dependent measurements for distinguishing signatures of ferromagnetism from other effects that can produce hysteresis at low temperatures.
  • Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ over a non-archimedean local field $F$ of odd residue characteristic $p$. In this paper, for any admissible supersingular representation of $G$ that contains the Steinberg weight, we prove its pro-$p$-Iwahori invariants, as a right module over the pro-$p$-Iwahori--Hecke algebra of $G$, is \emph{not} simple.
  • Our mysterious brain is believed to operate near a non-equilibrium point and generate critical self-organized avalanches in neuronal activity. Recent experimental evidence has revealed significant heterogeneity in both synaptic input and output connectivity, but whether the structural heterogeneity participates in the regulation of neuronal avalanches remains poorly understood. By computational modelling, we predict that different types of structural heterogeneity contribute distinct effects on avalanche neurodynamics. In particular, neuronal avalanches can be triggered at an intermediate level of input heterogeneity, but heterogeneous output connectivity cannot evoke avalanche dynamics. In the criticality region, the co-emergence of multi-scale cortical activities is observed, and both the avalanche dynamics and neuronal oscillations are modulated by the input heterogeneity. Remarkably, we show similar results can be reproduced in networks with various types of in- and out-degree distributions. Overall, these findings not only provide details on the underlying circuitry mechanisms of nonrandom synaptic connectivity in the regulation of neuronal avalanches, but also inspire testable hypotheses for future experimental studies.
  • The interplay between quantum physics and gravity has long inspired exciting studies, which also reveal subtle connections between quantum laws and the general notion of curved spacetime. One important example is the uniqueness of free-falling motions in both quantum and gravitational physics. In this work, we study, from a different perspective, the free motions of quantum test wave packets that are distributed over weakly curved spacetime backgrounds. Except for the de Broglie relations, no assumption of a priori given Hamiltonians or least actions satisfied by the quantum system is made. We find that the mean motions of quantum test wave packets can be deduced naturally from the de Broglie relations with a generalized treatment of gravitational time dilations in the quantum waves. Such mean motions of quantum test systems are independent of their masses and compositions, and restore exactly the free-falling or geodesic motions of classical test masses in curved spacetime. This suggests a novel perspective that the weak equivalence principle, which states the universality of free fall and serves as the foundation of gravitational theories, may be deeply rooted in quantum physics and be a phenomenon emergent from the quantum world.
  • Let $F$ be a non-archimedean local field of odd residue characteristic $p$. Let $G$ be the unramified unitary group $U(2, 1)(E/F)$, and $K$ be a maximal compact open subgroup of $G$. For an $\overline{\mathbf{F}}_p$-smooth representation $\pi$ of $G$ containing a weight $\sigma$ of $K$, we follow the work of Hu (\cite{Hu12}) to attach to $\pi$ a certain $I_K$-subrepresentation, where $I_K$ is the Iwahori subgroup in $K$. In terms of such an $I_K$-subrepresentation, we prove a sufficient condition for $\pi$ to be non-finitely presented. We determine such an $I_K$-subrepresentation explicitly when $\pi$ is either a spherical universal Hecke module or an irreducible principal series.
  • Let $E/F$ be an unramified quadratic extension of non-archimedean local fields of odd characteristic $p$, and $G$ be the unramified unitary group $U(2, 1)(E/F)$. For an irreducible smooth representation $\pi$ of $G$ over $\overline{\mathbf{F}}_p$, with an underlying irreducible smooth representation $\sigma$ of a maximal compact open subgroup $K$, we prove that $\pi$ admits eigenvectors for an appropriate Hecke operator $T_\sigma$, and we classify those $\pi$ with non-zero eigenvalues for $T_\sigma$ by a tree argument; as a corollary, we show $\pi$ is supersingular if and only if it is supercuspidal.
  • For solving large-scale non-convex problems, we propose inexact variants of trust region and adaptive cubic regularization methods, which, to increase efficiency, incorporate various approximations. In particular, in addition to approximate sub-problem solves, both the Hessian and the gradient are suitably approximated. Using rather mild conditions on such approximations, we show that our proposed inexact methods achieve similar optimal worst-case iteration complexities as the exact counterparts. Our proposed algorithms, and their respective theoretical analysis, do not require knowledge of any unknowable problem-related quantities, and hence are easily implementable in practice. In the context of finite-sum problems, we then explore randomized sub-sampling methods as ways to construct the gradient and Hessian approximations and examine the empirical performance of our algorithms on some real datasets.
  • While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of hyper-parameters such as learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods not only can be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also they are highly robust to hyper-parameter settings. Further, in contrast to SGD with momentum, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
  • We consider variants of trust-region and cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under mild conditions on the inexact Hessian, and using approximate solutions of the corresponding sub-problems, we provide iteration complexity to achieve $\epsilon$-approximate second-order optimality, which has been shown to be tight. Our Hessian approximation conditions constitute a major relaxation over the existing ones in the literature. Consequently, we are able to show that such mild conditions allow for the construction of the approximate Hessian through various random sampling methods. In this light, we consider the canonical problem of finite-sum minimization, provide appropriate uniform and non-uniform sub-sampling strategies to construct such Hessian approximations, and obtain optimal iteration complexity for the corresponding sub-sampled trust-region and cubic regularization methods.
  • We report an evaluation of the effectiveness of the existing knowledge base embedding models for relation prediction and for relation extraction on a wide range of benchmarks. We also describe a new benchmark, much larger and more complex than previous ones, which we introduce to help validate the effectiveness of both tasks. The results demonstrate that knowledge base embedding models are generally effective for relation prediction but, with the existing strategies, unable to improve the state-of-the-art neural relation extraction model, while pointing out limitations of existing methods.
  • We propose a Bell-measurement-free scheme to implement a quantum repeater in GaAs/AlGa double quantum dot systems. We prove that four pairs of double quantum dots compose an entanglement unit, given that the initial states are singlet states. Our scheme differs from the famous Duan-Lukin-Cirac-Zoller (DLCZ) protocol in that Bell measurements are unnecessary for the entanglement swapping, which provides great advantages and conveniences in experimental implementation. Our scheme significantly improves the success probability of quantum repeaters based on solid state quantum devices.
  • A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set. In particular, they fail to model latent subsets in the training data in which the supervision sources perform differently than on average. We present Socratic learning, a paradigm that uses feedback from a corresponding discriminative model to automatically identify these subsets and augments the structure of the generative model accordingly. Experimentally, we show that without any ground truth labels, the augmented generative model reduces error by up to 56.06% for a relation extraction task compared to a state-of-the-art weak supervision technique that utilizes generative models.
  • Biological neurons receive multiple noisy oscillatory signals, and their dynamical response to the superposition of these signals is of fundamental importance for information processing in the brain. Here we study the response of neural systems to the weak envelope modulation signal, which is superimposed by two periodic signals with different frequencies. We show that stochastic resonance occurs at the beat frequency in neural systems at the single-neuron as well as the population level. The performance of this frequency-difference-dependent stochastic resonance is influenced by both the beat frequency and the two forcing frequencies. Compared to a single neuron, a population of neurons is more efficient in detecting the information carried by the weak envelope modulation signal at the beat frequency. Furthermore, an appropriate fine-tuning of the excitation-inhibition balance can further optimize the response of a neural ensemble to the superimposed signal. Our results thus introduce and provide insights into the generation and modulation mechanism of the frequency-difference-dependent stochastic resonance in neural systems.
  • This paper investigates the optimal power allocation scheme for sum throughput maximization of a non-orthogonal multiple access (NOMA) system with $\alpha$-fairness. In contrast to the existing fairness NOMA models, $\alpha$-fairness uses only a single scalar to achieve different user fairness levels. Two different channel state information at the transmitter (CSIT) assumptions are considered, namely, statistical and perfect CSIT. For statistical CSIT, fixed target data rates are predefined, and the power allocation problem is solved for sum throughput maximization with $\alpha$-fairness, through characterizing several properties of the optimal power allocation solution. For perfect CSIT, the optimal power allocation is determined to maximize the instantaneous sum rate with $\alpha$-fairness, where user rates are adapted according to the instantaneous channel state information (CSI). In particular, a simple alternate optimization (AO) algorithm is proposed, which is demonstrated to yield the optimal solution. Numerical results reveal that, at the same fairness level, NOMA significantly outperforms the conventional orthogonal multiple access (MA) in both scenarios, with statistical and with perfect CSIT.
  • Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $\mathcal O(1/\Delta)$ full-data passes to recover the principal component of a matrix with eigen-gap $\Delta$. Lanczos, a significantly more complex method, achieves an accelerated rate of $\mathcal O(1/\sqrt{\Delta})$ passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity $\mathcal O(\sigma^2/\Delta^2)$. Unfortunately, they are fully sequential, and also require $\mathcal O(\sigma^2/\Delta^2)$ iterations, far from the $\mathcal O(1/\sqrt{\Delta})$ rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, $\mathcal O(1/\sqrt{\Delta})$. We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity $\mathcal O(1/\sqrt{\Delta})$. Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.
  • We analyze a measurement scheme that allows determination of the Berry curvature and the topological Chern number of a Hamiltonian with parameters exploring a two-dimensional closed manifold. Our method uses continuous monitoring of the gradient of the Hamiltonian with respect to one parameter during a quasi-adiabatic quench of the other. Measurement back-action leads to disturbance of the system dynamics, but we show that this can be compensated by a feedback Hamiltonian. As an example, we analyze the implementation with a superconducting qubit subject to time varying, near resonant microwave fields; equivalent to a spin 1/2 particle in a magnetic field.
  • The Doppler tracking data of the Chang'e 3 lunar mission is used to constrain the cosmological stochastic background of gravitational waves within the 1 mHz to 0.05 Hz frequency band. Our result improves on the upper bound on the energy density of the stochastic gravitational-wave background in the 0.02 Hz to 0.05 Hz band obtained by the Apollo missions, with the improvement reaching almost one order of magnitude at around 0.05 Hz. Detailed noise analysis of the Doppler tracking data is also presented, with the prospect that these noise sources will be mitigated in future Chinese deep space missions. A feasibility study is also undertaken to understand the scientific capability of the Chang'e 4 mission, due to be launched in 2018, in relation to the stochastic gravitational wave background around 0.01 Hz. The study indicates that the upper bound on the energy density may be further improved by another order of magnitude from the Chang'e 3 mission, which will fill the gap in the frequency band from 0.02 Hz to 0.1 Hz in the foreseeable future.
  • We propose an efficient stepwise adiabatic merging (SAM) method to generate many-body singlet states in antiferromagnetic spin-1 bosons in concatenated optical superlattices with isolated double-well arrays, by adiabatically ramping up the double-well bias. With an appropriate choice of bias sweeping rate and magnetic field, the SAM protocol predicts a fidelity as high as 90% for a sixteen-body singlet state and even higher fidelities for smaller even-body singlet states. During their evolution, the spin-1 bosons exhibit interesting squeezing dynamics, manifested by an odd-even oscillation of the experimentally observable squeezing parameter. The generated many-body singlet states may find practical applications in precision measurement of magnetic field gradient and in quantum information processing.
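
The PCA abstract above describes a power iteration with an added momentum term. The following is a minimal sketch of the full-pass (non-stochastic) case only, assuming the two-term recurrence $w_{t+1} = A w_t - \beta w_{t-1}$ and the tuning $\beta = \lambda_2^2/4$, where $\lambda_2$ is the second-largest eigenvalue. The recurrence form, the tuning rule, the function names, and the diagonal test matrix are illustrative assumptions, not details taken from the abstract.

```python
import math


def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def power_iteration_momentum(A, beta, num_iters):
    """Run w_{t+1} = A w_t - beta * w_{t-1}, rescaling both iterates each step.

    beta = 0 recovers the plain power iteration.
    """
    n = len(A)
    w_prev = [0.0] * n               # w_{-1} = 0, so the first step is a plain power step
    w = [1.0 / math.sqrt(n)] * n     # deterministic unit-norm start (illustrative choice)
    for _ in range(num_iters):
        Aw = matvec(A, w)
        w_next = [a - beta * p for a, p in zip(Aw, w_prev)]
        scale = math.sqrt(sum(x * x for x in w_next))
        # Divide both iterates by the same factor so the recurrence stays consistent.
        w_prev = [x / scale for x in w]
        w = [x / scale for x in w_next]
    return w


# Hypothetical test matrix: eigenvalues 1.0, 0.9, 0.5, so the eigen-gap is 0.1.
A = [[1.0, 0.0, 0.0],
     [0.0, 0.9, 0.0],
     [0.0, 0.0, 0.5]]
beta = 0.9 ** 2 / 4                  # assumed tuning: (second eigenvalue)^2 / 4

w_plain = power_iteration_momentum(A, 0.0, 50)
w_mom = power_iteration_momentum(A, beta, 50)

# Residual weight on the second eigenvector measures the distance from e_1.
print("plain   :", abs(w_plain[1]))
print("momentum:", abs(w_mom[1]))
```

With the same iteration budget, the momentum run leaves far less weight on the second eigenvector than the plain run. In practice $\lambda_2$ is not known in advance, so the value of `beta` used here should be read as an idealized choice for the sketch.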