-
For distributed computing environments, we consider the empirical risk
minimization problem and propose a distributed and communication-efficient
Newton-type optimization method. At every iteration, each worker locally finds
an Approximate NewTon (ANT) direction, which is sent to the main driver. The
main driver, then, averages all the ANT directions received from workers to
form a {\it Globally Improved ANT} (GIANT) direction. GIANT is highly
communication efficient and naturally exploits the trade-offs between local
computations and global communications in that more local computations result
in fewer overall rounds of communications. Theoretically, we show that GIANT
enjoys an improved convergence rate as compared with first-order methods and
existing distributed Newton-type methods. Further, and in sharp contrast with
many existing distributed Newton-type methods, as well as popular first-order
methods, a highly advantageous practical feature of GIANT is that it only
involves one tuning parameter. We conduct large-scale experiments on a computer
cluster and, empirically, demonstrate the superior performance of GIANT.
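
The averaging step described above can be sketched on a toy problem. The following is a minimal illustration, assuming a ridge-regression objective, a unit step size with no line search, and exact local linear solves; the function name `giant_step` and all problem sizes are hypothetical and not taken from the paper.

```python
import numpy as np

def giant_step(w, X_parts, y_parts, gamma):
    """One GIANT-style iteration for ridge regression (illustrative sketch only)."""
    n = sum(len(y) for y in y_parts)
    # Round 1: workers send local gradient contributions; the driver sums them.
    grad = sum(X.T @ (X @ w - y) for X, y in zip(X_parts, y_parts)) / n + gamma * w
    # Round 2: each worker solves with its *local* Hessian to get an ANT direction.
    directions = []
    for X in X_parts:
        H_local = X.T @ X / X.shape[0] + gamma * np.eye(X.shape[1])
        directions.append(np.linalg.solve(H_local, grad))
    # The driver averages the ANT directions into the GIANT direction.
    return w - np.mean(directions, axis=0)

rng = np.random.default_rng(0)
n, d, m = 4000, 5, 4                      # samples, features, workers (assumed sizes)
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
gamma = 1e-2
X_parts, y_parts = np.array_split(X, m), np.array_split(y, m)

w = np.zeros(d)
for _ in range(10):
    w = giant_step(w, X_parts, y_parts, gamma)

# Compare against the exact ridge solution computed on the full data.
w_exact = np.linalg.solve(X.T @ X / n + gamma * np.eye(d), X.T @ y / n)
error = np.linalg.norm(w - w_exact)
```

Each iteration costs two communication rounds in this sketch (one for the gradient, one for the directions), while the Hessian work stays local to each worker.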
-
Let $F$ be a non-archimedean local field of odd residue characteristic $p$.
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ in three variables, and
$K$ be a maximal compact open subgroup of $G$. For an irreducible smooth
representation $\sigma$ of $K$ over $\overline{\mathbf{F}}_p$, we prove that
the compactly induced representation $\text{ind}^G _K \sigma$ is free of
infinite rank over the spherical Hecke algebra $\mathcal{H}(K, \sigma)$.
-
Canonical correlation analysis (CCA) is a state-of-the-art method for
frequency recognition in steady-state visual evoked potential (SSVEP)-based
brain-computer interface (BCI) systems. Various extended methods have been
developed, and among such methods, a combination method of CCA and
individual-template-based CCA (IT-CCA) has achieved excellent performance.
However, CCA requires the canonical vectors to be orthogonal, which may not be
a reasonable assumption for EEG analysis. In the current study, we propose
using the correlated component analysis (CORRCA) rather than CCA to implement
frequency recognition. CORRCA can relax the constraint of canonical vectors in
CCA, and generate the same projection vector for two multichannel EEG signals.
Furthermore, we propose a two-stage method based on the basic CORRCA method
(termed TSCORRCA). Evaluated on a benchmark dataset of thirty-five subjects,
the experimental results demonstrate that CORRCA significantly outperformed
CCA, and TSCORRCA obtained the best performance among the compared methods.
This study demonstrates that CORRCA-based methods have great potential for
implementing high-performance SSVEP-based BCI systems.
-
This paper describes our system that has been submitted to SemEval-2018 Task
1: Affect in Tweets (AIT) to solve five subtasks. We focus on modeling both
sentence and word level representations of emotion inside texts through large
distantly labeled corpora with emojis and hashtags. We transfer the emotional
knowledge by exploiting neural network models as feature extractors and use
these representations for traditional machine learning models such as support
vector regression (SVR) and logistic regression to solve the competition tasks.
Our system placed among the top three in every subtask in which we participated.
-
We present topographic and spectroscopic scanning tunneling microscopy
measurements taken on a 21 nm thick TiN film at a temperature of 4.2 K -- above
the superconducting transition temperature (T_c = 3.8 K) of the sample. The
film was polycrystalline with crystallite diameters of d~19 nm, consistent with
other films prepared under similar conditions. The spectroscopic maps show on
average a shallow V-shape around V_b = 0 V consistent with a sample near the
Mott insulation transition. In selected regions on several samples we
additionally observed signs of Coulomb blockade. The corresponding peak
structures are typically asymmetric with respect to bias voltage indicating
coupling to two very different tunneling barriers. Furthermore, the peak
structures appear with constant peak-peak spacing which indicates quantum dot
states within the Coulomb blockade island. In this paper we discuss one such
Coulomb blockade area and its implications in detail.
-
The task of Fine-grained Entity Type Classification (FETC) consists of
assigning types from a hierarchy to entity mentions in text. Existing methods
rely on distant supervision and are thus susceptible to noisy labels that can
be out-of-context or overly-specific for the training sentence. Previous
methods that attempt to address these issues do so with heuristics or with the
help of hand-crafted features. Instead, we propose an end-to-end solution with
a neural network model that uses a variant of the cross-entropy loss function to
handle out-of-context labels, and hierarchical loss normalization to cope with
overly-specific ones. Also, previous work solves FETC as a multi-label
classification followed by ad-hoc post-processing. In contrast, our solution is
more elegant: we use public word embeddings to train a single-label model that
jointly learns representations for entity mentions and their context. We show
experimentally that our approach is robust against noise and consistently
outperforms the state-of-the-art on established benchmarks for the task.
-
We propose a deep hashing framework for sketch retrieval that, for the first
time, works on a multi-million scale human sketch dataset. Leveraging this
large dataset, we explore a few sketch-specific traits that were otherwise
under-studied in prior literature. Instead of following the conventional sketch
recognition task, we introduce the novel problem of sketch hashing retrieval
which is not only more challenging, but also offers a better testbed for
large-scale sketch analysis, since: (i) more fine-grained sketch feature
learning is required to accommodate the large variations in style and
abstraction, and (ii) a compact binary code needs to be learned at the same
time to enable efficient retrieval. Key to our network design is the embedding
of unique characteristics of human sketch, where (i) a two-branch CNN-RNN
architecture is adapted to explore the temporal ordering of strokes, and (ii) a
novel hashing loss is specifically designed to accommodate both the temporal
and abstract traits of sketches. By working with a 3.8M sketch dataset, we show
that state-of-the-art hashing models specifically engineered for static images
fail to perform well on temporal sketch data. Our network, on the other hand, not
only offers the best retrieval performance on various code sizes, but also
yields the best generalization performance under a zero-shot setting and when
re-purposed for sketch recognition. Such superior performance effectively
demonstrates the benefit of our sketch-specific design.
-
Complex oxide interfaces are a promising platform for studying a wide array
of correlated electron phenomena in low dimensions, including magnetism and
superconductivity. The microscopic origin of these phenomena in complex oxide
interfaces remains an open question. Here we investigate for the first time the
magnetic properties of semi-insulating NdTiO$_3$/SrTiO$_3$ (NTO/STO) interfaces
and present the first milli-Kelvin study of NTO/STO. The magnetoresistance (MR)
reveals signatures of local ferromagnetic order and of spin-dependent
thermally-activated transport, which are described quantitatively by a simple
phenomenological model. We discuss possible origins of the interfacial
ferromagnetism. In addition, the MR also shows transient hysteretic features on
a timescale of ~10-100 seconds. We demonstrate that these are consistent with
an extrinsic magneto-thermal origin, which may have been misinterpreted in
previous reports of magnetism in STO-based oxide interfaces. The existence of
these two MR regimes (steady-state and transient) highlights the importance of
time-dependent measurements for distinguishing signatures of ferromagnetism
from other effects that can produce hysteresis at low temperatures.
-
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ over a non-archimedean
local field $F$ of odd residue characteristic $p$. In this paper, for any
admissible supersingular representation of $G$ that contains the Steinberg
weight, we prove that the space of its pro-$p$-Iwahori invariants, as a right
module over the pro-$p$-Iwahori--Hecke algebra of $G$, is \emph{not} simple.
-
Our mysterious brain is believed to operate near a non-equilibrium point and
generate critical self-organized avalanches in neuronal activity. Recent
experimental evidence has revealed significant heterogeneity in both synaptic
input and output connectivity, but whether the structural heterogeneity
participates in the regulation of neuronal avalanches remains poorly
understood. By computational modelling, we predict that different types of
structural heterogeneity have distinct effects on avalanche
neurodynamics. In particular, neuronal avalanches can be triggered at an
intermediate level of input heterogeneity, but heterogeneous output
connectivity cannot evoke avalanche dynamics. In the criticality region, the
co-emergence of multi-scale cortical activities is observed, and both the
avalanche dynamics and neuronal oscillations are modulated by the input
heterogeneity. Remarkably, we show similar results can be reproduced in
networks with various types of in- and out-degree distributions. Overall, these
findings not only provide details on the underlying circuitry mechanisms of
nonrandom synaptic connectivity in the regulation of neuronal avalanches, but
also inspire testable hypotheses for future experimental studies.
-
The interplay between quantum physics and gravity has long inspired exciting
studies, which also reveal subtle connections between quantum laws and the
general notion of curved spacetime. One important example is the uniqueness of
free-falling motions in both quantum and gravitational physics. In this work,
we study, from a different perspective, the free motions of quantum test wave
packets that are distributed over weakly curved spacetime backgrounds. Except for
the de Broglie relations, no assumption of a priori given Hamiltonians or least
actions satisfied by the quantum system is made. We find that the mean motions
of quantum test wave packets can be deduced naturally from the de Broglie
relations with a generalized treatment of gravitational time dilations in the
quantum waves. Such mean motions of quantum test systems are independent of
their masses and compositions, and restore exactly the free-falling or
geodesic motions of classical test masses in curved spacetime. This suggests a
novel perspective that the weak equivalence principle, which states the
universality of free fall and serves as the foundation of gravitational
theories, may be deeply rooted in quantum physics and be a phenomenon emergent
from the quantum world.
-
Let $F$ be a non-archimedean local field of odd residue characteristic $p$.
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$, and $K$ be a maximal
compact open subgroup of $G$. For an $\overline{\mathbf{F}}_p$-smooth
representation $\pi$ of $G$ containing a weight $\sigma$ of $K$, we follow the
work of Hu (\cite{Hu12}) to attach to $\pi$ a certain $I_K$-subrepresentation,
where $I_K$ is the Iwahori subgroup in $K$. In terms of such an
$I_K$-subrepresentation, we prove a sufficient condition for $\pi$ to be
non-finitely presented. We determine such an $I_K$-subrepresentation
explicitly, when $\pi$ is either a spherical universal Hecke module or an
irreducible principal series.
-
Let $E/F$ be an unramified quadratic extension of non-archimedean local fields
of odd characteristic $p$, and $G$ be the unramified unitary group $U(2,
1)(E/F)$. For an irreducible smooth representation $\pi$ of $G$ over
$\overline{\mathbf{F}}_p$, with an underlying irreducible smooth representation
$\sigma$ of a maximal compact open subgroup $K$, we prove that $\pi$ admits
eigenvectors for an appropriate Hecke operator $T_\sigma$, and we classify
those $\pi$ with non-zero eigenvalues for $T_\sigma$ by a tree argument; as a
corollary, we show $\pi$ is supersingular if and only if it is supercuspidal.
-
For solving large-scale non-convex problems, we propose inexact variants of
trust region and adaptive cubic regularization methods, which, to increase
efficiency, incorporate various approximations. In particular, in addition to
approximate sub-problem solves, both the Hessian and the gradient are suitably
approximated. Using rather mild conditions on such approximations, we show that
our proposed inexact methods achieve optimal worst-case iteration complexities
similar to those of their exact counterparts. Our proposed algorithms, and their
respective theoretical analysis, do not require knowledge of any unknowable
problem-related quantities, and hence are easily implementable in practice. In
the context of finite-sum problems, we then explore randomized sub-sampling
methods as ways to construct the gradient and Hessian approximations and
examine the empirical performance of our algorithms on some real datasets.
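
To make the sub-sampling idea concrete, here is a minimal sketch of uniformly sub-sampled gradient and Hessian estimates for a finite-sum objective. The particular objective (a sigmoid least-squares loss, which is non-convex), the function names, and the sample sizes are assumptions chosen for illustration; they are not the paper's experimental setup, and the trust-region or cubic-regularization solver that would consume these estimates is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def subsampled_grad_hess(x, A, b, sample_size, rng):
    """Uniformly sub-sampled gradient and Hessian of the non-convex finite sum
    F(x) = (1/n) * sum_i (sigmoid(a_i^T x) - b_i)^2."""
    n = A.shape[0]
    idx = rng.choice(n, size=sample_size, replace=False)
    A_s, b_s = A[idx], b[idx]
    s = sigmoid(A_s @ x)
    ds = s * (1.0 - s)               # first derivative of the sigmoid
    d2s = ds * (1.0 - 2.0 * s)       # second derivative of the sigmoid
    r = s - b_s                      # residuals on the sampled points
    grad = A_s.T @ (2.0 * r * ds) / sample_size
    curv = 2.0 * (ds**2 + r * d2s)   # per-sample curvature weight; can be negative
    hess = (A_s * curv[:, None]).T @ A_s / sample_size
    return grad, hess

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 3))
b = rng.integers(0, 2, size=1000).astype(float)
x = rng.standard_normal(3)

# Sampling all n points without replacement reproduces the exact quantities.
g_full, H_full = subsampled_grad_hess(x, A, b, 1000, rng)
# A 20% sample gives cheaper, noisier approximations.
g_sub, H_sub = subsampled_grad_hess(x, A, b, 200, rng)
```

Because the per-sample curvature weights can be negative, the sub-sampled Hessian may be indefinite, which is why methods designed for non-convex problems are needed to use it.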
-
While first-order optimization methods such as stochastic gradient descent
(SGD) are popular in machine learning (ML), they come with well-known
deficiencies, including relatively slow convergence, sensitivity to the
settings of hyper-parameters such as learning rate, stagnation at high training
errors, and difficulty in escaping flat regions and saddle points. These issues
are particularly acute in highly non-convex settings such as those arising in
neural networks. Motivated by this, there has been recent interest in
second-order methods that aim to alleviate these shortcomings by capturing
curvature information. In this paper, we report detailed empirical evaluations
of a class of Newton-type methods, namely sub-sampled variants of trust region
(TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex
ML problems. In doing so, we demonstrate that these methods can not only be
computationally competitive with hand-tuned SGD with momentum, obtaining
comparable or better generalization performance, but are also highly
robust to hyper-parameter settings. Further, in contrast to SGD with momentum,
we show that the manner in which these Newton-type methods employ curvature
information allows them to seamlessly escape flat regions and saddle points.
-
We consider variants of trust-region and cubic regularization methods for
non-convex optimization, in which the Hessian matrix is approximated. Under
mild conditions on the inexact Hessian, and using approximate solutions of the
corresponding sub-problems, we provide iteration complexity bounds to achieve
$\epsilon$-approximate second-order optimality, which have been shown to be tight.
Our Hessian approximation conditions constitute a major relaxation over the
existing ones in the literature. Consequently, we are able to show that such
mild conditions allow for the construction of the approximate Hessian through
various random sampling methods. In this light, we consider the canonical
problem of finite-sum minimization, provide appropriate uniform and non-uniform
sub-sampling strategies to construct such Hessian approximations, and obtain
optimal iteration complexity for the corresponding sub-sampled trust-region and
cubic regularization methods.
-
We report an evaluation of the effectiveness of the existing knowledge base
embedding models for relation prediction and for relation extraction on a wide
range of benchmarks. We also introduce a new benchmark, much larger and more
complex than previous ones, to help validate the effectiveness of the models
on both tasks. The results demonstrate that knowledge base embedding models are
generally effective for relation prediction but, with the existing strategies,
unable to improve the state-of-the-art neural relation extraction model, while
pointing out limitations of existing methods.
-
We propose a Bell-measurement-free scheme to implement a quantum repeater in
GaAs/AlGaAs double quantum dot systems. We prove that four pairs of double
quantum dots compose an entanglement unit, given that the initial states are
singlet states. Our scheme differs from the famous Duan-Lukin-Cirac-Zoller (DLCZ)
protocol in that Bell measurements are unnecessary for the entanglement
swapping, which provides great advantages and convenience in experimental
implementation. Our scheme significantly improves the success probability of
quantum repeaters based on solid-state quantum devices.
-
A challenge in training discriminative models like neural networks is
obtaining enough labeled training data. Recent approaches use generative models
to combine weak supervision sources, like user-defined heuristics or knowledge
bases, to label training data. Prior work has explored learning accuracies for
these sources even without ground truth labels, but it assumes that a single
accuracy parameter is sufficient to model the behavior of these sources over
the entire training set. In particular, it fails to model latent subsets in
the training data in which the supervision sources perform differently than on
average. We present Socratic learning, a paradigm that uses feedback from a
corresponding discriminative model to automatically identify these subsets and
augments the structure of the generative model accordingly. Experimentally, we
show that without any ground truth labels, the augmented generative model
reduces error by up to 56.06% for a relation extraction task compared to a
state-of-the-art weak supervision technique that utilizes generative models.
-
Biological neurons receive multiple noisy oscillatory signals, and their
dynamical response to the superposition of these signals is of fundamental
importance for information processing in the brain. Here we study the response
of neural systems to a weak envelope modulation signal, which is formed by
superimposing two periodic signals with different frequencies. We show that stochastic
resonance occurs at the beat frequency in neural systems at the single-neuron
as well as the population level. The performance of this
frequency-difference-dependent stochastic resonance is influenced by both the
beat frequency and the two forcing frequencies. Compared to a single neuron, a
population of neurons is more efficient in detecting the information carried by
the weak envelope modulation signal at the beat frequency. Furthermore, an
appropriate fine-tuning of the excitation-inhibition balance can further
optimize the response of a neural ensemble to the superimposed signal. Our
results thus introduce and provide insights into the generation and modulation
mechanism of the frequency-difference-dependent stochastic resonance in neural
systems.
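
For intuition about why the envelope of the superimposed signal oscillates at the beat frequency, consider the idealized case of two equal-amplitude cosines. The equal amplitudes and the symbols $f_1$, $f_2$ and $t$ are assumptions introduced here for illustration; the identity itself is the standard sum-to-product formula.

```latex
\cos(2\pi f_1 t) + \cos(2\pi f_2 t)
  = 2 \cos\bigl(\pi (f_1 - f_2)\, t\bigr)\, \cos\bigl(\pi (f_1 + f_2)\, t\bigr)
```

The slowly varying factor $\bigl|2\cos(\pi (f_1 - f_2) t)\bigr|$ acts as the envelope and repeats with period $1/|f_1 - f_2|$, so the envelope modulation has frequency $|f_1 - f_2|$, the beat frequency.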
-
This paper investigates the optimal power allocation scheme for sum
throughput maximization of non-orthogonal multiple access (NOMA) system with
$\alpha$-fairness. In contrast to existing fairness models for NOMA,
$\alpha$-fairness uses only a single scalar to achieve different user
fairness levels. Two different channel state information at the transmitter
(CSIT) assumptions are considered, namely, statistical and perfect CSIT. For
statistical CSIT, fixed target data rates are predefined, and the power
allocation problem is solved for sum throughput maximization with
$\alpha$-fairness, through characterizing several properties of the optimal
power allocation solution. For perfect CSIT, the optimal power allocation is
determined to maximize the instantaneous sum rate with $\alpha$-fairness, where
user rates are adapted according to the instantaneous channel state information
(CSI). In particular, a simple alternating optimization (AO) algorithm is
proposed, which is demonstrated to yield the optimal solution. Numerical
results reveal that, at the same fairness level, NOMA significantly outperforms
the conventional orthogonal multiple access (OMA) in both scenarios, with
statistical and perfect CSIT.
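
The role of the single scalar $\alpha$ can be illustrated with the standard $\alpha$-fair utility applied to a two-user downlink NOMA rate pair. This is a minimal sketch under stated assumptions: the channel gains, the power budget, and the brute-force grid search are hypothetical stand-ins, not the paper's optimal allocation or its AO algorithm.

```python
import math

def alpha_fair_utility(rates, alpha):
    """Sum of alpha-fair utilities of strictly positive rates."""
    if alpha == 1.0:
        return sum(math.log(r) for r in rates)
    return sum(r ** (1.0 - alpha) / (1.0 - alpha) for r in rates)

def noma_rates(p_weak, p_total, g_weak, g_strong):
    """Two-user downlink NOMA rates (bits/s/Hz) with SIC at the strong user.
    Gains are channel power gains normalized by the noise power."""
    p_strong = p_total - p_weak
    # The weak user treats the strong user's signal as interference.
    r_weak = math.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + 1.0))
    # The strong user removes the weak user's signal before decoding its own.
    r_strong = math.log2(1.0 + p_strong * g_strong)
    return r_weak, r_strong

def best_split(alpha, p_total=10.0, g_weak=0.5, g_strong=5.0, steps=999):
    """Brute-force grid search over the power given to the weak user."""
    grid = [p_total * k / (steps + 1) for k in range(1, steps + 1)]
    return max(grid, key=lambda p: alpha_fair_utility(
        noma_rates(p, p_total, g_weak, g_strong), alpha))

# Power assigned to the weak user for increasing fairness levels.
splits = {alpha: best_split(alpha) for alpha in (0.0, 1.0, 2.0)}
```

With $\alpha = 0$ the utility reduces to the plain sum rate, which favors the strong user; larger $\alpha$ shifts power toward the weak user, moving the allocation toward max-min fairness.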
-
Principal component analysis (PCA) is one of the most powerful tools in
machine learning. The simplest method for PCA, the power iteration, requires
$\mathcal O(1/\Delta)$ full-data passes to recover the principal component of a
matrix with eigen-gap $\Delta$. Lanczos, a significantly more complex method,
achieves an accelerated rate of $\mathcal O(1/\sqrt{\Delta})$ passes. Modern
applications, however, motivate methods that only ingest a subset of available
data, known as the stochastic setting. In the online stochastic setting, simple
algorithms like Oja's iteration achieve the optimal sample complexity $\mathcal
O(\sigma^2/\Delta^2)$. Unfortunately, they are fully sequential, and also
require $\mathcal O(\sigma^2/\Delta^2)$ iterations, far from the $\mathcal
O(1/\sqrt{\Delta})$ rate of Lanczos. We propose a simple variant of the power
iteration with an added momentum term that achieves both the optimal sample
and iteration complexity. In the full-pass setting, standard analysis shows
that momentum achieves the accelerated rate, $\mathcal O(1/\sqrt{\Delta})$. We
demonstrate empirically that naively applying momentum to a stochastic method
does not result in acceleration. We perform a novel, tight variance analysis
that reveals the "breaking-point variance" beyond which this acceleration does
not occur. By combining this insight with modern variance reduction techniques,
we construct stochastic PCA algorithms, for the online and offline settings,
that achieve an accelerated iteration complexity $\mathcal O(1/\sqrt{\Delta})$.
Due to the embarrassingly parallel nature of our methods, this acceleration
translates directly to wall-clock time if deployed in a parallel environment.
Our approach is very general, and applies to many non-convex optimization
problems that can now be accelerated using the same technique.
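
The full-pass version of the momentum idea can be sketched as follows. This is a minimal deterministic illustration, not the stochastic or variance-reduced algorithms of the paper; the test matrix, the iteration count, and the choice of the momentum parameter `beta` from the second eigenvalue (which would be unknown in practice) are assumptions made for the example.

```python
import numpy as np

def power_iteration_momentum(A, beta, iters, w0):
    """Power iteration with a heavy-ball style momentum term:
    w_{t+1} = A w_t - beta * w_{t-1}, rescaling both iterates together."""
    w_prev = np.zeros_like(w0)
    w = w0 / np.linalg.norm(w0)
    for _ in range(iters):
        w_next = A @ w - beta * w_prev
        scale = np.linalg.norm(w_next)
        # Divide both iterates by the same factor so the recurrence stays consistent.
        w_prev, w = w / scale, w_next / scale
    return w

rng = np.random.default_rng(0)
eigvals = np.array([1.0, 0.95, 0.5, 0.2, 0.1])   # small eigen-gap: 0.05
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag(eigvals) @ Q.T                   # symmetric PSD test matrix
beta = eigvals[1] ** 2 / 4                       # assumed choice; needs lambda_2

w0 = rng.standard_normal(5)
w_plain = power_iteration_momentum(A, 0.0, 80, w0)   # beta = 0 is plain power iteration
w_mom = power_iteration_momentum(A, beta, 80, w0)

top = Q[:, 0]                                    # true principal component
err_plain = 1.0 - abs(w_plain @ top)
err_mom = 1.0 - abs(w_mom @ top)
```

Starting both runs from the same `w0` isolates the effect of the momentum term, so comparing `err_mom` with `err_plain` shows how much the extra term helps when the eigen-gap is small.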
-
We analyze a measurement scheme that allows determination of the Berry
curvature and the topological Chern number of a Hamiltonian with parameters
exploring a two-dimensional closed manifold. Our method uses continuous
monitoring of the gradient of the Hamiltonian with respect to one parameter
during a quasi-adiabatic quench of the other. Measurement back-action leads to
disturbance of the system dynamics, but we show that this can be compensated by
a feedback Hamiltonian. As an example, we analyze the implementation with a
superconducting qubit subject to time-varying, near-resonant microwave fields,
which is equivalent to a spin-1/2 particle in a magnetic field.
-
The Doppler tracking data of the Chang'e 3 lunar mission is used to constrain
the stochastic background of gravitational waves in cosmology within the 1 mHz
to 0.05 Hz frequency band. Our result improves on the upper bound on the energy
density of the stochastic background of gravitational waves in the 0.02 Hz to
0.05 Hz band obtained by the Apollo missions, with the improvement reaching
almost one order of magnitude at around 0.05 Hz. Detailed noise analysis of the
Doppler tracking data is also presented, with the prospect that these noise
sources will be mitigated in future Chinese deep space missions. A feasibility
study is also undertaken to understand the scientific capability of the Chang'e
4 mission, due to be launched in 2018, in relation to the stochastic
gravitational wave background around 0.01 Hz. The study indicates that the
upper bound on the energy density may be further improved by another order of
magnitude from the Chang'e 3 mission, which will fill the gap in the frequency
band from 0.02 Hz to 0.1 Hz in the foreseeable future.
-
We propose an efficient stepwise adiabatic merging (SAM) method to generate
many-body singlet states in antiferromagnetic spin-1 bosons in concatenated
optical superlattices with isolated double-well arrays, by adiabatically
ramping up the double-well bias. With an appropriate choice of bias sweeping
rate and magnetic field, the SAM protocol predicts a fidelity as high as 90%
for a sixteen-body singlet state and even higher fidelities for smaller
even-body singlet states. During their evolution, the spin-1 bosons exhibit
interesting squeezing dynamics, manifested by an odd-even oscillation of the
experimentally observable squeezing parameter. The generated many-body singlet
states may find practical applications in precision measurement of magnetic
field gradients and in quantum information processing.