
In distributed computing environments, we consider the empirical risk
minimization problem and propose a distributed and communication-efficient
Newton-type optimization method. At every iteration, each worker locally finds
an Approximate NewTon (ANT) direction, which is sent to the main driver. The
main driver then averages all the ANT directions received from the workers to
form a {\it Globally Improved ANT} (GIANT) direction. GIANT is highly
communication-efficient and naturally exploits the trade-offs between local
computations and global communications, in that more local computation results
in fewer overall rounds of communication. Theoretically, we show that GIANT
enjoys an improved convergence rate compared with first-order methods and
existing distributed Newton-type methods. Further, and in sharp contrast with
many existing distributed Newton-type methods as well as popular first-order
methods, a highly advantageous practical feature of GIANT is that it involves
only one tuning parameter. We conduct large-scale experiments on a computer
cluster and empirically demonstrate the superior performance of GIANT.
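The sketch below shows one GIANT iteration in a single process. It assumes ridge regression with two features, equal-sized shards, and a unit step size; the method's line search is omitted. "Workers" are just list elements here, and all helper names (`solve2`, `local_grad_hess`, `giant_step`, `make_shards`) are hypothetical.

```python
import random

def solve2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def local_grad_hess(shard, w, gamma):
    """Gradient and Hessian of the ridge loss on one shard:
    (1/2s) * sum (x.w - y)^2 + (gamma/2) * |w|^2."""
    s = len(shard)
    g = [gamma * w[0], gamma * w[1]]
    h = [[gamma, 0.0], [0.0, gamma]]
    for x, y in shard:
        r = x[0] * w[0] + x[1] * w[1] - y  # residual of this sample
        for j in range(2):
            g[j] += r * x[j] / s
            for k in range(2):
                h[j][k] += x[j] * x[k] / s
    return g, h

def giant_step(shards, w, gamma):
    """One GIANT iteration with unit step size (line search omitted)."""
    m = len(shards)
    local = [local_grad_hess(sh, w, gamma) for sh in shards]
    # Communication round 1: the driver averages local gradients into the
    # global gradient (exact here because all shards have the same size).
    g = [sum(gl[j] for gl, _ in local) / m for j in range(2)]
    # Each worker solves its local Newton system with the global gradient (ANT).
    ants = [solve2(h, g) for _, h in local]
    # Communication round 2: the driver averages the ANT directions (GIANT).
    p = [sum(d[j] for d in ants) / m for j in range(2)]
    return [w[0] - p[0], w[1] - p[1]]

def make_shards(m=4, s=50, seed=0):
    """Synthetic data y = 2*x0 - x1 + noise, split evenly across m workers."""
    rng = random.Random(seed)
    data = []
    for _ in range(m * s):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        data.append((x, 2.0 * x[0] - x[1] + rng.gauss(0, 0.1)))
    return [data[i * s:(i + 1) * s] for i in range(m)]
```

To use it, start from `w = [0.0, 0.0]` and call `w = giant_step(shards, w, 0.1)` a few times. The iterates should approach the exact ridge solution. Each iteration costs two communication rounds: one for the gradient and one for the ANT directions.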

Let $F$ be a non-archimedean local field of odd residue characteristic $p$.
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ in three variables, and
$K$ be a maximal compact open subgroup of $G$. For an irreducible smooth
representation $\sigma$ of $K$ over $\overline{\mathbf{F}}_p$, we prove that
the compactly induced representation $\text{ind}^G _K \sigma$ is free of
infinite rank over the spherical Hecke algebra $\mathcal{H}(K, \sigma)$.

Canonical correlation analysis (CCA) is a state-of-the-art method for
frequency recognition in steady-state visual evoked potential (SSVEP)-based
brain-computer interface (BCI) systems. Various extended methods have been
developed, and among them, a combination of CCA and
individual-template-based CCA (IT-CCA) has achieved excellent performance.
However, CCA requires the canonical vectors to be orthogonal, which may not be
a reasonable assumption for EEG analysis. In the current study, we propose
using correlated component analysis (CORRCA) rather than CCA to implement
frequency recognition. CORRCA relaxes the constraint on the canonical vectors in
CCA and generates the same projection vector for two multichannel EEG signals.
Furthermore, we propose a two-stage method based on the basic CORRCA method
(termed TS-CORRCA). Evaluated on a benchmark dataset of thirty-five subjects,
the experimental results demonstrate that CORRCA significantly outperformed
CCA, and that TS-CORRCA obtained the best performance among the compared methods.
This study demonstrates that CORRCA-based methods have great potential for
implementing high-performance SSVEP-based BCI systems.
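CORRCA differs from CCA in that one projection vector is shared by both signals. This is easiest to see in the commonly used formulation, which maximizes rho(w) = w'(R12 + R21)w / w'(R11 + R22)w. Here R11 and R22 are the within-signal covariances and R12 is the cross-covariance, so the problem is a generalized eigenvalue problem. The sketch below assumes only two EEG channels, so that the eigenproblem can be solved in closed form without a linear-algebra library. The helper names are hypothetical, and the code covers only the basic CORRCA projection, not the two-stage method.

```python
import math
import random

def cov(x, y):
    """Sample cross-covariance matrix of two signals given as lists of channels."""
    n = len(x[0])
    mx = [sum(ch) / n for ch in x]
    my = [sum(ch) / n for ch in y]
    return [[sum((x[i][t] - mx[i]) * (y[j][t] - my[j]) for t in range(n)) / (n - 1)
             for j in range(len(y))] for i in range(len(x))]

def pencil(x1, x2):
    """A = R12 + R21 (between-signal) and B = R11 + R22 (within-signal), both 2x2."""
    r11, r22, r12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    a = [[r12[i][j] + r12[j][i] for j in range(2)] for i in range(2)]
    b = [[r11[i][j] + r22[i][j] for j in range(2)] for i in range(2)]
    return a, b

def rho(w, x1, x2):
    """CORRCA objective for one projection vector w shared by both signals."""
    a, b = pencil(x1, x2)
    quad = lambda m: sum(w[i] * m[i][j] * w[j] for i in range(2) for j in range(2))
    return quad(a) / quad(b)

def corrca_2ch(x1, x2):
    """Leading generalized eigenpair of A w = lam * B w for 2-channel signals."""
    a, b = pencil(x1, x2)
    # det(A - lam*B) = c2*lam^2 + c1*lam + c0
    c2 = b[0][0] * b[1][1] - b[0][1] ** 2
    c1 = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2 * a[0][1] * b[0][1])
    c0 = a[0][0] * a[1][1] - a[0][1] ** 2
    lam = (-c1 + math.sqrt(max(0.0, c1 * c1 - 4 * c2 * c0))) / (2 * c2)  # c2 > 0
    # w spans the null space of the (rank-one) matrix A - lam*B; use its larger row.
    m00 = a[0][0] - lam * b[0][0]
    m01 = a[0][1] - lam * b[0][1]
    m11 = a[1][1] - lam * b[1][1]
    w = [-m01, m00] if abs(m00) >= abs(m11) else [m11, -m01]
    return lam, w
```

Because the same `w` projects both signals, a component shared by one channel of each recording is picked out directly, without a separate vector per signal as in CCA.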

This paper describes our system submitted to SemEval-2018 Task 1: Affect in
Tweets (AIT), which addresses five subtasks. We focus on modeling both
sentence- and word-level representations of emotion in texts through large
distantly labeled corpora with emojis and hashtags. We transfer emotional
knowledge by using neural network models as feature extractors, and we use
these representations in traditional machine learning models such as support
vector regression (SVR) and logistic regression to solve the competition tasks.
Our system placed in the top 3 for all subtasks in which we participated.

We present topographic and spectroscopic scanning tunneling microscopy
measurements taken on a 21 nm thick TiN film at a temperature of 4.2 K, above
the superconducting transition temperature (T_c = 3.8 K) of the sample. The
film was polycrystalline with crystallite diameters of d ~ 19 nm, consistent with
other films prepared under similar conditions. On average, the spectroscopic
maps show a shallow V-shape around V_b = 0 V, consistent with a sample near the
Mott insulation transition. In selected regions on several samples we
additionally observed signs of Coulomb blockade. The corresponding peak
structures are typically asymmetric with respect to bias voltage, indicating
coupling to two very different tunneling barriers. Furthermore, the peak
structures appear with constant peak-to-peak spacing, which indicates quantum dot
states within the Coulomb blockade island. In this paper we discuss one such
Coulomb blockade area and its implications in detail.

The task of Fine-grained Entity Type Classification (FETC) consists of
assigning types from a hierarchy to entity mentions in text. Existing methods
rely on distant supervision and are thus susceptible to noisy labels that can
be out-of-context or overly specific for the training sentence. Previous
methods that attempt to address these issues do so with heuristics or with the
help of hand-crafted features. Instead, we propose an end-to-end solution with
a neural network model that uses a variant of the cross-entropy loss function to
handle out-of-context labels, and hierarchical loss normalization to cope with
overly specific ones. Also, previous work solves FETC as a multi-label
classification problem followed by ad-hoc post-processing. In contrast, our
solution is more elegant: we use public word embeddings to train a single-label
classifier that jointly learns representations for entity mentions and their
context. We show experimentally that our approach is robust against noise and
consistently outperforms the state of the art on established benchmarks for the task.
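One way to realize the two loss ingredients is sketched below, assuming a flat softmax over path-style type names such as `/person/artist`. The out-of-context variant trains only on the candidate type the model currently finds most probable. Hierarchical normalization adds beta times the ancestors' probabilities to each type and then renormalizes. The exact formulas, the type names, and `beta` are assumptions for illustration, not a transcription of the paper's model.

```python
import math

def softmax(scores):
    """Softmax over a dict of type -> score."""
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def ancestors(t):
    """Ancestors of a path-style type, e.g. '/person/artist' -> ['/person']."""
    parts = t.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

def hierarchical_normalize(probs, beta):
    """Add beta times the ancestor probabilities to each type, then renormalize."""
    adj = {t: p + beta * sum(probs.get(a, 0.0) for a in ancestors(t))
           for t, p in probs.items()}
    z = sum(adj.values())
    return {t: v / z for t, v in adj.items()}

def candidate_max_loss(probs, candidates):
    """Cross-entropy against only the most probable of the noisy candidate types."""
    best = max(candidates, key=lambda t: probs[t])
    return -math.log(probs[best])
```

With this normalization, predicting a child type such as `/person/artist` is penalized less when its parent `/person` is also probable, which is the intended softening for overly specific labels.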

We propose a deep hashing framework for sketch retrieval that, for the first
time, works on a multi-million-scale human sketch dataset. Leveraging this
large dataset, we explore a few sketch-specific traits that were otherwise
understudied in prior literature. Instead of following the conventional sketch
recognition task, we introduce the novel problem of sketch hashing retrieval,
which is not only more challenging but also offers a better testbed for
large-scale sketch analysis, since: (i) more fine-grained sketch feature
learning is required to accommodate the large variations in style and
abstraction, and (ii) a compact binary code needs to be learned at the same
time to enable efficient retrieval. Key to our network design is the embedding
of unique characteristics of human sketches, where (i) a two-branch CNN-RNN
architecture is adapted to explore the temporal ordering of strokes, and (ii) a
novel hashing loss is specifically designed to accommodate both the temporal
and abstract traits of sketches. By working with a 3.8M sketch dataset, we show
that state-of-the-art hashing models specifically engineered for static images
fail to perform well on temporal sketch data. Our network, on the other hand,
not only offers the best retrieval performance on various code sizes, but also
yields the best generalization performance under a zero-shot setting and when
repurposed for sketch recognition. These results demonstrate the benefit of our
sketch-specific design.
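The sketch below shows only the retrieval side of hashing. It assumes the network already outputs real-valued embeddings: each embedding is binarized by sign, and the gallery is ranked by Hamming distance to the query code. The sign threshold, the helper names, and the toy embeddings are hypothetical; the paper's learned hashing loss is not reproduced.

```python
def to_code(embedding):
    """Binarize a real-valued embedding: bit 1 where the entry is positive, else 0."""
    bits = 0
    for v in embedding:
        bits = (bits << 1) | (1 if v > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")

def retrieve(query_emb, gallery, k=3):
    """Ids of the k gallery codes nearest to the query in Hamming distance."""
    q = to_code(query_emb)
    ranked = sorted(gallery.items(), key=lambda kv: (hamming(q, kv[1]), kv[0]))
    return [item_id for item_id, _ in ranked[:k]]
```

Comparing codes costs one XOR and a popcount per gallery item, which is what makes compact binary codes attractive at the multi-million scale.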

Complex oxide interfaces are a promising platform for studying a wide array
of correlated electron phenomena in low dimensions, including magnetism and
superconductivity. The microscopic origin of these phenomena in complex oxide
interfaces remains an open question. Here we investigate for the first time the
magnetic properties of semi-insulating NdTiO$_3$/SrTiO$_3$ (NTO/STO) interfaces
and present the first millikelvin study of NTO/STO. The magnetoresistance (MR)
reveals signatures of local ferromagnetic order and of spin-dependent,
thermally activated transport, which are described quantitatively by a simple
phenomenological model. We discuss possible origins of the interfacial
ferromagnetism. In addition, the MR shows transient hysteretic features on
a timescale of ~10 to 100 seconds. We demonstrate that these are consistent with
an extrinsic magnetothermal origin, which may have been misinterpreted in
previous reports of magnetism in STO-based oxide interfaces. The existence of
these two MR regimes (steady-state and transient) highlights the importance of
time-dependent measurements for distinguishing signatures of ferromagnetism
from other effects that can produce hysteresis at low temperatures.

Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ over a non-archimedean
local field $F$ of odd residue characteristic $p$. In this paper, for any
admissible supersingular representation of $G$ that contains the Steinberg
weight, we prove that its space of pro-$p$ Iwahori invariants, as a right module
over the pro-$p$ Iwahori-Hecke algebra of $G$, is \emph{not} simple.

Our mysterious brain is believed to operate near a non-equilibrium point and
generate critical self-organized avalanches in neuronal activity. Recent
experimental evidence has revealed significant heterogeneity in both synaptic
input and output connectivity, but whether this structural heterogeneity
participates in the regulation of neuronal avalanches remains poorly
understood. Using computational modelling, we predict that different types of
structural heterogeneity have distinct effects on avalanche
neurodynamics. In particular, neuronal avalanches can be triggered at an
intermediate level of input heterogeneity, but heterogeneous output
connectivity cannot evoke avalanche dynamics. In the criticality region, the
co-emergence of multiscale cortical activities is observed, and both the
avalanche dynamics and neuronal oscillations are modulated by the input
heterogeneity. Remarkably, we show that similar results can be reproduced in
networks with various types of in- and out-degree distributions. Overall, these
findings not only provide details on the underlying circuitry mechanisms of
non-random synaptic connectivity in the regulation of neuronal avalanches, but
also inspire testable hypotheses for future experimental studies.
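Avalanches in such models are commonly read off from binned population activity. An avalanche is a maximal run of non-empty time bins bounded by empty ones, and a branching ratio near 1 is a common signature of criticality. The sketch below assumes that convention and one simple branching-ratio estimator; other estimators exist. The toy counts and the helper names are hypothetical.

```python
def avalanches(counts):
    """Split binned spike counts into avalanches.

    An avalanche is a maximal run of consecutive non-empty bins. Returns a list
    of (size, duration) pairs: total spikes in the run and its length in bins.
    """
    result = []
    size = duration = 0
    for c in counts:
        if c > 0:
            size += c
            duration += 1
        elif duration > 0:
            result.append((size, duration))
            size = duration = 0
    if duration > 0:  # close a run that reaches the end of the recording
        result.append((size, duration))
    return result

def branching_ratio(counts):
    """Mean ratio of spikes in bin t+1 to spikes in bin t, over bins with activity."""
    ratios = [b / a for a, b in zip(counts, counts[1:]) if a > 0]
    return sum(ratios) / len(ratios) if ratios else float("nan")
```

In a critical regime, the collected avalanche sizes and durations would be expected to follow approximately power-law distributions.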

The interplay between quantum physics and gravity has long inspired exciting
studies, which have also revealed subtle connections between quantum laws and the
general notion of curved spacetime. One important example is the uniqueness of
free-falling motions in both quantum and gravitational physics. In this work,
we study, from a different perspective, the free motions of quantum test wave
packets distributed over weakly curved spacetime backgrounds. Apart from the
de Broglie relations, no a priori Hamiltonian or least-action principle for the
quantum system is assumed. We find that the mean motions of quantum test wave
packets can be deduced naturally from the de Broglie relations with a
generalized treatment of gravitational time dilation in the quantum waves. Such
mean motions of quantum test systems are independent of their masses and
compositions, and exactly reproduce the free-falling, or geodesic, motions of
classical test masses in curved spacetime. This suggests a novel perspective in
which the weak equivalence principle, which states the universality of free fall
and serves as a foundation of gravitational theories, may be deeply rooted in
quantum physics and may be a phenomenon emerging from the quantum world.

Let $F$ be a non-archimedean local field of odd residue characteristic $p$.
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$, and $K$ be a maximal
compact open subgroup of $G$. For an $\overline{\mathbf{F}}_p$-smooth
representation $\pi$ of $G$ containing a weight $\sigma$ of $K$, we follow the
work of Hu (\cite{Hu12}) to attach to $\pi$ a certain $I_K$-subrepresentation,
where $I_K$ is the Iwahori subgroup in $K$. In terms of such an
$I_K$-subrepresentation, we prove a sufficient condition for $\pi$ to be
non-finitely presented. We determine such an $I_K$-subrepresentation
explicitly when $\pi$ is either a spherical universal Hecke module or an
irreducible principal series.

Let $E/F$ be an unramified quadratic extension of non-archimedean local fields
of odd characteristic $p$, and let $G$ be the unramified unitary group
$U(2, 1)(E/F)$. For an irreducible smooth representation $\pi$ of $G$ over
$\overline{\mathbf{F}}_p$, with an underlying irreducible smooth representation
$\sigma$ of a maximal compact open subgroup $K$, we prove that $\pi$ admits
eigenvectors for an appropriate Hecke operator $T_\sigma$, and we classify
those $\pi$ with nonzero eigenvalues for $T_\sigma$ by a tree argument; as a
corollary, we show $\pi$ is supersingular if and only if it is supercuspidal.

For solving large-scale non-convex problems, we propose inexact variants of
trust region and adaptive cubic regularization methods, which, to increase
efficiency, incorporate various approximations. In particular, in addition to
approximate subproblem solves, both the Hessian and the gradient are suitably
approximated. Using rather mild conditions on such approximations, we show that
our proposed inexact methods achieve optimal worst-case iteration complexities
similar to those of their exact counterparts. Our proposed algorithms, and their
respective theoretical analysis, do not require knowledge of any unknowable
problem-related quantities, and hence are easily implementable in practice. In
the context of finite-sum problems, we then explore randomized sub-sampling
methods as ways to construct the gradient and Hessian approximations and
examine the empirical performance of our algorithms on some real datasets.
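The sketch below is a one-dimensional illustration of where the approximations enter. It assumes a hypothetical non-convex finite sum f(x) = (1/n) sum_i log(1 + (x - a_i)^2). The gradient and Hessian are estimated by uniform sub-sampling, and each step minimizes the cubic model exactly. Unlike the inexact adaptive cubic regularization method described above, the regularization weight `sigma` is held fixed and no acceptance test is made.

```python
import math
import random

# Hypothetical non-convex finite sum: f(x) = (1/n) * sum_i log(1 + (x - a_i)^2)
def grad_i(x, a):
    d = x - a
    return 2 * d / (1 + d * d)

def hess_i(x, a):
    d = x - a
    return 2 * (1 - d * d) / (1 + d * d) ** 2

def subsampled(fn, x, centers, sample_size, rng):
    """Uniform sub-sampling (with replacement) estimate of (1/n) * sum_i fn(x, a_i)."""
    picks = [rng.choice(centers) for _ in range(sample_size)]
    return sum(fn(x, a) for a in picks) / sample_size

def cubic_step(g, h, sigma):
    """Global minimizer of m(s) = g*s + h*s^2/2 + sigma*|s|^3/3 in one dimension."""
    if g == 0:
        return -h / sigma if h < 0 else 0.0  # either sign is optimal when h < 0
    # The minimizer points against the gradient; solve sigma*t^2 + h*t - |g| = 0.
    t = (-h + math.sqrt(h * h + 4 * sigma * abs(g))) / (2 * sigma)
    return -math.copysign(t, g)

def inexact_arc(x, centers, sigma=1.0, sample_size=32, iters=50, seed=0):
    """Fixed-sigma cubic-regularized steps using sub-sampled derivatives."""
    rng = random.Random(seed)
    for _ in range(iters):
        g = subsampled(grad_i, x, centers, sample_size, rng)
        h = subsampled(hess_i, x, centers, sample_size, rng)
        x += cubic_step(g, h, sigma)
    return x
```

In practice, the sample size trades per-iteration cost against the accuracy conditions the analysis places on the gradient and Hessian estimates.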

While first-order optimization methods such as stochastic gradient descent
(SGD) are popular in machine learning (ML), they come with well-known
deficiencies, including relatively slow convergence, sensitivity to the
settings of hyperparameters such as learning rate, stagnation at high training
errors, and difficulty in escaping flat regions and saddle points. These issues
are particularly acute in highly non-convex settings such as those arising in
neural networks. Motivated by this, there has been recent interest in
second-order methods that aim to alleviate these shortcomings by capturing
curvature information. In this paper, we report detailed empirical evaluations
of a class of Newton-type methods, namely sub-sampled variants of trust region
(TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex
ML problems. In doing so, we demonstrate that these methods not only can be
computationally competitive with hand-tuned SGD with momentum, obtaining
comparable or better generalization performance, but are also highly
robust to hyperparameter settings. Further, in contrast to SGD with momentum,
we show that the manner in which these Newton-type methods employ curvature
information allows them to seamlessly escape flat regions and saddle points.
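The sketch below is a one-dimensional illustration of how a trust-region method uses curvature. It rests on our own simplifying assumptions: an exact 1-D subproblem and hypothetical constants `eta = 0.1`, `shrink = 0.5`, `grow = 2.0`. At a point where the gradient vanishes but the curvature is negative, gradient descent takes a zero step. The trust-region subproblem instead moves to the boundary of the region.

```python
def tr_step_1d(g, h, radius):
    """Exact minimizer of the model m(s) = g*s + h*s^2/2 subject to |s| <= radius."""
    if h > 0 and abs(g / h) <= radius:
        return -g / h  # interior Newton step
    if g == 0:
        # Zero gradient: with negative curvature either boundary point is optimal
        # (pick +radius); with zero curvature the model is flat, so do not move.
        return radius if h < 0 else 0.0
    # Non-positive curvature, or a Newton step outside the region: go to the
    # boundary in the descent direction.
    return -radius if g > 0 else radius

def tr_update(f_old, f_new, predicted_decrease, radius,
              eta=0.1, shrink=0.5, grow=2.0, max_radius=10.0):
    """Accept or reject a step and update the radius (hypothetical constants).

    rho compares the actual decrease with the (positive) decrease predicted by
    the model. Returns (accept_step, new_radius).
    """
    rho = (f_old - f_new) / predicted_decrease
    if rho < eta:                       # poor agreement: reject and shrink
        return False, shrink * radius
    if rho > 0.75:                      # very good agreement: accept and enlarge
        return True, min(grow * radius, max_radius)
    return True, radius                 # acceptable: keep the radius
```

For example, at a 1-D local maximum with g = 0 and h = -2, `tr_step_1d(0.0, -2.0, 0.5)` moves a full radius away, whereas a gradient step would stay put.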

We consider variants of trust-region and cubic regularization methods for
non-convex optimization, in which the Hessian matrix is approximated. Under
mild conditions on the inexact Hessian, and using approximate solutions of the
corresponding subproblems, we provide iteration complexities for achieving
$\epsilon$-approximate second-order optimality, which have been shown to be tight.
Our Hessian approximation conditions constitute a major relaxation over the
existing ones in the literature. Consequently, we are able to show that such
mild conditions allow for the construction of the approximate Hessian through
various random sampling methods. In this light, we consider the canonical
problem of finite-sum minimization, provide appropriate uniform and non-uniform
sub-sampling strategies to construct such Hessian approximations, and obtain
optimal iteration complexity for the corresponding sub-sampled trust-region and
cubic regularization methods.
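For a finite sum $H = \frac{1}{n}\sum_i H_i$, a sub-sampled Hessian draws indices with probabilities $p_i$ and reweights each draw by $1/(n p_i)$. This keeps the estimate unbiased for any choice of $p_i$. The scalar sketch below compares uniform probabilities with probabilities proportional to $|H_i|$; for matrix-valued $H_i$, a natural analogue is a norm such as $\|H_i\|$. These choices and the helper names are assumptions for illustration, not necessarily the paper's sampling schemes.

```python
import random

def sampling_probs(hessians, uniform=False):
    """Uniform probabilities, or probabilities proportional to |H_i|."""
    n = len(hessians)
    if uniform:
        return [1.0 / n] * n
    total = sum(abs(h) for h in hessians)
    return [abs(h) / total for h in hessians]

def sampled_hessian(hessians, probs, s, rng):
    """Unbiased estimate of (1/n) * sum_i H_i from s indices drawn i.i.d. from probs."""
    n = len(hessians)
    idx = rng.choices(range(n), weights=probs, k=s)
    return sum(hessians[j] / (n * probs[j]) for j in idx) / s

def single_draw_variance(hessians, probs):
    """Exact variance of the one-sample estimator H_j / (n * p_j)."""
    n = len(hessians)
    mean = sum(hessians) / n
    second = sum(p * (h / (n * p)) ** 2 for h, p in zip(hessians, probs))
    return second - mean ** 2
```

When the component Hessians differ widely in magnitude, the non-uniform choice concentrates samples on the components that dominate the sum and can reduce variance sharply. In the extreme case of same-sign scalars, the variance drops to zero.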

We report an evaluation of the effectiveness of existing knowledge base
embedding models for relation prediction and for relation extraction on a wide
range of benchmarks. We also describe a new benchmark, much larger and more
complex than previous ones, which we introduce to help evaluate methods for
both tasks. The results demonstrate that knowledge base embedding models are
generally effective for relation prediction but, with the existing strategies,
are unable to improve the state-of-the-art neural relation extraction model;
we also point out limitations of existing methods.

We propose a Bell-measurement-free scheme to implement a quantum repeater in
GaAs/AlGaAs double quantum dot systems. We prove that four pairs of double
quantum dots compose an entanglement unit, given that the initial states are
singlet states. Our scheme differs from the well-known Duan-Lukin-Cirac-Zoller
(DLCZ) protocol in that Bell measurements are unnecessary for the entanglement
swapping, which provides great advantages and convenience in experimental
implementation. Our scheme significantly improves the success probability of
quantum repeaters based on solid-state quantum devices.

A challenge in training discriminative models like neural networks is
obtaining enough labeled training data. Recent approaches use generative models
to combine weak supervision sources, like user-defined heuristics or knowledge
bases, to label training data. Prior work has explored learning accuracies for
these sources even without ground truth labels, but it assumes that a single
accuracy parameter is sufficient to model the behavior of these sources over
the entire training set. In particular, it fails to model latent subsets in
the training data in which the supervision sources perform differently than on
average. We present Socratic learning, a paradigm that uses feedback from a
corresponding discriminative model to automatically identify these subsets and
augments the structure of the generative model accordingly. Experimentally, we
show that without any ground truth labels, the augmented generative model
reduces error by up to 56.06% for a relation extraction task compared to a
state-of-the-art weak supervision technique that utilizes generative models.
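The sketch below shows why a per-subset accuracy matters. It uses a naive-Bayes-style weighted vote, in which each source contributes log(acc / (1 - acc)). This vote is a simplification we assume for illustration; it is not the paper's generative model or its Socratic feedback loop. The sources, accuracies, and subset are hypothetical.

```python
import math

def log_odds(acc):
    """Weight of a source with accuracy acc under a conditionally independent model."""
    return math.log(acc / (1 - acc))

def vote(labels, accuracies):
    """Weighted vote over labels in {+1, -1, 0}, where 0 means the source abstains."""
    score = sum(lab * log_odds(acc)
                for lab, acc in zip(labels, accuracies) if lab != 0)
    return 1 if score > 0 else (-1 if score < 0 else 0)

def subset_accuracies(base, overrides, in_subset):
    """Use subset-specific accuracies (index -> accuracy) inside a latent subset."""
    if not in_subset:
        return list(base)
    return [overrides.get(i, acc) for i, acc in enumerate(base)]
```

Suppose three sources with global accuracies 0.8, 0.6, and 0.6 vote +1, -1, -1. The strong first source wins. If source 0 is only 55% accurate inside some latent subset, the same votes resolve to -1 there. A single global accuracy parameter cannot express that difference.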

Biological neurons receive multiple noisy oscillatory signals, and their
dynamical response to the superposition of these signals is of fundamental
importance for information processing in the brain. Here we study the response
of neural systems to a weak envelope modulation signal formed by superimposing
two periodic signals with different frequencies. We show that stochastic
resonance occurs at the beat frequency in neural systems at both the
single-neuron and the population level. The performance of this
frequency-difference-dependent stochastic resonance is influenced by both the
beat frequency and the two forcing frequencies. Compared to a single neuron, a
population of neurons is more efficient in detecting the information carried by
the weak envelope modulation signal at the beat frequency. Furthermore,
appropriate fine-tuning of the excitation-inhibition balance can further
optimize the response of a neural ensemble to the superimposed signal. Our
results thus introduce and provide insights into the generation and modulation
mechanism of the frequency-difference-dependent stochastic resonance in neural
systems.
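Assuming equal amplitudes $A$ for the two forcing signals (the paper's signals may differ), the sum-to-product identity shows why the envelope sits at the beat frequency $|f_1 - f_2|$:

```latex
\[
A\cos(2\pi f_1 t) + A\cos(2\pi f_2 t)
  = \underbrace{2A\cos\!\big(\pi (f_1 - f_2)\, t\big)}_{\text{slow envelope}}
    \,\cos\!\big(\pi (f_1 + f_2)\, t\big)
\]
```

The envelope magnitude $2A\,|\cos(\pi (f_1 - f_2) t)|$ repeats every $1/|f_1 - f_2|$. For example, inputs at 12 Hz and 10 Hz carry a 2 Hz envelope, even though the summed signal has no spectral component at 2 Hz.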

This paper investigates the optimal power allocation scheme for sum
throughput maximization of a non-orthogonal multiple access (NOMA) system with
$\alpha$-fairness. In contrast to existing fairness models for NOMA,
$\alpha$-fairness uses only a single scalar to achieve different user
fairness levels. Two different channel state information at the transmitter
(CSIT) assumptions are considered, namely, statistical and perfect CSIT. For
statistical CSIT, fixed target data rates are predefined, and the power
allocation problem is solved for sum throughput maximization with
$\alpha$-fairness by characterizing several properties of the optimal
power allocation solution. For perfect CSIT, the optimal power allocation is
determined to maximize the instantaneous sum rate with $\alpha$-fairness, where
user rates are adapted according to the instantaneous channel state information
(CSI). In particular, a simple alternating optimization (AO) algorithm is
proposed, which is shown to yield the optimal solution. Numerical results
reveal that, at the same fairness level, NOMA significantly outperforms
conventional orthogonal multiple access (OMA) in both the statistical and the
perfect CSIT scenarios.
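The standard alpha-fair utility family shows how a single scalar sweeps across fairness levels. Alpha = 0 gives plain sum-rate maximization, alpha = 1 gives proportional fairness, and alpha approaching infinity tends toward max-min fairness. The sketch below implements that family; the exact objective used in the paper may differ, and the rate values are hypothetical.

```python
import math

def alpha_fair_utility(rates, alpha):
    """Sum of alpha-fair utilities: log(r) if alpha == 1, else r^(1-alpha)/(1-alpha)."""
    if any(r <= 0 for r in rates):
        raise ValueError("alpha-fairness needs strictly positive rates")
    if alpha == 1:
        return sum(math.log(r) for r in rates)
    return sum(r ** (1 - alpha) / (1 - alpha) for r in rates)
```

Consider an unequal allocation with rates [3.0, 0.5] and an equal one with rates [1.5, 1.5]. The unequal allocation wins at alpha = 0 because its sum is larger. The equal allocation wins at alpha = 1 and alpha = 2, as fairness is weighted more heavily.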

Principal component analysis (PCA) is one of the most powerful tools in
machine learning. The simplest method for PCA, the power iteration, requires
$\mathcal O(1/\Delta)$ full-data passes to recover the principal component of a
matrix with eigengap $\Delta$. Lanczos, a significantly more complex method,
achieves an accelerated rate of $\mathcal O(1/\sqrt{\Delta})$ passes. Modern
applications, however, motivate methods that only ingest a subset of available
data, known as the stochastic setting. In the online stochastic setting, simple
algorithms like Oja's iteration achieve the optimal sample complexity
$\mathcal O(\sigma^2/\Delta^2)$. Unfortunately, they are fully sequential, and
also require $\mathcal O(\sigma^2/\Delta^2)$ iterations, far from the
$\mathcal O(1/\sqrt{\Delta})$ rate of Lanczos. We propose a simple variant of the
power iteration with an added momentum term that achieves both the optimal sample
and iteration complexity. In the full-pass setting, standard analysis shows
that momentum achieves the accelerated rate, $\mathcal O(1/\sqrt{\Delta})$. We
demonstrate empirically that naively applying momentum to a stochastic method
does not result in acceleration. We perform a novel, tight variance analysis
that reveals the "breaking-point variance" beyond which this acceleration does
not occur. By combining this insight with modern variance reduction techniques,
we construct stochastic PCA algorithms, for the online and offline settings,
that achieve an accelerated iteration complexity $\mathcal O(1/\sqrt{\Delta})$.
Due to the embarrassingly parallel nature of our methods, this acceleration
translates directly to wall-clock time if deployed in a parallel environment.
Our approach is very general, and applies to many non-convex optimization
problems that can now be accelerated using the same technique.
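The sketch below covers only the deterministic, full-pass power iteration with momentum. It assumes the recursion $w_{t+1} = A w_t - \beta w_{t-1}$ with $\beta = \lambda_2^2/4$, the choice usually quoted for this recursion, and it assumes $\lambda_2$ (the second eigenvalue) is known. The stochastic and variance-reduced variants are not shown. With `beta=0.0`, the same function reduces to plain power iteration, which makes the two easy to compare.

```python
import math

def matvec(a, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def power_momentum(a, v0, beta, iters):
    """Power iteration with momentum: w_{t+1} = A w_t - beta * w_{t-1}.

    Both iterates are rescaled by the same factor each step, which keeps the
    numbers bounded without changing the direction of the linear recursion.
    """
    prev = [0.0] * len(v0)
    cur = list(v0)
    for _ in range(iters):
        nxt = [x - beta * p for x, p in zip(matvec(a, cur), prev)]
        scale = math.sqrt(sum(x * x for x in nxt))
        prev = [x / scale for x in cur]
        cur = [x / scale for x in nxt]
    return cur
```

On a matrix with eigenvalues 1.0, 0.9, and 0.5, plain power iteration shrinks the unwanted components roughly like $0.9^t$. With momentum, the rate is governed by the ratio $(\lambda_2/2)/\big((\lambda_1 + \sqrt{\lambda_1^2 - \lambda_2^2})/2\big) \approx 0.63$, so 40 iterations separate the two by several orders of magnitude.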

We analyze a measurement scheme that allows determination of the Berry
curvature and the topological Chern number of a Hamiltonian with parameters
exploring a two-dimensional closed manifold. Our method uses continuous
monitoring of the gradient of the Hamiltonian with respect to one parameter
during a quasi-adiabatic quench of the other. Measurement backaction leads to
disturbance of the system dynamics, but we show that this can be compensated by
a feedback Hamiltonian. As an example, we analyze the implementation with a
superconducting qubit subject to time-varying, near-resonant microwave fields,
which is equivalent to a spin-1/2 particle in a magnetic field.
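The code below is not the measurement scheme. It is a direct numerical reference calculation of the Chern number that such a measurement targets in the spin-1/2 example. It uses the Fukui-Hatsugai-Suzuki lattice method, a technique we swap in for illustration. It assumes the field direction n(theta, phi) sweeps the whole unit sphere and uses the +1 eigenstate of n.sigma. The sign of the result depends on orientation conventions; its magnitude should be 1. The helper names are hypothetical.

```python
import cmath
import math

def spin_up_state(theta, phi):
    """Eigenvector of n.sigma with eigenvalue +1 for field direction n(theta, phi)."""
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

def link(u, v):
    """Normalized overlap <u|v>; only its phase is used."""
    z = u[0].conjugate() * v[0] + u[1].conjugate() * v[1]
    return z / abs(z)

def chern_number(n_theta=20, n_phi=20):
    """Fukui-Hatsugai-Suzuki lattice Chern number over the (theta, phi) sphere."""
    states = [[spin_up_state(math.pi * i / n_theta, 2 * math.pi * j / n_phi)
               for j in range(n_phi)] for i in range(n_theta + 1)]
    total = 0.0
    for i in range(n_theta):
        for j in range(n_phi):
            jn = (j + 1) % n_phi  # phi is periodic
            a, b = states[i][j], states[i + 1][j]
            c, d = states[i + 1][jn], states[i][jn]
            # Gauge-invariant Berry flux through one plaquette, in (-pi, pi].
            total += cmath.phase(link(a, b) * link(b, c) * link(c, d) * link(d, a))
    return total / (2 * math.pi)
```

Because each plaquette phase is gauge invariant and the link phases around the closed sphere multiply to 1, the sum is an exact integer even on a coarse grid. This mirrors the robustness of the topological quantity the measurement scheme is designed to extract.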

The Doppler tracking data of the Chang'e 3 lunar mission are used to constrain
the stochastic background of gravitational waves in cosmology within the 1 mHz
to 0.05 Hz frequency band. Our result improves on the upper bound on the energy
density of the stochastic gravitational-wave background in the 0.02 Hz to
0.05 Hz band obtained by the Apollo missions, with the improvement reaching
almost one order of magnitude at around 0.05 Hz. A detailed noise analysis of the
Doppler tracking data is also presented, with the prospect that these noise
sources will be mitigated in future Chinese deep space missions. A feasibility
study is also undertaken to understand the scientific capability of the Chang'e
4 mission, due to be launched in 2018, with respect to the stochastic
gravitational-wave background around 0.01 Hz. The study indicates that the
upper bound on the energy density may be improved by another order of
magnitude relative to the Chang'e 3 result, which would fill the gap in the
frequency band from 0.02 Hz to 0.1 Hz in the foreseeable future.

We propose an efficient stepwise adiabatic merging (SAM) method to generate
many-body singlet states in antiferromagnetic spin-1 bosons in concatenated
optical superlattices with isolated double-well arrays, by adiabatically
ramping up the double-well bias. With an appropriate choice of bias sweeping
rate and magnetic field, the SAM protocol predicts a fidelity as high as 90%
for a sixteen-body singlet state and even higher fidelities for smaller
even-body singlet states. During their evolution, the spin-1 bosons exhibit
interesting squeezing dynamics, manifested by an odd-even oscillation of the
experimentally observable squeezing parameter. The generated many-body singlet
states may find practical applications in precision measurement of magnetic
field gradients and in quantum information processing.