
The stochastic gradient descent (SGD) algorithm has been widely used in
statistical estimation for largescale data due to its computational and memory
efficiency. While most existing works focus on the convergence of the objective
function or the error of the obtained solution, we investigate the problem of
statistical inference of true model parameters based on SGD when the population
loss function is strongly convex and satisfies certain smoothness conditions.
Our main contributions are twofold. First, in the fixed dimension setup, we
propose two consistent estimators of the asymptotic covariance of the average
iterate from SGD: (1) a plugin estimator, and (2) a batchmeans estimator,
which is computationally more efficient and only uses the iterates from SGD.
Both proposed estimators allow us to construct asymptotically exact confidence
intervals and hypothesis tests. Second, for highdimensional linear regression,
using a variant of the SGD algorithm, we construct a debiased estimator of each
regression coefficient that is asymptotically normal. This gives a onepass
algorithm for computing both the sparse regression coefficients and confidence
intervals, which is computationally attractive and applicable to online data.

Two of the leading approaches for modelfree reinforcement learning are
policy gradient methods and $Q$learning methods. $Q$learning methods can be
effective and sampleefficient when they work, however, it is not
wellunderstood why they work, since empirically, the $Q$values they estimate
are very inaccurate. A partial explanation may be that $Q$learning methods are
secretly implementing policy gradient updates: we show that there is a precise
equivalence between $Q$learning and policy gradient methods in the setting of
entropyregularized reinforcement learning, that "soft" (entropyregularized)
$Q$learning is exactly equivalent to a policy gradient method. We also point
out a connection between $Q$learning methods and natural policy gradient
methods. Experimentally, we explore the entropyregularized versions of
$Q$learning and policy gradients, and we find them to perform as well as (or
slightly better than) the standard variants on the Atari benchmark. We also
show that the equivalence holds in practical settings by constructing a
$Q$learning method that closely matches the learning dynamics of A3C without
using a target network or $\epsilon$greedy exploration schedule.

In this paper, we consider the nonparametric regression problem with
multivariate predictors. We provide a characterization of the degrees of
freedom and divergence for estimators of the unknown regression function, which
are obtained as outputs of linearly constrained quadratic optimization
procedures, namely, minimizers of the least squares criterion with linear
constraints and/or quadratic penalties. As special cases of our results, we
derive explicit expressions for the degrees of freedom in many nonparametric
regression problems, e.g., bounded isotonic regression, multivariate
(penalized) convex regression, and additive total variation regularization. Our
theory also yields, as special cases, known results on the degrees of freedom
of many wellstudied estimators in the statistics literature, such as ridge
regression, Lasso and generalized Lasso. Our results can be readily used to
choose the tuning parameter(s) involved in the estimation procedure by
minimizing the Stein's unbiased risk estimate. As a byproduct of our analysis
we derive an interesting connection between bounded isotonic regression and
isotonic regression on a general partially ordered set, which is of independent
interest.

In this short note we consider a dynamic assortment planning problem under
the capacitated multinomial logit (MNL) bandit model. We prove a tight lower
bound on the accumulated regret that matches existing regret upper bounds for
all parameters (time horizon $T$, number of items $N$ and maximum assortment
capacity $K$) up to logarithmic factors. Our results close an $O(\sqrt{K})$ gap
between upper and lower regret bounds from existing works.

We study the nonlinear dynamics of trappedion models far away from the
LambDicke regime. This nonlinearity induces a sideband cooling blockade,
stopping the propagation of quantum information along the Hilbert space of the
JaynesCummings and quantum Rabi models. We compare the linear and nonlinear
cases of these models in the ultrastrong and deep strong coupling regimes.
Moreover, we propose a scheme that simulates the nonlinear quantum Rabi model
in all coupling regimes. This can be done via offresonant nonlinear red and
blue sideband interactions, yielding applications as a dynamical quantum
filter.

Tensor network (TN), a young mathematical tool of high vitality and great
potential, has been undergoing extremely rapid developments in the last two
decades, gaining tremendous success in condensed matter physics, atomic
physics, quantum information science, statistical physics, and so on. In this
lecture notes, we focus on the contraction algorithms of TN as well as some of
the applications to the simulations of quantum manybody systems. Starting from
basic concepts and definitions, we first explain the relations between TN and
physical problems, including the TN representations of classical partition
functions, quantum manybody states (by matrix product state, tree TN, and
projected entangled pair state), time evolution simulations, etc. These
problems, which are challenging to solve, can be transformed to TN contraction
problems. We present then several paradigm algorithms based on the ideas of the
numerical renormalization group and/or boundary states, including density
matrix renormalization group, timeevolving block decimation,
coarsegraining/corner tensor renormalization group, and several distinguished
variational algorithms. Finally, we revisit the TN approaches from the
perspective of multilinear algebra (also known as tensor algebra or tensor
decompositions) and quantum simulation. Despite the apparent differences in the
ideas and strategies of different TN algorithms, we aim at revealing the
underlying relations and resemblances in order to present a systematic picture
to understand the TN contraction approaches.

We consider a nonstationary sequential stochastic optimization problem, in
which the underlying cost functions change over time under a variation budget
constraint. We propose an $L_{p,q}$variation functional to quantify the
change, which yields less variation for dynamic function sequences whose
changes are constrained to short time periods or small subsets of input domain.
Under the $L_{p,q}$variation constraint, we derive both upper and matching
lower regret bounds for smooth and strongly convex function sequences, which
generalize previous results in Besbes et al. (2015). Furthermore, we provide an
upper bound for general convex function sequences with noisy gradient feedback,
which matches the optimal rate as $p\to\infty$. Our results reveal some
surprising phenomena under this general variation functional, such as the curse
of dimensionality of the function domain. The key technical novelties in our
analysis include affinity lemmas that characterize the distance of the
minimizers of two convex functions with bounded Lp difference, and a cubic
spline based construction that attains matching lower bounds.

Rapid and efficient preparation, manipulation and transfer of quantum states
through an array of quantum dots (QDs) is a demanding requisite task for
quantum information processing and quantum computation in solidstate physics.
Conventional adiabatic protocols, as coherent transfer by adiabatic passage
(CTAP) and its variations, provide slow transfer prone to decoherence, which
could lower the fidelity to some extent. To achieve the robustness against
decoherence, we propose a protocol of speeding up the adiabatic charge transfer
in multiQD systems, sharing the concept of "Shortcuts to Adiabaticity" (STA).
We first apply the STA techniques, including the counterdiabatic driving and
inverse engineering, to speed up the direct (long range) transfer between edge
dots in triple QDs. Then, we extend our analysis to a multidot system. We show
how by implementing the modified pulses, fast adiabaticlike charge transport
between the outer dots can be eventually achieved without populating
intermediate dots. We discuss as well the dependence of the transfer fidelity
on the operation time in the presence of dephasing. The proposed protocols for
accelerating adiabatic charge transfer directly between the outer dots in a QD
array offers a robust mechanism for quantum information processing, by
minimizing decoherence and relaxation processes.

Mobile robot navigation in complex and dynamic environments is a challenging
but important problem. Reinforcement learning approaches fail to solve these
tasks efficiently due to reward sparsities, temporal complexities and
highdimensionality of sensorimotor spaces which are inherent in such problems.
We present a novel approach to train action policies to acquire navigation
skills for wheellegged robots using deep reinforcement learning. The policy
maps heightmap image observations to motor commands to navigate to a target
position while avoiding obstacles. We propose to acquire the multifaceted
navigation skill by learning and exploiting a number of manageable navigation
behaviors. We also introduce a domain randomization technique to improve the
versatility of the training samples. We demonstrate experimentally a
significant improvement in terms of dataefficiency, success rate, robustness
against irrelevant sensory data, and also the quality of the maneuver skills.

With the ever growing diversity of devices and applications that will be
connected to 5G networks, flexible and agile service orchestration with
acknowledged QoE that satisfies enduser's functional and QoS requirements is
necessary. SDN (SoftwareDefined Networking) and NFV (Network Function
Virtualization) are considered key enabling technologies for 5G core networks.
In this regard, this paper proposes a reinforcement learning based
QoS/QoEaware Service Function Chaining (SFC) in SDN/NFVenabled 5G slices.
First, it implements a lightweight QoS information collector based on LLDP,
which works in a piggyback fashion on the southbound interface of the SDN
controller, to enable QoSawareness. Then, a DQN (Deep Q Network) based agent
framework is designed to support SFC in the context of NFV. The agent takes
into account the QoE and QoS as key aspects to formulate the reward so that it
is expected to maximize QoE while respecting QoS constraints. The experiment
results show that this framework exhibits good performance in QoE provisioning
and QoS requirements maintenance for SFC in dynamic network environments.

Slot filling is a critical task in natural language understanding (NLU) for
dialog systems. Stateoftheart solutions regard it as a sequence label ing
task and adopt BiLSTMCRF models. While BiLSTMCRF models works relatively well
on standard datasets it faces challenges in Chinese Ecommerce slot filling due
to more informative slot labels and richer expressions. In this paper, we
propose a deep multitask learning model with cascade and residual connections.
Experimental results show that our framework not only achieves competitive
performance with stateofthearts on a standard dataset, but also
significantly outperforms strong baselines by a substantial gain of 14.6% on a
Chinese Ecommerce dataset.

In recent years, twodimensional confined catalysis, i.e. the enhanced
catalytic reactions in confined spaces between metal surface and
twodimensional overlayer, makes a hit and opens up a new way to enhance the
performance of catalysts. In this work, graphdiyne overlayer was proposed as a
more excellent material than graphene or hexagonal boron nitride for
twodimensional confined catalysis. Density functional theory calculations
revealed the superiority of graphdiyne overlayer originated from the steric
hindrance effect which increases the catalytic ability and lowers the reaction
barriers. Moreover, with the big triangle holes as natural gas tunnels,
graphdiyne possesses higher efficiency for the transit of gaseous reactants and
products than graphene or hexagonal boron nitride. The results in this work
would benefit future development twodimensional confined catalysis.

Recently, twodimensional materials and nanoparticles with robust
ferromagnetism are even of great interest to explore basic physics in nanoscale
spintronics. More importantly, roomtemperature magnetic semiconducting
materials with high Curie temperature is essential for developing
nextgeneration spintronic and quantum computing devices. Here, we develop a
theoretical model on the basis of density functional theory calculations and
the RudermanKittelKasuyaYoshida theory to predict the thermal stability of
twodimensional magnetic materials. Compared with other rareearth (dysprosium
(Dy) and erbium (Er)) and 3d (copper (Cu)) impurities, holmiumdoped (Hodoped)
singlelayer 1HMoS2 is proposed as promising semiconductor with robust
magnetism. The calculations at the level of hybrid HSE06 functional predict a
Curie temperature much higher than room temperature. Hodoped MoS2 sheet
possesses fully spinpolarized valence and conduction bands, which is a
prerequisite for flexible spintronic applications.

In this paper, we study the problem of imagetext matching. Inferring the
latent semantic alignment between objects or other salient stuffs (e.g. snow,
sky, lawn) and the corresponding words in sentences allows to capture
finegrained interplay between vision and language, and makes imagetext
matching more interpretable. Prior works either simply aggregate the similarity
of all possible pairs of regions and words without attending differentially to
more and less important words or regions, or use a multistep attentional
process to capture limited number of semantic alignments which is less
interpretable. In this paper, we present Stacked Cross Attention to discover
the full latent alignments using both image regions and words in sentence as
context and infer the imagetext similarity. Our approach achieves the
stateoftheart results on the MSCOCO and Flickr30K datasets. On Flickr30K,
our approach outperforms the current best methods by 22.1% in text retrieval
from image query, and 18.2% in image retrieval with text query (based on
Recall@1). On MSCOCO, our approach improves sentence retrieval by 17.8% and
image retrieval by 16.6% (based on Recall@1 using the 5K test set).

In realworld Bayesian inference applications, prior assumptions regarding
the parameters of interest may be unrepresentative of their actual values for a
given dataset. In particular, if the likelihood is concentrated far out in the
wings of the assumed prior distribution, this can lead to extremely inefficient
exploration of the resulting posterior by nested sampling algorithms, with
unnecessarily high associated computational costs. Simple solutions such as
broadening the prior range in such cases might not be appropriate or possible
in realworld applications, for example when one wishes to assume a single
standardised prior across the analysis of a large number of datasets for which
the true values of the parameters of interest may vary. This work therefore
introduces a posterior repartitioning (PR) method for nested sampling
algorithms, which addresses the problem by redefining the likelihood and prior
while keeping their product fixed, so that the posterior inferences remain
unchanged but the efficiency of the nested sampling process is significantly
increased. Numerical results show that the PR method provides a simple yet
powerful refinement for nested sampling algorithms to address the issue of
unrepresentative priors.

With the increased complexity of power systems due to the integration of
smart grid technologies and renewable energy resources, more frequent changes
have been introduced to system status, and the traditional serial mode of state
estimation algorithm cannot well meet the restrict timeconstrained requirement
for the future dynamic power grid, even with advanced computer hardware. To
guarantee the grid reliability and minimize the impacts caused by system status
fluctuations, a fast, even SCADArate, state estimator is urgently needed. In
this paper, a graph based power system modeling is firstly explored and a graph
computing based state estimation is proposed to speed up its performance. The
power system is represented by a graph, which is a collection of vertices and
edges, and the measurements are attributes of vertices and edges. Each vertex
can independently implement local computation, like formulations of the
nodebased H matrix, gain matrix and righthandside (RHS) vector, only with the
information on its connected edges and neighboring vertices. Then, by taking
advantages of graph database, these nodebased data are conveniently collected
and stored in the compressed sparse row (CSR) format avoiding the complexity
and heaviness introduced by the sparse matrices. With communications and
synchronization, centralized computation of solving the weighted least square
(WLS) state estimation is completed with hierarchical parallel computing. The
proposed strategy is implemented on a graph database platform. The testing
results of IEEE 14bus, IEEE 118bus systems and a provincial system in China
verify the accuracy and highperformance of the proposed methodology.

We present a new analysis of exchange and dispersion effects for calculating
halogenbonding interactions in a wide variety of complex dimers (69 total)
within the XB18 and XB51 benchmark sets. Contrary to previous work on these
systems, we find that dispersion plays a more significant role than exact
exchange in accurately calculating halogenbonding interaction energies, which
are further confirmed by extensive SAPT analyses. In particular, we find that
even if the amount of exact exchange is nonempirically tuned to satisfy known
DFT constraints, we still observe an overall improvement in predicting
dissociation energies when dispersion corrections are applied, in stark
contrast to previous studies (J. Chem. Theory Comput. 2013, 9, 19181931). In
addition to these new analyses, we correct several (14) inconsistencies in the
XB51 set, which is widely used in the scientific literature for developing and
benchmarking various DFT methods. Together, these new analyses and revised
benchmarks emphasize the importance of dispersion and provide corrected
reference values that are essential for developing/parameterizing new DFT
functionals specifically for complex halogenbonding interactions.

Imitation learning is a powerful paradigm for robot skill acquisition.
However, obtaining demonstrations suitable for learning a policy that maps from
raw pixels to actions can be challenging. In this paper we describe how
consumergrade Virtual Reality headsets and hand tracking hardware can be used
to naturally teleoperate robots to perform complex tasks. We also describe how
imitation learning can learn deep neural network policies (mapping from pixels
to actions) that can acquire the demonstrated skills. Our experiments showcase
the effectiveness of our approach for learning visuomotor skills.

We consider the problem of exploration in meta reinforcement learning. Two
new meta reinforcement learning algorithms are suggested: EMAML and
E$\text{RL}^2$. Results are presented on a novel environment we call `Krazy
World' and a set of maze environments. We show EMAML and E$\text{RL}^2$
deliver better performance on tasks where exploration is important.

Deep neural networks excel in regimes with large amounts of data, but tend to
struggle when data is scarce or when they need to adapt quickly to changes in
the task. In response, recent work in metalearning proposes training a
metalearner on a distribution of similar tasks, in the hopes of generalization
to novel but related tasks by learning a highlevel strategy that captures the
essence of the problem it is asked to solve. However, many recent metalearning
approaches are extensively handdesigned, either using architectures
specialized to a particular application, or hardcoding algorithmic components
that constrain how the metalearner solves the task. We propose a class of
simple and generic metalearner architectures that use a novel combination of
temporal convolutions and soft attention; the former to aggregate information
from past experience and the latter to pinpoint specific pieces of information.
In the most extensive set of metalearning experiments to date, we evaluate the
resulting Simple Neural AttentIve Learner (or SNAIL) on several
heavilybenchmarked tasks. On all tasks, in both supervised and reinforcement
learning, SNAIL attains stateoftheart performance by significant margins.

We present new criteria for selecting HII regions from the Infrared
Astronomical Satellite (IRAS) Point Source catalogue (PSC), based on an HII
region catalogue derived manually from the allsky Widefield Infrared Survey
Explorer (WISE). The criteria are used to augment the number of HII region
candidates in the Milky Way. The criteria are defined by the linear decision
boundary of two samples: IRAS point sources associated with known HII regions,
which serve as the HII region sample, and IRAS point sources at high Galactic
latitudes, which serve as the nonHII region sample. A machine learning
classifier, specifically a support vector machine (SVM), is used to determine
the decision boundary. We investigate all combinations of four IRAS bands and
suggest that the optimal criterion is log(F$_{\rm 60}$/F$_{\rm
12}$)$\ge$(0.19$\times$log(F$_{\rm 100}$/F$_{\rm 25}$)+ 1.52), with detections
at 60 and 100 micron. This selects 3041 HII region candidates from the IRAS
PSC. We find that IRAS HII region candidates show evidence of evolution on the
twocolour diagram. Merging the WISE HII catalogue with IRAS HII region
candidates, we estimate a lower limit of approximately 10200 for the number of
HII regions in the Milky Way.

In this paper, we introduce a webscale general visual search system deployed
in Microsoft Bing. The system accommodates tens of billions of images in the
index, with thousands of features for each image, and can respond in less than
200 ms. In order to overcome the challenges in relevance, latency, and
scalability in such large scale of data, we employ a cascaded learningtorank
framework based on various latest deep learning visual features, and deploy in
a distributed heterogeneous computing platform. Quantitative and qualitative
experiments show that our system is able to support various applications on
Bing website and apps.

In this paper, we consider an unconstrained optimization model where the
objective is a sum of a large number of possibly nonconvex functions, though
overall the objective is assumed to be smooth and convex. Our bid to solving
such model uses the framework of cubic regularization of Newton's method.As
well known, the crux in cubic regularization is its utilization of the Hessian
information, which may be computationally expensive for largescale problems.
To tackle this, we resort to approximating the Hessian matrix via subsampling.
In particular, we propose to compute an approximated Hessian matrix by either
uniformly or nonuniformly subsampling the components of the objective. Based
upon subsampling, we develop both standard and accelerated adaptive cubic
regularization approaches and provide theoretical guarantees on global
iteration complexity. We show that the standard and accelerated subsampled
cubic regularization methods achieve iteration complexity in the order of
$O(\epsilon^{1/2})$ and $O(\epsilon^{1/3})$ respectively, which match those
of the original standard and accelerated cubic regularization methods
\cite{Cartis2012Evaluation, Jiang2017Unified} using the full Hessian
information. The performances of the proposed methods on regularized logistic
regression problems show a clear effect of acceleration in terms of epochs on
several real data sets.

Automatic conflict detection has grown in relevance with the advent of
bodyworn technology, but existing metrics such as turntaking and overlap are
poor indicators of conflict in policepublic interactions. Moreover, standard
techniques to compute them fall short when applied to such diversified and
noisy contexts. We develop a pipeline catered to this task combining adaptive
noise removal, nonspeech filtering and new measures of conflict based on the
repetition and intensity of phrases in speech. We demonstrate the effectiveness
of our approach on bodyworn audio data collected by the Los Angeles Police
Department.

We study the problem of testing whether an unknown $n$variable Boolean
function is a $k$junta in the distributionfree property testing model, where
the distance between functions is measured with respect to an arbitrary and
unknown probability distribution over $\{0,1\}^n$. Our first main result is
that distributionfree $k$junta testing can be performed, with onesided
error, by an adaptive algorithm that uses $\tilde{O}(k^2)/\epsilon$ queries
(independent of $n$). Complementing this, our second main result is a lower
bound showing that any nonadaptive distributionfree $k$junta testing
algorithm must make $\Omega(2^{k/3})$ queries even to test to accuracy
$\epsilon=1/3$. These bounds establish that while the optimal query complexity
of nonadaptive $k$junta testing is $2^{\Theta(k)}$, for adaptive testing it
is $\text{poly}(k)$, and thus show that adaptivity provides an exponential
improvement in the distributionfree query complexity of testing juntas.