• The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions. Our main contributions are two-fold. First, in the fixed dimension setup, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) a plug-in estimator, and (2) a batch-means estimator, which is computationally more efficient and only uses the iterates from SGD. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. Second, for high-dimensional linear regression, using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficients and confidence intervals, which is computationally attractive and applicable to online data.
  • Two of the leading approaches for model-free reinforcement learning are policy gradient methods and $Q$-learning methods. $Q$-learning methods can be effective and sample-efficient when they work, however, it is not well-understood why they work, since empirically, the $Q$-values they estimate are very inaccurate. A partial explanation may be that $Q$-learning methods are secretly implementing policy gradient updates: we show that there is a precise equivalence between $Q$-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, that "soft" (entropy-regularized) $Q$-learning is exactly equivalent to a policy gradient method. We also point out a connection between $Q$-learning methods and natural policy gradient methods. Experimentally, we explore the entropy-regularized versions of $Q$-learning and policy gradients, and we find them to perform as well as (or slightly better than) the standard variants on the Atari benchmark. We also show that the equivalence holds in practical settings by constructing a $Q$-learning method that closely matches the learning dynamics of A3C without using a target network or $\epsilon$-greedy exploration schedule.
  • In this paper, we consider the nonparametric regression problem with multivariate predictors. We provide a characterization of the degrees of freedom and divergence for estimators of the unknown regression function, which are obtained as outputs of linearly constrained quadratic optimization procedures, namely, minimizers of the least squares criterion with linear constraints and/or quadratic penalties. As special cases of our results, we derive explicit expressions for the degrees of freedom in many nonparametric regression problems, e.g., bounded isotonic regression, multivariate (penalized) convex regression, and additive total variation regularization. Our theory also yields, as special cases, known results on the degrees of freedom of many well-studied estimators in the statistics literature, such as ridge regression, Lasso and generalized Lasso. Our results can be readily used to choose the tuning parameter(s) involved in the estimation procedure by minimizing the Stein's unbiased risk estimate. As a by-product of our analysis we derive an interesting connection between bounded isotonic regression and isotonic regression on a general partially ordered set, which is of independent interest.
  • In this short note we consider a dynamic assortment planning problem under the capacitated multinomial logit (MNL) bandit model. We prove a tight lower bound on the accumulated regret that matches existing regret upper bounds for all parameters (time horizon $T$, number of items $N$ and maximum assortment capacity $K$) up to logarithmic factors. Our results close an $O(\sqrt{K})$ gap between upper and lower regret bounds from existing works.
  • We study the nonlinear dynamics of trapped-ion models far away from the Lamb-Dicke regime. This nonlinearity induces a sideband cooling blockade, stopping the propagation of quantum information along the Hilbert space of the Jaynes-Cummings and quantum Rabi models. We compare the linear and nonlinear cases of these models in the ultrastrong and deep strong coupling regimes. Moreover, we propose a scheme that simulates the nonlinear quantum Rabi model in all coupling regimes. This can be done via off-resonant nonlinear red and blue sideband interactions, yielding applications as a dynamical quantum filter.
  • Tensor network (TN), a young mathematical tool of high vitality and great potential, has been undergoing extremely rapid developments in the last two decades, gaining tremendous success in condensed matter physics, atomic physics, quantum information science, statistical physics, and so on. In this lecture notes, we focus on the contraction algorithms of TN as well as some of the applications to the simulations of quantum many-body systems. Starting from basic concepts and definitions, we first explain the relations between TN and physical problems, including the TN representations of classical partition functions, quantum many-body states (by matrix product state, tree TN, and projected entangled pair state), time evolution simulations, etc. These problems, which are challenging to solve, can be transformed to TN contraction problems. We present then several paradigm algorithms based on the ideas of the numerical renormalization group and/or boundary states, including density matrix renormalization group, time-evolving block decimation, coarse-graining/corner tensor renormalization group, and several distinguished variational algorithms. Finally, we revisit the TN approaches from the perspective of multi-linear algebra (also known as tensor algebra or tensor decompositions) and quantum simulation. Despite the apparent differences in the ideas and strategies of different TN algorithms, we aim at revealing the underlying relations and resemblances in order to present a systematic picture to understand the TN contraction approaches.
  • We consider a non-stationary sequential stochastic optimization problem, in which the underlying cost functions change over time under a variation budget constraint. We propose an $L_{p,q}$-variation functional to quantify the change, which yields less variation for dynamic function sequences whose changes are constrained to short time periods or small subsets of input domain. Under the $L_{p,q}$-variation constraint, we derive both upper and matching lower regret bounds for smooth and strongly convex function sequences, which generalize previous results in Besbes et al. (2015). Furthermore, we provide an upper bound for general convex function sequences with noisy gradient feedback, which matches the optimal rate as $p\to\infty$. Our results reveal some surprising phenomena under this general variation functional, such as the curse of dimensionality of the function domain. The key technical novelties in our analysis include affinity lemmas that characterize the distance of the minimizers of two convex functions with bounded Lp difference, and a cubic spline based construction that attains matching lower bounds.
  • Rapid and efficient preparation, manipulation and transfer of quantum states through an array of quantum dots (QDs) is a demanding requisite task for quantum information processing and quantum computation in solid-state physics. Conventional adiabatic protocols, as coherent transfer by adiabatic passage (CTAP) and its variations, provide slow transfer prone to decoherence, which could lower the fidelity to some extent. To achieve the robustness against decoherence, we propose a protocol of speeding up the adiabatic charge transfer in multi-QD systems, sharing the concept of "Shortcuts to Adiabaticity" (STA). We first apply the STA techniques, including the counterdiabatic driving and inverse engineering, to speed up the direct (long range) transfer between edge dots in triple QDs. Then, we extend our analysis to a multi-dot system. We show how by implementing the modified pulses, fast adiabatic-like charge transport between the outer dots can be eventually achieved without populating intermediate dots. We discuss as well the dependence of the transfer fidelity on the operation time in the presence of dephasing. The proposed protocols for accelerating adiabatic charge transfer directly between the outer dots in a QD array offers a robust mechanism for quantum information processing, by minimizing decoherence and relaxation processes.
  • Mobile robot navigation in complex and dynamic environments is a challenging but important problem. Reinforcement learning approaches fail to solve these tasks efficiently due to reward sparsities, temporal complexities and high-dimensionality of sensorimotor spaces which are inherent in such problems. We present a novel approach to train action policies to acquire navigation skills for wheel-legged robots using deep reinforcement learning. The policy maps height-map image observations to motor commands to navigate to a target position while avoiding obstacles. We propose to acquire the multifaceted navigation skill by learning and exploiting a number of manageable navigation behaviors. We also introduce a domain randomization technique to improve the versatility of the training samples. We demonstrate experimentally a significant improvement in terms of data-efficiency, success rate, robustness against irrelevant sensory data, and also the quality of the maneuver skills.
  • With the ever growing diversity of devices and applications that will be connected to 5G networks, flexible and agile service orchestration with acknowledged QoE that satisfies end-user's functional and QoS requirements is necessary. SDN (Software-Defined Networking) and NFV (Network Function Virtualization) are considered key enabling technologies for 5G core networks. In this regard, this paper proposes a reinforcement learning based QoS/QoE-aware Service Function Chaining (SFC) in SDN/NFV-enabled 5G slices. First, it implements a lightweight QoS information collector based on LLDP, which works in a piggyback fashion on the southbound interface of the SDN controller, to enable QoS-awareness. Then, a DQN (Deep Q Network) based agent framework is designed to support SFC in the context of NFV. The agent takes into account the QoE and QoS as key aspects to formulate the reward so that it is expected to maximize QoE while respecting QoS constraints. The experiment results show that this framework exhibits good performance in QoE provisioning and QoS requirements maintenance for SFC in dynamic network environments.
  • Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art solutions regard it as a sequence label- ing task and adopt BiLSTM-CRF models. While BiLSTM-CRF models works relatively well on standard datasets it faces challenges in Chinese E-commerce slot filling due to more informative slot labels and richer expressions. In this paper, we propose a deep multi-task learning model with cascade and residual connections. Experimental results show that our framework not only achieves competitive performance with state-of-the-arts on a standard dataset, but also significantly outperforms strong baselines by a substantial gain of 14.6% on a Chinese E-commerce dataset.
  • In recent years, two-dimensional confined catalysis, i.e. the enhanced catalytic reactions in confined spaces between metal surface and two-dimensional overlayer, makes a hit and opens up a new way to enhance the performance of catalysts. In this work, graphdiyne overlayer was proposed as a more excellent material than graphene or hexagonal boron nitride for two-dimensional confined catalysis. Density functional theory calculations revealed the superiority of graphdiyne overlayer originated from the steric hindrance effect which increases the catalytic ability and lowers the reaction barriers. Moreover, with the big triangle holes as natural gas tunnels, graphdiyne possesses higher efficiency for the transit of gaseous reactants and products than graphene or hexagonal boron nitride. The results in this work would benefit future development two-dimensional confined catalysis.
  • Recently, two-dimensional materials and nanoparticles with robust ferromagnetism are even of great interest to explore basic physics in nanoscale spintronics. More importantly, room-temperature magnetic semiconducting materials with high Curie temperature is essential for developing next-generation spintronic and quantum computing devices. Here, we develop a theoretical model on the basis of density functional theory calculations and the Ruderman-Kittel-Kasuya-Yoshida theory to predict the thermal stability of two-dimensional magnetic materials. Compared with other rare-earth (dysprosium (Dy) and erbium (Er)) and 3d (copper (Cu)) impurities, holmium-doped (Ho-doped) single-layer 1H-MoS2 is proposed as promising semiconductor with robust magnetism. The calculations at the level of hybrid HSE06 functional predict a Curie temperature much higher than room temperature. Ho-doped MoS2 sheet possesses fully spin-polarized valence and conduction bands, which is a prerequisite for flexible spintronic applications.
  • In this paper, we study the problem of image-text matching. Inferring the latent semantic alignment between objects or other salient stuffs (e.g. snow, sky, lawn) and the corresponding words in sentences allows to capture fine-grained interplay between vision and language, and makes image-text matching more interpretable. Prior works either simply aggregate the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or use a multi-step attentional process to capture limited number of semantic alignments which is less interpretable. In this paper, we present Stacked Cross Attention to discover the full latent alignments using both image regions and words in sentence as context and infer the image-text similarity. Our approach achieves the state-of-the-art results on the MS-COCO and Flickr30K datasets. On Flickr30K, our approach outperforms the current best methods by 22.1% in text retrieval from image query, and 18.2% in image retrieval with text query (based on Recall@1). On MS-COCO, our approach improves sentence retrieval by 17.8% and image retrieval by 16.6% (based on Recall@1 using the 5K test set).
  • In real-world Bayesian inference applications, prior assumptions regarding the parameters of interest may be unrepresentative of their actual values for a given dataset. In particular, if the likelihood is concentrated far out in the wings of the assumed prior distribution, this can lead to extremely inefficient exploration of the resulting posterior by nested sampling algorithms, with unnecessarily high associated computational costs. Simple solutions such as broadening the prior range in such cases might not be appropriate or possible in real-world applications, for example when one wishes to assume a single standardised prior across the analysis of a large number of datasets for which the true values of the parameters of interest may vary. This work therefore introduces a posterior repartitioning (PR) method for nested sampling algorithms, which addresses the problem by redefining the likelihood and prior while keeping their product fixed, so that the posterior inferences remain unchanged but the efficiency of the nested sampling process is significantly increased. Numerical results show that the PR method provides a simple yet powerful refinement for nested sampling algorithms to address the issue of unrepresentative priors.
  • With the increased complexity of power systems due to the integration of smart grid technologies and renewable energy resources, more frequent changes have been introduced to system status, and the traditional serial mode of state estimation algorithm cannot well meet the restrict time-constrained requirement for the future dynamic power grid, even with advanced computer hardware. To guarantee the grid reliability and minimize the impacts caused by system status fluctuations, a fast, even SCADA-rate, state estimator is urgently needed. In this paper, a graph based power system modeling is firstly explored and a graph computing based state estimation is proposed to speed up its performance. The power system is represented by a graph, which is a collection of vertices and edges, and the measurements are attributes of vertices and edges. Each vertex can independently implement local computation, like formulations of the node-based H matrix, gain matrix and righthand-side (RHS) vector, only with the information on its connected edges and neighboring vertices. Then, by taking advantages of graph database, these node-based data are conveniently collected and stored in the compressed sparse row (CSR) format avoiding the complexity and heaviness introduced by the sparse matrices. With communications and synchronization, centralized computation of solving the weighted least square (WLS) state estimation is completed with hierarchical parallel computing. The proposed strategy is implemented on a graph database platform. The testing results of IEEE 14-bus, IEEE 118-bus systems and a provincial system in China verify the accuracy and high-performance of the proposed methodology.
  • We present a new analysis of exchange and dispersion effects for calculating halogen-bonding interactions in a wide variety of complex dimers (69 total) within the XB18 and XB51 benchmark sets. Contrary to previous work on these systems, we find that dispersion plays a more significant role than exact exchange in accurately calculating halogen-bonding interaction energies, which are further confirmed by extensive SAPT analyses. In particular, we find that even if the amount of exact exchange is non-empirically tuned to satisfy known DFT constraints, we still observe an overall improvement in predicting dissociation energies when dispersion corrections are applied, in stark contrast to previous studies (J. Chem. Theory Comput. 2013, 9, 1918-1931). In addition to these new analyses, we correct several (14) inconsistencies in the XB51 set, which is widely used in the scientific literature for developing and benchmarking various DFT methods. Together, these new analyses and revised benchmarks emphasize the importance of dispersion and provide corrected reference values that are essential for developing/parameterizing new DFT functionals specifically for complex halogen-bonding interactions.
  • Imitation learning is a powerful paradigm for robot skill acquisition. However, obtaining demonstrations suitable for learning a policy that maps from raw pixels to actions can be challenging. In this paper we describe how consumer-grade Virtual Reality headsets and hand tracking hardware can be used to naturally teleoperate robots to perform complex tasks. We also describe how imitation learning can learn deep neural network policies (mapping from pixels to actions) that can acquire the demonstrated skills. Our experiments showcase the effectiveness of our approach for learning visuomotor skills.
  • We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call `Krazy World' and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.
  • Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, many recent meta-learning approaches are extensively hand-designed, either using architectures specialized to a particular application, or hard-coding algorithmic components that constrain how the meta-learner solves the task. We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. In the most extensive set of meta-learning experiments to date, we evaluate the resulting Simple Neural AttentIve Learner (or SNAIL) on several heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins.
  • We present new criteria for selecting HII regions from the Infrared Astronomical Satellite (IRAS) Point Source catalogue (PSC), based on an HII region catalogue derived manually from the all-sky Wide-field Infrared Survey Explorer (WISE). The criteria are used to augment the number of HII region candidates in the Milky Way. The criteria are defined by the linear decision boundary of two samples: IRAS point sources associated with known HII regions, which serve as the HII region sample, and IRAS point sources at high Galactic latitudes, which serve as the non-HII region sample. A machine learning classifier, specifically a support vector machine (SVM), is used to determine the decision boundary. We investigate all combinations of four IRAS bands and suggest that the optimal criterion is log(F$_{\rm 60}$/F$_{\rm 12}$)$\ge$(-0.19$\times$log(F$_{\rm 100}$/F$_{\rm 25}$)+ 1.52), with detections at 60 and 100 micron. This selects 3041 HII region candidates from the IRAS PSC. We find that IRAS HII region candidates show evidence of evolution on the two-colour diagram. Merging the WISE HII catalogue with IRAS HII region candidates, we estimate a lower limit of approximately 10200 for the number of HII regions in the Milky Way.
  • In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing. The system accommodates tens of billions of images in the index, with thousands of features for each image, and can respond in less than 200 ms. In order to overcome the challenges in relevance, latency, and scalability in such large scale of data, we employ a cascaded learning-to-rank framework based on various latest deep learning visual features, and deploy in a distributed heterogeneous computing platform. Quantitative and qualitative experiments show that our system is able to support various applications on Bing website and apps.
  • In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, though overall the objective is assumed to be smooth and convex. Our bid to solving such model uses the framework of cubic regularization of Newton's method.As well known, the crux in cubic regularization is its utilization of the Hessian information, which may be computationally expensive for large-scale problems. To tackle this, we resort to approximating the Hessian matrix via sub-sampling. In particular, we propose to compute an approximated Hessian matrix by either uniformly or non-uniformly sub-sampling the components of the objective. Based upon sub-sampling, we develop both standard and accelerated adaptive cubic regularization approaches and provide theoretical guarantees on global iteration complexity. We show that the standard and accelerated sub-sampled cubic regularization methods achieve iteration complexity in the order of $O(\epsilon^{-1/2})$ and $O(\epsilon^{-1/3})$ respectively, which match those of the original standard and accelerated cubic regularization methods \cite{Cartis-2012-Evaluation, Jiang-2017-Unified} using the full Hessian information. The performances of the proposed methods on regularized logistic regression problems show a clear effect of acceleration in terms of epochs on several real data sets.
  • Automatic conflict detection has grown in relevance with the advent of body-worn technology, but existing metrics such as turn-taking and overlap are poor indicators of conflict in police-public interactions. Moreover, standard techniques to compute them fall short when applied to such diversified and noisy contexts. We develop a pipeline catered to this task combining adaptive noise removal, non-speech filtering and new measures of conflict based on the repetition and intensity of phrases in speech. We demonstrate the effectiveness of our approach on body-worn audio data collected by the Los Angeles Police Department.
  • We study the problem of testing whether an unknown $n$-variable Boolean function is a $k$-junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over $\{0,1\}^n$. Our first main result is that distribution-free $k$-junta testing can be performed, with one-sided error, by an adaptive algorithm that uses $\tilde{O}(k^2)/\epsilon$ queries (independent of $n$). Complementing this, our second main result is a lower bound showing that any non-adaptive distribution-free $k$-junta testing algorithm must make $\Omega(2^{k/3})$ queries even to test to accuracy $\epsilon=1/3$. These bounds establish that while the optimal query complexity of non-adaptive $k$-junta testing is $2^{\Theta(k)}$, for adaptive testing it is $\text{poly}(k)$, and thus show that adaptivity provides an exponential improvement in the distribution-free query complexity of testing juntas.