• ### MV-RNN: A Multi-View Recurrent Neural Network for Sequential Recommendation(1611.06668)

Nov. 20, 2018 cs.IR
Sequential recommendation is a fundamental task for network applications, and it usually suffers from the item cold start problem due to insufficient user feedback. There are currently three kinds of popular approaches, based respectively on matrix factorization (MF) of collaborative filtering, Markov chains (MC), and recurrent neural networks (RNN). Although widely used, they have some limitations. MF-based methods cannot capture users' dynamic interests. The strong Markov assumption greatly limits the performance of MC-based methods. RNN-based methods are still at an early stage of incorporating additional information. Building on these basic models, many methods that use additional information only validate incorporating one modality separately. In this work, to perform sequential recommendation and deal with the item cold start problem, we propose a Multi-View Recurrent Neural Network (MV-RNN) model. Given the latent feature, MV-RNN can alleviate the item cold start problem by incorporating visual and textual information. First, at the input of MV-RNN, three different combinations of multi-view features are studied: concatenation, fusion by addition, and fusion by reconstructing the original multi-modal data. MV-RNN applies the recurrent structure to dynamically capture the user's interest. Second, we design a separate structure and a united structure on the hidden state of MV-RNN to explore a more effective way to handle multi-view features. Experiments on two real-world datasets show that MV-RNN can effectively generate the personalized ranking list, tackle the missing modalities problem and significantly alleviate the item cold start problem.
• ### Investigation of the near-threshold cluster resonance in $^{14}\rm{C}$(1804.10797)

April 28, 2018 nucl-ex
An experiment for $p(^{14}\rm{C},^{14}\rm{C}^{*}\rightarrow^{10}\rm{Be}+\alpha)\mathit{p}$ inelastic excitation and decay was performed in inverse kinematics at a beam energy of 25.3 MeV/u. A series of $^{14}\rm{C}$ excited states, including a new one at 18.3(1) MeV, were observed which decay to various states of the final nucleus $^{10}\rm{Be}$. A specially designed telescope system, installed around zero degrees, played an essential role in detecting the resonant states near the $\alpha$-separation threshold. A state at 14.1(1) MeV is clearly identified, consistent with the predicted band-head of the molecular rotational band characterized by the $\pi$-bond linear-chain configuration. Further clarification of the properties of this exotic state is suggested by using appropriate reaction tools.
• ### Topological Maxwell Metal Bands in a Superconducting Qutrit(1709.05765)

We experimentally explore the topological Maxwell metal bands by mapping the momentum space of condensed-matter models to the tunable parameter space of superconducting quantum circuits. An exotic band structure that is effectively described by the spin-1 Maxwell equations is imaged. Three-fold degenerate points dubbed Maxwell points are observed in the Maxwell metal bands. Moreover, we engineer and observe the topological phase transition from the topological Maxwell metal to a trivial insulator, and report the first experiment to measure the Chern numbers that are higher than one.
• ### Stochastic Variance Reduction for Policy Gradient Estimation(1710.06034)

March 29, 2018 cs.LG, stat.ML
Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) to model-free policy gradient to significantly improve the sample efficiency. The SVRG estimation is incorporated into a trust-region Newton conjugate gradient framework for the policy optimization. On several MuJoCo tasks, our method achieves significantly better performance compared to the state-of-the-art model-free policy gradient methods in robotic continuous control such as trust region policy optimization (TRPO).
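As a rough illustration of the SVRG estimator mentioned in this abstract, the sketch below applies it to a toy one-dimensional least-squares problem rather than to a policy gradient. The objective, data, step size, and function name `svrg_least_squares` are assumptions made for the example and do not come from the paper. Each inner step corrects a single-sample gradient with the same sample's gradient at a snapshot point plus the full gradient at that snapshot.

```python
import random


def svrg_least_squares(a, b, step_size=0.1, epochs=40, inner_steps=20, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (a_i * w - b_i)^2 with SVRG (toy sketch)."""
    rng = random.Random(seed)
    n = len(a)
    w = 0.0
    for _ in range(epochs):
        # Snapshot the current iterate and compute the full gradient there.
        snapshot = w
        full_grad = sum(ai * (ai * snapshot - bi) for ai, bi in zip(a, b)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            grad_w = a[i] * (a[i] * w - b[i])
            grad_snapshot = a[i] * (a[i] * snapshot - b[i])
            # Variance-reduced estimate: unbiased, with variance that
            # shrinks as w and the snapshot approach the optimum.
            w -= step_size * (grad_w - grad_snapshot + full_grad)
    return w


# Hypothetical data: slope near 3 with small alternating perturbations.
a = [1.0 + 0.1 * i for i in range(10)]
b = [3.0 * ai + (0.1 if i % 2 == 0 else -0.1) for i, ai in enumerate(a)]
w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
w_hat = svrg_least_squares(a, b)
```

Here `w_star` is the closed-form least-squares solution, so comparing it with `w_hat` gives a quick check that the variance-reduced updates converge.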
• ### Learning to Explore with Meta-Policy Gradient(1803.05044)

March 26, 2018 cs.AI, cs.LG
The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. Existing exploration methods are mostly based on adding noise to the on-going actor policy and can only explore *local* regions close to what the actor policy dictates. In this work, we develop a simple meta-policy gradient algorithm that allows us to adaptively learn the exploration policy in DDPG. Our algorithm allows us to train flexible exploration behaviors that are independent of the actor policy, yielding a *global exploration* that significantly speeds up the learning process. With an extensive study, we show that our method significantly improves the sample-efficiency of DDPG on a variety of reinforcement learning tasks.
• ### Emulating topological chiral magnetic effects in artificial Weyl semimetals(1802.08371)

We realized highly tunable Weyl semimetal-bands and subsequently emulated the topological chiral magnetic effects in superconducting quantum circuits. Driving the superconducting quantum circuits with elaborately designed microwave fields, we mapped the momentum space of a lattice to the parameter space, realizing the Hamiltonian of a Weyl semimetal. By measuring the energy spectrum, we directly imaged the Weyl points of cubic lattices, whose topological winding numbers were further determined from the Berry curvature measurement. In particular, we used an additional microwave field to produce a momentum-dependent chemical potential, from which the chiral magnetic topological current was extracted in the presence of an artificial magnetic field. This pure topological current is proportional to the magnetic field, which is in contrast to the famous Ampere's law, and may have significant impacts on topological materials and quantum devices.
• ### BEBP: An Poisoning Method Against Machine Learning Based IDSs(1803.03965)

March 11, 2018 cs.CR, cs.LG, stat.ML
In the big data era, machine learning is one of the fundamental techniques in intrusion detection systems (IDSs). However, practical IDSs generally update their decision module by periodically feeding in new data and retraining their learning models. Hence, attacks that compromise the data used for training or testing classifiers significantly challenge the detection capability of machine learning-based IDSs. The poisoning attack, one of the most recognized security threats to machine learning-based IDSs, injects adversarial samples into the training phase, inducing drift in the training data and a significant performance decrease of target IDSs on testing data. In this paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel poisoning method that attacks several machine learning algorithms used in IDSs. Specifically, we propose a boundary pattern detection algorithm to efficiently generate points that are near abnormal data but are considered normal by current classifiers. Then, we introduce a Batch-EPD Boundary Pattern (BEBP) detection algorithm to overcome the limitation of the number of edge pattern points generated by EPD and to obtain more useful adversarial samples. Based on BEBP, we further present a moderate but effective poisoning method called chronic poisoning attack. Extensive experiments on synthetic and three real network data sets demonstrate the performance of the proposed poisoning method against several well-known machine learning algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS.
• ### On the Discrimination-Generalization Tradeoff in GANs(1711.02771)

Feb. 23, 2018 cs.LG, stat.ML
Generative adversarial training can be generally understood as minimizing a certain moment matching loss defined by a set of discriminator functions, typically neural networks. The discriminator set should be large enough to be able to uniquely identify the true distribution (discriminative), and also be small enough to go beyond memorizing samples (generalizable). In this paper, we show that a discriminator set is guaranteed to be discriminative whenever its linear span is dense in the set of bounded continuous functions. This is a very mild condition satisfied even by neural networks with a single neuron. Further, we develop generalization bounds between the learned distribution and true distribution under different evaluation metrics. When evaluated with neural distance, our bounds show that generalization is guaranteed as long as the discriminator set is small enough, regardless of the size of the generator or hypothesis set. When evaluated with KL divergence, our bound provides an explanation for the counter-intuitive behaviors of testing likelihood in GAN training. Our analysis sheds light on understanding the practical performance of GANs.
• ### Action-depedent Control Variates for Policy Optimization via Stein's Identity(1710.11198)

Feb. 23, 2018 cs.LG, stat.ML
Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, they still often suffer from large variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce variance for policy gradient methods. Motivated by Stein's identity, our method extends the previous control variate methods used in REINFORCE and advantage actor-critic by introducing more general action-dependent baseline functions. Empirical studies show that our method significantly improves the sample efficiency of the state-of-the-art policy gradient approaches.
• ### Stein Variational Message Passing for Continuous Graphical Models(1711.07168)

Feb. 18, 2018 cs.LG, stat.ML
We propose a novel distributed inference algorithm for continuous graphical models, by extending Stein variational gradient descent (SVGD) to leverage the Markov dependency structure of the distribution of interest. Our approach combines SVGD with a set of structured local kernel functions defined on the Markov blanket of each node, which alleviates the curse of high dimensionality and simultaneously yields a distributed algorithm for decentralized inference tasks. We justify our method with theoretical analysis and show that the use of local kernels can be viewed as a new type of localized approximation that matches the target distribution on the conditional distributions of each node over its Markov blanket. Our empirical results show that our method outperforms a variety of baselines including standard MCMC and particle message passing methods.
• ### Adaptive Scan Gibbs Sampler for Large Scale Inference Problems(1801.09144)

Jan. 27, 2018 stat.ML
For large scale on-line inference problems the update strategy is critical for performance. We derive an adaptive scan Gibbs sampler that optimizes the update frequency by selecting an optimum mini-batch size. We demonstrate performance of our adaptive batch-size Gibbs sampler by comparing it against the collapsed Gibbs sampler for Bayesian Lasso, Dirichlet Process Mixture Models (DPMM) and Latent Dirichlet Allocation (LDA) graphical models.
• ### Learning Infinite RBMs with Frank-Wolfe(1710.05270)

Oct. 15, 2017 cs.AI, cs.LG, stat.ML
In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.
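The remark that Frank-Wolfe yields a sparse solution by adding one element per iteration can be seen on a much simpler problem than the RBM objective. The sketch below is a toy under stated assumptions: it minimizes a squared distance over the probability simplex, and the objective, the `target` vector, and the function names are hypothetical choices for illustration, not the paper's formulation. Each iteration moves toward a single simplex vertex, so the support of the iterate grows by at most one coordinate per step.

```python
def frank_wolfe_simplex(target, iterations):
    """Minimize ||x - target||^2 over the probability simplex (toy sketch)."""
    dim = len(target)
    x = [0.0] * dim
    x[0] = 1.0  # start at a vertex, i.e. a one-atom solution
    for t in range(iterations):
        gradient = [2.0 * (xi - bi) for xi, bi in zip(x, target)]
        # Linear minimization oracle: the best vertex is the coordinate
        # with the smallest gradient entry.
        s = min(range(dim), key=gradient.__getitem__)
        gamma = 2.0 / (t + 2.0)  # standard step-size schedule
        x = [(1.0 - gamma) * xi for xi in x]
        x[s] += gamma  # mix in at most one new atom
    return x


def objective(x, target):
    return sum((xi - bi) ** 2 for xi, bi in zip(x, target))


# Hypothetical target that already lies in the simplex.
target = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]
x_early = frank_wolfe_simplex(target, 2)
x_late = frank_wolfe_simplex(target, 200)
```

Comparing `x_early` with `x_late` shows the sequence of increasingly complex iterates: the early iterate has only a couple of non-zero coordinates, while the later one approaches `target`.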
• ### Energy-efficient Amortized Inference with Cascaded Deep Classifiers(1710.03368)

Oct. 10, 2017 cs.LG
Deep neural networks have been remarkably successful in various AI tasks but often incur high computation and energy costs for energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes the prediction accuracy and energy cost simultaneously, thus enabling an effective cost-accuracy trade-off at test time. In our framework, each data instance is pushed into a cascade of deep neural networks with increasing sizes, and a selection module is used to sequentially determine when a sufficiently accurate classifier can be used for this data instance. The cascade of neural networks and the selection module are jointly trained in an end-to-end fashion by the REINFORCE algorithm to optimize a trade-off between the computational cost and the predictive accuracy. Our method is able to simultaneously improve the accuracy and efficiency by learning to assign easy instances to fast yet sufficiently accurate classifiers to save computation and energy cost, while assigning harder instances to deeper and more powerful classifiers to ensure satisfactory accuracy. With extensive experiments on several image classification datasets using cascaded ResNet classifiers, we demonstrate that our method outperforms the standard well-trained ResNets in accuracy while requiring less than 20% and 50% of the FLOPs on the CIFAR-10 and CIFAR-100 datasets, respectively, and 66% on the ImageNet dataset.
• ### Quantum simulation of general semi-classical Rabi model beyond strong driving regime(1709.10201)

Sept. 28, 2017 quant-ph
We propose a scheme to simulate the interaction between a two-level system and a classical light field. Under the transversal driving of two microwave tones, the system Hamiltonian is identical to that of the general semi-classical Rabi model. We experimentally realize this Hamiltonian with a superconducting transmon qubit. By tuning the strength, phase and frequency of the two microwave driving fields, we simulate the quantum dynamics from weak to extremely strong driving regime. The resulting evolutions gradually deviate from the normal sinusoidal Rabi oscillations with increasing driving strength, in accordance with the predictions of the general semi-classical Rabi model far beyond the weak driving limit. Our scheme provides an effective approach to investigate the extremely strong interaction between a two-level system and a classical light field. Such strong interactions are usually inaccessible in experiments.
• ### Leveraging local h-index to identify and rank influential spreaders in networks(1708.09532)

Sept. 15, 2017 physics.soc-ph, cs.SI
Identifying influential nodes in complex networks has received increasing attention for its great theoretical and practical applications in many fields. Traditional methods, such as degree centrality, betweenness centrality, closeness centrality, and coreness centrality, all have disadvantages in detecting influential nodes, as illustrated in the related literature. Recently, the h-index, which is used to measure both the productivity and citation impact of the publications of a scientist or scholar, has been introduced to the network world to evaluate a node's spreading ability. However, this method assigns the same value to too many nodes, which leads to a resolution limit problem in distinguishing the real influence of these nodes. In this paper, we propose a local h-index centrality (LH-index) method for identifying and ranking influential nodes in networks. The LH-index method simultaneously takes into account the h-index values of the node itself and its neighbors, based on the idea that a node connected to more influential nodes will also be influential. According to the simulation results with the stochastic Susceptible-Infected-Recovered (SIR) model in four real-world networks and several simulated networks, we demonstrate the effectiveness of the LH-index method in identifying influential nodes in networks.
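A small sketch may help make the resolution problem and the LH-index idea concrete. It assumes the common network definition of a node's h-index (the largest h such that the node has at least h neighbors of degree at least h) and assumes that the LH-index is the node's own h-index plus the sum of its neighbors' h-indices; the abstract only says both are taken into account, so the exact combination here is an assumption, as are the toy graph and the function names.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def node_h_index(graph, node):
    """h-index of a node, computed from the degrees of its neighbors."""
    return h_index(len(graph[nb]) for nb in graph[node])


def local_h_index(graph, node):
    """Assumed LH-index: own h-index plus the neighbors' h-indices."""
    return node_h_index(graph, node) + sum(
        node_h_index(graph, nb) for nb in graph[node]
    )


# Hypothetical undirected toy graph as an adjacency dictionary.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
h_values = {node: node_h_index(graph, node) for node in graph}
lh_values = {node: local_h_index(graph, node) for node in graph}
```

In this toy graph the plain h-index gives nodes `a`, `b`, `c`, and `d` the same value, while the assumed LH-index separates `b` and `d` from `a` and `c`.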
• ### Stein Variational Adaptive Importance Sampling(1704.05201)

July 25, 2017 stat.ML
We propose a novel adaptive importance sampling algorithm which combines the Stein variational gradient descent (SVGD) algorithm with importance sampling (IS). Our algorithm leverages the nonparametric transforms in SVGD to iteratively decrease the KL divergence between our importance proposal and the target distribution. The advantages of this algorithm are twofold: first, our algorithm turns SVGD into a standard IS algorithm, allowing us to use standard diagnostic and analytic tools of IS to evaluate and interpret the results; second, we do not restrict the choice of our importance proposal to predefined distribution families like traditional (adaptive) IS methods. Empirical experiments demonstrate that our algorithm performs well on evaluating partition functions of restricted Boltzmann machines and testing likelihood of variational auto-encoders.
• ### Learning to Draw Samples with Amortized Stein Variational Gradient Descent(1707.06626)

July 20, 2017 stat.ML
We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method is based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient direction (Liu & Wang, 2016) that maximally decreases the KL divergence with the target distribution. Our method works for any target distribution specified by their unnormalized density function, and can train any black-box architectures that are differentiable in terms of the parameters we want to adapt. We demonstrate our method with a number of applications, including variational autoencoder (VAE) with expressive encoders to model complex latent space structures, and hyper-parameter learning of MCMC samplers that allows Bayesian inference to adaptively improve itself when seeing more data.
• ### Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE(1707.00797)

July 4, 2017 cs.LG, stat.ML
We propose a number of new algorithms for learning deep energy models and demonstrate their properties. We show that our SteinCD performs well in terms of test likelihood, while SteinGAN performs well in terms of generating realistic-looking images. Our results suggest promising directions for learning better models by combining GAN-style methods with traditional energy-based learning.
• ### Mining Significant Microblogs for Misinformation Identification: An Attention-based Approach(1706.06314)

June 20, 2017 cs.SI, cs.IR
With the rapid growth of social media, massive misinformation is also spreading widely on platforms such as microblogs, bringing negative effects to human life. Automatic misinformation identification has therefore drawn attention from academic and industrial communities. Because an event on social media usually consists of multiple microblogs, current methods are mainly based on global statistical features. However, information on social media is full of noise and outliers, whose effect should be mitigated. Moreover, most microblogs about an event contribute little to the identification of misinformation, so useful information can easily be overwhelmed by useless information. Thus, it is important to mine significant microblogs for a reliable misinformation identification method. In this paper, we propose an Attention-based approach for Identification of Misinformation (AIM). Based on the attention mechanism, AIM can select the microblogs with the largest attention values for misinformation identification. The attention mechanism in AIM contains two parts: content attention and dynamic attention. Content attention is calculated based on the textual features of each microblog. Dynamic attention is related to the time interval between the posting time of a microblog and the beginning of the event. To evaluate AIM, we conduct a series of experiments on the Weibo dataset and the Twitter dataset, and the experimental results show that the proposed AIM model outperforms the state-of-the-art methods.
• ### Approximate Inference with Amortised MCMC(1702.08343)

May 22, 2017 cs.LG, stat.ML
We propose a novel approximate inference algorithm that approximates a target distribution by amortising the dynamics of a user-selected MCMC sampler. The idea is to initialise MCMC using samples from an approximation network, apply the MCMC operator to improve these samples, and finally use the samples to update the approximation network thereby improving its quality. This provides a new generic framework for approximate inference, allowing us to deploy highly complex, or implicitly defined approximation families with intractable densities, including approximations produced by warping a source of randomness through a deep neural network. Experiments consider image modelling with deep generative models as a challenging test for the method. Deep models trained using amortised MCMC are shown to generate realistic looking samples as well as producing diverse imputations for images with regions of missing pixels.
• ### Extensible 3D architecture for superconducting quantum computing(1705.02586)

May 7, 2017 quant-ph
Using a multi-layered printed circuit board, we propose a 3D architecture suitable for packaging superconducting chips, especially chips that contain two-dimensional qubit arrays. In our proposed architecture, the center strips of the buried coplanar waveguides protrude from the surface of a dielectric layer as contacts. Since the contacts extend beyond the surface of the dielectric layer, chips can simply be flip-chip packaged with on-chip receptacles clinging to the contacts. Using this scheme, we packaged a multi-qubit chip and performed single-qubit and two-qubit quantum gate operations. The results indicate that this 3D architecture provides a promising scheme for scalable quantum computing.

April 25, 2017 stat.ML
Stein variational gradient descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate given distributions, based on an efficient gradient-based update that guarantees to optimally decrease the KL divergence within a function space. This paper develops the first theoretical analysis on SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by Stein operator. We also provide a number of results on Stein operator and Stein's identity using the notion of weak derivative, including a new proof of the distinguishability of Stein discrepancy under weak conditions.
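To make the particle transport described above concrete, here is a minimal one-dimensional sketch of the usual SVGD update with an RBF kernel, in which each particle moves along the kernel-weighted average of the other particles' scores plus a repulsive kernel-gradient term. The fixed bandwidth, step size, Gaussian target, initial particles, and function names are assumptions chosen for illustration; practical implementations typically work in higher dimensions and adapt the bandwidth.

```python
import math


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update for 1D particles with an RBF kernel (toy sketch)."""
    n = len(particles)
    scores = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, score_j in zip(particles, scores):
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            # Derivative of the kernel with respect to xj: it pushes xi
            # away from xj and keeps the particles from collapsing.
            grad_k = -(xj - xi) / bandwidth ** 2 * k
            phi += k * score_j + grad_k
        updated.append(xi + step_size * phi / n)
    return updated


def run_svgd(particles, grad_log_p, iterations=500):
    for _ in range(iterations):
        particles = svgd_step(particles, grad_log_p)
    return particles


# Assumed target: a unit-variance Gaussian centred at 2.
def gaussian_score(x):
    return -(x - 2.0)


initial = [-1.0 + 2.0 * i / 19 for i in range(20)]
final = run_svgd(initial, gaussian_score)
final_mean = sum(final) / len(final)
final_spread = max(final) - min(final)
```

The particles start around zero and drift toward the target mean, while the repulsive term keeps them spread out instead of collapsing onto the mode.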
• ### Stein Variational Policy Gradient(1704.02399)

April 7, 2017 cs.LG
Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to initialization and can easily be implemented in a parallel manner. On continuous control problems, we find that implementing SVPG on top of REINFORCE and advantage actor-critic algorithms improves both average return and data efficiency.
• ### Energy Efficient Joint Resource Allocation and Power Control for D2D Communications(1703.07041)

March 21, 2017 cs.IT, math.IT
In this paper, joint resource allocation and power control for energy efficient device-to-device (D2D) communications underlaying cellular networks are investigated. The resource and power are optimized for maximization of the energy efficiency (EE) of D2D communications. Exploiting the properties of fractional programming, we transform the original nonconvex optimization problem in fractional form into an equivalent optimization problem in subtractive form. Then, an efficient iterative resource allocation and power control scheme is proposed. In each iteration, part of the constraints of the EE optimization problem is removed by exploiting the penalty function approach. We further propose a novel two-layer approach which allows the optimum to be found at each iteration by decoupling the EE optimization problem of joint resource allocation and power control into two separate steps. In the first layer, the optimal power values are obtained by solving a series of maximization problems through root-finding with or without considering the loss of cellular users' rates. In the second layer, the formulated optimization problem belongs to a classical resource allocation problem with single allocation format which admits a network flow formulation so that it can be solved to optimality. Simulation results demonstrate the remarkable improvements in terms of EE by using the proposed iterative resource allocation and power control scheme.
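The move from fractional form to subtractive form can be sketched with the classical Dinkelbach iteration on a single link. This is a toy under stated assumptions, not the paper's joint scheme: the rate model `log2(1 + gain * p)`, the circuit power term, the parameter values, and all function names are hypothetical. For a fixed ratio estimate `q`, the subtractive problem `rate(p) - q * (p + circuit_power)` is concave in `p` and has a closed-form maximizer, and `q` is then updated to the achieved ratio.

```python
import math


def rate(p, gain):
    """Assumed achievable rate in bits/s/Hz for transmit power p."""
    return math.log2(1.0 + gain * p)


def energy_efficiency(p, gain, circuit_power):
    return rate(p, gain) / (p + circuit_power)


def best_power_for(q, gain, p_max):
    """Maximize rate(p) - q * (p + circuit_power) over [0, p_max]."""
    if q <= 0.0:
        return p_max  # with no power penalty the rate term dominates
    # Stationary point of the concave subtractive objective.
    p = 1.0 / (q * math.log(2.0)) - 1.0 / gain
    return min(max(p, 0.0), p_max)


def dinkelbach(gain, circuit_power, p_max, tol=1e-10, max_iter=100):
    """Dinkelbach iteration: returns the power and the EE value q."""
    q = 0.0
    p = p_max
    for _ in range(max_iter):
        p = best_power_for(q, gain, p_max)
        gap = rate(p, gain) - q * (p + circuit_power)
        if abs(gap) < tol:
            break  # q equals the optimal ratio when the gap vanishes
        q = energy_efficiency(p, gain, circuit_power)
    return p, q


# Hypothetical link parameters.
GAIN, CIRCUIT_POWER, P_MAX = 10.0, 0.5, 2.0
p_star, q_star = dinkelbach(GAIN, CIRCUIT_POWER, P_MAX)
```

The design choice worth noting is that each iteration only needs to solve the easier subtractive problem, which is why the transformation is attractive when the original ratio is nonconvex.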
• ### Bootstrap Model Aggregation for Distributed Statistical Learning(1607.01036)

Feb. 27, 2017 cs.AI, cs.LG, stat.ML
In distributed or privacy-preserving learning, we are often given a set of probabilistic models estimated from different local repositories, and asked to combine them into a single model that gives efficient statistical estimation. A simple method is to linearly average the parameters of the local models, which, however, tends to be degenerate or not applicable on non-convex models, or models with different parameter dimensions. A more practical strategy is to generate bootstrap samples from the local models, and then learn a joint model based on the combined bootstrap set. Unfortunately, the bootstrap procedure introduces additional noise and can significantly deteriorate the performance. In this work, we propose two variance reduction methods to correct the bootstrap noise, including a weighted M-estimator that is both statistically efficient and practically powerful. Both theoretical and empirical analyses are provided to demonstrate our methods.