• Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates the words by querying distributed word representations (i.e. neural word embeddings), hoping to capturing the meaning of the according words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by the BLEU score of 6.3 and 5.5 on two English text simplification datasets, and the ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performances on these three benchmark datasets.
  • To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.
  • We consider a distributed resource allocation problem in a multicarrier multi-user MIMO network where multiple transmitter-receiver links interfere among each other. Each user aims to maximize its own energy efficiency by adjusting its signal covariance matrix under a predefined power constraint. This problem has been addressed recently by applying a matrix exponential learning (MXL) algorithm which has a very appealing convergence rate. In this learning algorithm, however, each transmitter must know an estimate of the gradient matrix of the user utility. The knowledge of the gradient matrix at the transmitters incurs a high signaling overhead especially that this matrix size increases with the number of antennas and subcarriers. In this paper, we therefore investigate two strategies in order to decrease the informational exchange per iteration of the algorithm. In the first strategy, each user sends at each iteration part of the elements of the gradient matrix with respect to a certain probability. In the second strategy, each user feeds back sporadically the whole gradient matrix. We focus on the analysis of the convergence of the MXL algorithm to Nash Equilibrium (NE) under these two strategies. Upper bounds of the average convergence rate are obtained in both situations with general step-size setting, from which we can clearly see the impact of the incompleteness of the feedback information. We prove that the algorithm can still converge to NE and the convergence rate are not seriously affected. Simulation results further corroborate our claim and show that, in terms of convergence rate, MXL performs better under the second proposed strategy.
  • We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues in the dataset reflect our daily communication way and cover various topics about our daily life. We also manually label the developed dataset with communication intention and emotion information. Then, we evaluate existing approaches on DailyDialog dataset and hope it benefit the research field of dialog systems.
  • Deep latent variable models have been shown to facilitate the response generation for open-domain dialog systems. However, these latent variables are highly randomized, leading to uncontrollable generated responses. In this paper, we propose a framework allowing conditional response generation based on specific attributes. These attributes can be either manually assigned or automatically detected. Moreover, the dialog states for both speakers are modeled separately in order to reflect personal features. We validate this framework on two different scenarios, where the attribute refers to genericness and sentiment states respectively. The experiment result testified the potential of our model, where meaningful responses can be generated in accordance with the specified attributes.
  • Current Chinese social media text summarization models are based on an encoder-decoder framework. Although its generated summaries are similar to source texts literally, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. Besides, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms baseline systems on a social media corpus.
  • Although Generative Adversarial Networks achieve state-of-the-art results on a variety of generative tasks, they are regarded as highly unstable and prone to miss modes. We argue that these bad behaviors of GANs are due to the very particular functional shape of the trained discriminators in high dimensional spaces, which can easily make training stuck or push probability mass in the wrong direction, towards that of higher concentration than that of the data generating distribution. We introduce several ways of regularizing the objective, which can dramatically stabilize the training of GAN models. We also show that our regularizers can help the fair distribution of probability mass across the modes of the data generating distribution, during the early phases of training and thus providing a unified solution to the missing modes problem.
  • Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted. The fundamental reason is the difficulty of back-propagation through discrete random variables combined with the inherent instability of the GAN training objective. To address these problems, we propose Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. Instead of directly optimizing the GAN objective, we derive a novel and low-variance objective using the discriminator's output that follows corresponds to the log-likelihood. Compared with the original, the new objective is proved to be consistent in theory and beneficial in practice. The experimental results on various discrete datasets demonstrate the effectiveness of the proposed approach.
  • Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-orientated. In these tasks, copying and rewriting are two main writing modes. Most previous sequence-to-sequence (Seq2Seq) models use a single decoder and neglect this fact. In this paper, we develop a novel Seq2Seq model to fuse a copying decoder and a restricted generative decoder. The copying decoder finds the position to be copied based on a typical attention model. The generative decoder produces words limited in the source-specific vocabulary. To combine the two decoders and determine the final output, we develop a predictor to predict the mode of copying or rewriting. This predictor can be guided by the actual writing mode in the training data. We conduct extensive experiments on two different paraphrase datasets. The result shows that our model outperforms the state-of-the-art approaches in terms of both informativeness and language quality.
  • Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSum projects documents onto distributed representations which act as a bridge between text classification and summarization. It also utilizes the classification results to produce summaries of different styles. Extensive experiments on DUC generic multi-document summarization datasets show that, TCSum can achieve the state-of-the-art performance without using any hand-crafted features and has the capability to catch the variations of summary styles with respect to different text categories.
  • Query relevance ranking and sentence saliency ranking are the two main tasks in extractive query-focused summarization. Previous supervised summarization systems often perform the two tasks in isolation. However, since reference summaries are the trade-off between relevance and saliency, using them as supervision, neither of the two rankers could be trained well. This paper proposes a novel summarization system called AttSum, which tackles the two tasks jointly. It automatically learns distributed representations for sentences as well as the document cluster. Meanwhile, it applies the attention mechanism to simulate the attentive reading of human behavior when a query is given. Extensive experiments are conducted on DUC query-focused summarization benchmark datasets. Without using any hand-crafted features, AttSum achieves competitive performance. It is also observed that the sentences recognized to focus on the query indeed meet the query need.
  • Introducing an isolated intermediate band (IB) into a wide band gap semiconductor can potentially improve the optical absorption of the material beyond the Shockley-Queisser limitation for solar cells. Here, we present a systematic study of the thermodynamic stability, electronic structures, and optical properties of transition metals (M = Ti, V, and Fe) doped CuAlSe2 for potential IB thin film solar cells, by adopting the first-principles calculation based on the hybrid functional method. We found from chemical potential analysis that for all dopants considered, the stable doped phase only exits when the Al atom is substituted. More importantly, with this substitution, the IB feature is determined by $3d$ electronic nature of M^{3+} ion, and the electronic configuration of 3d^1 can drive a optimum IB that possesses half-filled character and suitable subbandgap from valence band or conduction band. We further show that Ti-doped CuAlSe2 is the more promising candidate for IB materials since the resulted IB in it is half filled and extra absorption peaks occurs in the optical spectrum accompanied with a largely enhanced light absorption intensity. The result offers a understanding for IB induced by transition metals into CuAlSe2 and is significant to fabricate the related IB materials.
  • The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large scales of news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyper-link often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this aim, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries could be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as extra training resource, the performance of the summarizer improves a lot on all the test sets. We release this dataset for further research.
  • Distributed word representations are very useful for capturing semantic information and have been successfully applied in a variety of NLP tasks, especially on English. In this work, we innovatively develop two component-enhanced Chinese character embedding models and their bigram extensions. Distinguished from English word embeddings, our models explore the compositions of Chinese characters, which often serve as semantic indictors inherently. The evaluations on both word similarity and text classification demonstrate the effectiveness of our models.
  • Typical dimensionality reduction (DR) methods are often data-oriented, focusing on directly reducing the number of random variables (features) while retaining the maximal variations in the high-dimensional data. In unsupervised situations, one of the main limitations of these methods lies in their dependency on the scale of data features. This paper aims to address the problem from a new perspective and considers model-oriented dimensionality reduction in parameter spaces of binary multivariate distributions. Specifically, we propose a general parameter reduction criterion, called Confident-Information-First (CIF) principle, to maximally preserve confident parameters and rule out less confident parameters. Formally, the confidence of each parameter can be assessed by its contribution to the expected Fisher information distance within the geometric manifold over the neighbourhood of the underlying real distribution. We then revisit Boltzmann machines (BM) from a model selection perspective and theoretically show that both the fully visible BM (VBM) and the BM with hidden units can be derived from the general binary multivariate distribution using the CIF principle. This can help us uncover and formalize the essential parts of the target density that BM aims to capture and the non-essential parts that BM should discard. Guided by the theoretical analysis, we develop a sample-specific CIF for model selection of BM that is adaptive to the observed samples. The method is studied in a series of density estimation experiments and has been shown effective in terms of the estimate accuracy.
  • This paper addresses the topic of robust Bayesian compressed sensing over finite fields. For stationary and ergodic sources, it provides asymptotic (with the size of the vector to estimate) necessary and sufficient conditions on the number of required measurements to achieve vanishing reconstruction error, in presence of sensing and communication noise. In all considered cases, the necessary and sufficient conditions asymptotically coincide. Conditions on the sparsity of the sensing matrix are established in presence of communication noise. Several previously published results are generalized and extended.
  • Typical dimensionality reduction methods focus on directly reducing the number of random variables while retaining maximal variations in the data. In this paper, we consider the dimensionality reduction in parameter spaces of binary multivariate distributions. We propose a general Confident-Information-First (CIF) principle to maximally preserve parameters with confident estimates and rule out unreliable or noisy parameters. Formally, the confidence of a parameter can be assessed by its Fisher information, which establishes a connection with the inverse variance of any unbiased estimate for the parameter via the Cram\'{e}r-Rao bound. We then revisit Boltzmann machines (BM) and theoretically show that both single-layer BM without hidden units (SBM) and restricted BM (RBM) can be solidly derived using the CIF principle. This can not only help us uncover and formalize the essential parts of the target density that SBM and RBM capture, but also suggest that the deep neural network consisting of several layers of RBM can be seen as the layer-wise application of CIF. Guided by the theoretical analysis, we develop a sample-specific CIF-based contrastive divergence (CD-CIF) algorithm for SBM and a CIF-based iterative projection procedure (IP) for RBM. Both CD-CIF and IP are studied in a series of density estimation experiments.
  • We show that thermally evaporated lead (Pb) preserves the electronic properties of organic monolayers on Si and the surface passivation of the Si surface itself. The obtained current-voltage characteristics are in accordance with results from the well-established hanging mercury drop method and preserve both the molecule-induced dipolar effect and length-attenuation of current. We rationalize our findings by the lack of interaction between the Pb and the Si substrate. Our method is fast, scalable, compatible to standard semiconductor processing, and can help to spur the large-scale utilization of silicon-organic hybrid electronics.
  • The mechanisms of solid-state electron transport (ETp) via a monolayer of immobilized Azurin (Az) was examined by conducting probe atomic force microscopy (CP-AFM), both as function of temperature (248 - 373K) and of applied tip force (6-12 nN). By varying both temperature and force in CP-AFM, we find that the ETp mechanism can alter with a change in the force applied via the tip to the proteins. As the applied force increases, ETp via Az changes from temperature-independent to thermally activated at high temperatures. This is in contrast to the Cu-depleted form of Az (apo-Az), where increasing the applied force causes only small quantitative effects, that fit with a decrease in electrode spacing. At low force ETp via holo-Az is temperature-independent and thermally activated via apo-Az. This observation agrees with macroscopic-scale measurements, thus confirming that the difference in ETp dependence on temperature between holo- and apo-Az is an inherent one that may reflect a difference in rigidity between the two forms. An important implication of these results, which depend on CP-AFM measurements over a significant temperature range, is that for ETp measurements on floppy systems, such as proteins, the stress applied to the sample should be kept constant or, at least controlled during measurement.
  • In density estimation task, maximum entropy model (Maxent) can effectively use reliable prior information via certain constraints, i.e., linear constraints without empirical parameters. However, reliable prior information is often insufficient, and the selection of uncertain constraints becomes necessary but poses considerable implementation complexity. Improper setting of uncertain constraints can result in overfitting or underfitting. To solve this problem, a generalization of Maxent, under Tsallis entropy framework, is proposed. The proposed method introduces a convex quadratic constraint for the correction of (expected) Tsallis entropy bias (TEB). Specifically, we demonstrate that the expected Tsallis entropy of sampling distributions is smaller than the Tsallis entropy of the underlying real distribution. This expected entropy reduction is exactly the (expected) TEB, which can be expressed by a closed-form formula and act as a consistent and unbiased correction. TEB indicates that the entropy of a specific sampling distribution should be increased accordingly. This entails a quantitative re-interpretation of the Maxent principle. By compensating TEB and meanwhile forcing the resulting distribution to be close to the sampling distribution, our generalized TEBC Maxent can be expected to alleviate the overfitting and underfitting. We also present a connection between TEB and Lidstone estimator. As a result, TEB-Lidstone estimator is developed by analytically identifying the rate of probability correction in Lidstone. Extensive empirical evaluation shows promising performance of both TEBC Maxent and TEB-Lidstone in comparison with various state-of-the-art density estimation methods.