
Most recent approaches use the sequence-to-sequence model for paraphrase
generation. The existing sequence-to-sequence model tends to memorize the words
and the patterns in the training dataset instead of learning the meaning of the
words. Therefore, the generated sentences are often grammatically correct but
semantically improper. In this work, we introduce a novel model based on the
encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our
proposed model generates the words by querying distributed word representations
(i.e., neural word embeddings), with the aim of capturing the meaning of the
corresponding words. Following previous work, we evaluate our model on two
paraphrase-oriented tasks, namely text simplification and short text
abstractive summarization. Experimental results show that our model outperforms
the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two
English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a
Chinese summarization dataset. Moreover, our model achieves state-of-the-art
performance on these three benchmark datasets.
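
As a minimal sketch of generating a word by querying word embeddings (not the authors' implementation), the snippet below scores every vocabulary word by the dot product between a query vector and that word's embedding, then normalizes the scores with a softmax. The toy vocabulary, the 2-dimensional embedding values, and the function names are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_embeddings(query, embeddings):
    """Return a probability per word from dot-product scores.

    query: list of floats (a hypothetical decoder state already projected
           into the embedding space).
    embeddings: dict mapping word -> embedding vector (toy values).
    """
    words = list(embeddings)
    scores = [sum(q * e for q, e in zip(query, embeddings[w])) for w in words]
    return dict(zip(words, softmax(scores)))


# Toy embeddings (assumed values, for illustration only).
toy_embeddings = {
    "big": [1.0, 0.0],
    "large": [0.9, 0.1],
    "cat": [0.0, 1.0],
}
probs = query_embeddings([2.0, 0.0], toy_embeddings)
best_word = max(probs, key=probs.get)
```

Because the score depends on the embedding itself, words with similar embeddings ("big" and "large" here) receive similar probabilities, which is the intuition behind querying embeddings rather than an unrelated output layer.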

To address the sparsity and cold-start problems of collaborative filtering,
researchers usually make use of side information, such as social networks or
item attributes, to improve recommendation performance. This paper considers
the knowledge graph as the source of side information. To address the
limitations of existing embedding-based and path-based methods for
knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end
framework that naturally incorporates the knowledge graph into recommender
systems. Similar to actual ripples propagating on the surface of water, Ripple
Network stimulates the propagation of user preferences over the set of
knowledge entities by automatically and iteratively extending a user's
potential interests along links in the knowledge graph. The multiple "ripples"
activated by a user's historically clicked items are thus superposed to form
the preference distribution of the user with respect to a candidate item, which
could be used for predicting the final clicking probability. Through extensive
experiments on real-world datasets, we demonstrate that Ripple Network achieves
substantial gains in a variety of scenarios, including movie, book and news
recommendation, over several state-of-the-art baselines.
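
The iterative extension of a user's interests along knowledge-graph links can be sketched as a hop-by-hop expansion. The code below is a simplified illustration under stated assumptions, not the Ripple Network model itself: it only collects the entities newly reached at each hop and omits the learned weighting and superposition of the ripples. The function name, the triple format, and the toy graph are hypothetical.

```python
def ripple_sets(clicked_items, kg_triples, n_hops):
    """Collect the entities newly reached at each hop from clicked items.

    kg_triples: iterable of (head, relation, tail) tuples.
    Returns a list whose k-th element is the set of entities first
    reached at hop k + 1.
    """
    # Build an adjacency map head -> set of tails.
    neighbors = {}
    for head, _relation, tail in kg_triples:
        neighbors.setdefault(head, set()).add(tail)

    seen = set(clicked_items)
    frontier = set(clicked_items)
    hops = []
    for _ in range(n_hops):
        reached = set()
        for entity in frontier:
            reached |= neighbors.get(entity, set())
        reached -= seen  # keep only entities not visited before
        hops.append(reached)
        seen |= reached
        frontier = reached
    return hops


# Toy knowledge graph (illustrative triples only).
toy_kg = [
    ("Forrest Gump", "directed_by", "Robert Zemeckis"),
    ("Forrest Gump", "stars", "Tom Hanks"),
    ("Robert Zemeckis", "directed", "Back to the Future"),
    ("Tom Hanks", "starred_in", "Cast Away"),
]
hops = ripple_sets(["Forrest Gump"], toy_kg, n_hops=2)
```

In this toy graph, the first hop reaches the director and the lead actor of the clicked movie, and the second hop reaches other movies connected to them, which become candidate interests.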

We consider a distributed resource allocation problem in a multi-carrier
multi-user MIMO network where multiple transmitter-receiver links interfere
with each other. Each user aims to maximize its own energy efficiency by
adjusting its signal covariance matrix under a predefined power constraint.
This problem has been addressed recently by applying a matrix exponential
learning (MXL) algorithm which has a very appealing convergence rate. In this
learning algorithm, however, each transmitter must know an estimate of the
gradient matrix of the user utility. The knowledge of the gradient matrix at
the transmitters incurs a high signaling overhead, especially since the size of
this matrix increases with the number of antennas and subcarriers. In this paper, we
therefore investigate two strategies in order to decrease the informational
exchange per iteration of the algorithm. In the first strategy, each user sends
at each iteration only part of the elements of the gradient matrix, according to
a certain probability. In the second strategy, each user sporadically feeds back
the whole gradient matrix. We focus on the analysis of the convergence of the
MXL algorithm to a Nash equilibrium (NE) under these two strategies. Upper bounds
on the average convergence rate are obtained in both situations with a general
step-size setting, from which we can clearly see the impact of the
incompleteness of the feedback information. We prove that the algorithm can
still converge to an NE and that the convergence rate is not seriously affected.
Simulation results further corroborate our claim and show that, in terms of
convergence rate, MXL performs better under the second proposed strategy.
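
The two feedback-reduction strategies can be illustrated with a short sketch. It makes several assumptions that are not stated in the abstract: entries are reported independently with probability p in the first strategy, the whole matrix is reported with probability p at each iteration in the second, the gradient is a small real-valued matrix stored as nested lists, and unreported values are marked with None. How the MXL update treats the missing feedback is not modeled here.

```python
import random


def partial_feedback(gradient, p, rng):
    """Strategy 1 sketch: report each entry independently with probability p.

    Entries that are not reported are replaced by None.
    """
    return [[g if rng.random() < p else None for g in row] for row in gradient]


def sporadic_feedback(gradient, p, rng):
    """Strategy 2 sketch: report the whole matrix with probability p.

    Returns a copy of the matrix, or None when nothing is fed back.
    """
    if rng.random() < p:
        return [row[:] for row in gradient]
    return None


rng = random.Random(0)  # seeded generator for reproducibility
toy_gradient = [[0.5, -1.0], [2.0, 0.25]]  # toy values, illustration only
sparse = partial_feedback(toy_gradient, 0.5, rng)
maybe_full = sporadic_feedback(toy_gradient, 0.5, rng)
```

In both cases the expected number of reported entries per iteration is p times the matrix size, so the two strategies can be compared at equal average overhead.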

We develop a high-quality multi-turn dialog dataset, DailyDialog, which is
intriguing in several aspects. The language is human-written and less noisy.
The dialogues in the dataset reflect the way we communicate daily and cover
various topics about our daily life. We also manually label the developed
dataset with communication intention and emotion information. Then, we evaluate
existing approaches on the DailyDialog dataset and hope it benefits the research
field of dialog systems.

Deep latent variable models have been shown to facilitate the response
generation for open-domain dialog systems. However, these latent variables are
highly randomized, leading to uncontrollable generated responses. In this
paper, we propose a framework allowing conditional response generation based on
specific attributes. These attributes can be either manually assigned or
automatically detected. Moreover, the dialog states for both speakers are
modeled separately in order to reflect personal features. We validate this
framework on two different scenarios, where the attribute refers to genericness
and sentiment states, respectively. The experimental results demonstrate the
potential of our model: meaningful responses can be generated in
accordance with the specified attributes.

Current Chinese social media text summarization models are based on an
encoder-decoder framework. Although the generated summaries are literally
similar to the source texts, they have low semantic relevance. In this work, our
goal is to improve semantic relevance between source texts and summaries for
Chinese social media summarization. We introduce a Semantic Relevance Based
neural model to encourage high semantic similarity between texts and summaries.
In our model, the source text is represented by a gated attention encoder,
while the summary representation is produced by a decoder. In addition, the
similarity score between the representations is maximized during training. Our
experiments show that the proposed model outperforms baseline systems on a
social media corpus.
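
A similarity score between the source-text representation and the summary representation can be sketched as follows. Cosine similarity is used here as one common choice; treating it as the score is an assumption of this sketch, as are the function name and the toy vectors. In training, such a score would be maximized, for example by adding its negative to the loss.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero norm.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy representations (assumed values, for illustration only).
source_repr = [1.0, 2.0, 0.0]
summary_repr = [2.0, 4.0, 0.0]
relevance = cosine_similarity(source_repr, summary_repr)
```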

Although Generative Adversarial Networks achieve state-of-the-art results on
a variety of generative tasks, they are regarded as highly unstable and prone
to miss modes. We argue that these bad behaviors of GANs are due to the very
particular functional shape of the trained discriminators in high-dimensional
spaces, which can easily cause training to get stuck or push probability mass in
the wrong direction, towards regions of higher concentration than that of the
data-generating distribution. We introduce several ways of regularizing the
objective, which can dramatically stabilize the training of GAN models. We also
show that our regularizers can help distribute probability mass fairly
across the modes of the data-generating distribution during the early phases
of training, thus providing a unified solution to the missing modes problem.

Despite the successes in capturing continuous distributions, the application
of generative adversarial networks (GANs) to discrete settings, like natural
language tasks, is rather restricted. The fundamental reason is the difficulty
of backpropagation through discrete random variables combined with the
inherent instability of the GAN training objective. To address these problems,
we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
Networks. Instead of directly optimizing the GAN objective, we derive a novel
and low-variance objective using the discriminator's output that
corresponds to the log-likelihood. Compared with the original, the new
objective is proved to be consistent in theory and beneficial in practice. The
experimental results on various discrete datasets demonstrate the effectiveness
of the proposed approach.

Many natural language generation tasks, such as abstractive summarization and
text simplification, are paraphrase-oriented. In these tasks, copying and
rewriting are two main writing modes. Most previous sequence-to-sequence
(Seq2Seq) models use a single decoder and neglect this fact. In this paper, we
develop a novel Seq2Seq model to fuse a copying decoder and a restricted
generative decoder. The copying decoder finds the position to be copied based
on a typical attention model. The generative decoder produces words restricted
to the source-specific vocabulary. To combine the two decoders and determine the
final output, we develop a predictor to predict the mode of copying or
rewriting. This predictor can be guided by the actual writing mode in the
training data. We conduct extensive experiments on two different paraphrase
datasets. The results show that our model outperforms the state-of-the-art
approaches in terms of both informativeness and language quality.
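
One simple way to combine a copying distribution and a generative distribution with a mode predictor is a soft mixture, sketched below. This is an illustrative assumption rather than the authors' exact formulation: the abstract only states that a predictor decides between copying and rewriting. The function name, the predicted copy probability p_copy, and the toy distributions are hypothetical.

```python
def fuse_decoders(copy_probs, gen_probs, p_copy):
    """Mix two word distributions using the predicted copy probability.

    p(w) = p_copy * P_copy(w) + (1 - p_copy) * P_gen(w)
    copy_probs / gen_probs: dicts mapping word -> probability.
    """
    if not 0.0 <= p_copy <= 1.0:
        raise ValueError("p_copy must lie in [0, 1]")
    words = set(copy_probs) | set(gen_probs)
    return {
        w: p_copy * copy_probs.get(w, 0.0)
        + (1.0 - p_copy) * gen_probs.get(w, 0.0)
        for w in words
    }


# Toy distributions (assumed values, for illustration only).
copy_dist = {"Paris": 1.0}              # attention points at one source word
gen_dist = {"city": 0.5, "town": 0.5}   # restricted generative vocabulary
mixed = fuse_decoders(copy_dist, gen_dist, p_copy=0.6)
```

When p_copy is close to 1 the output follows the source text, and when it is close to 0 the output comes from the restricted generative vocabulary.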

Multi-document summarization has so far been held back by
the lack of sufficient training data and diverse categories of documents.
Text classification makes up for exactly these deficiencies. In this paper, we
propose a novel summarization system called TCSum, which leverages plentiful
text classification data to improve the performance of multi-document
summarization. TCSum projects documents onto distributed representations which
act as a bridge between text classification and summarization. It also utilizes
the classification results to produce summaries of different styles. Extensive
experiments on DUC generic multi-document summarization datasets show that
TCSum can achieve state-of-the-art performance without using any
hand-crafted features and has the capability to capture the variations of summary
styles with respect to different text categories.

Query relevance ranking and sentence saliency ranking are the two main tasks
in extractive query-focused summarization. Previous supervised summarization
systems often perform the two tasks in isolation. However, since reference
summaries are a trade-off between relevance and saliency, neither of the two
rankers can be trained well when they are used as supervision. This paper
proposes a novel summarization system called AttSum, which tackles the two
tasks jointly. It automatically learns distributed representations for
sentences as well as the document cluster. Meanwhile, it applies the attention
mechanism to simulate the attentive reading behavior of humans when a query is
given. Extensive experiments are conducted on DUC query-focused summarization
benchmark datasets. Without using any hand-crafted features, AttSum achieves
competitive performance. It is also observed that the sentences recognized to
focus on the query indeed meet the query need.

Introducing an isolated intermediate band (IB) into a wide band gap
semiconductor can potentially improve the optical absorption of the material
beyond the Shockley-Queisser limit for solar cells. Here, we present a
systematic study of the thermodynamic stability, electronic structures, and
optical properties of transition-metal (M = Ti, V, and Fe) doped CuAlSe2 for
potential IB thin-film solar cells, by adopting first-principles
calculations based on the hybrid functional method. We found from chemical
potential analysis that for all dopants considered, the stable doped phase only
exists when the Al atom is substituted. More importantly, with this
substitution, the IB feature is determined by the $3d$ electronic nature of the
$M^{3+}$ ion, and the electronic configuration $3d^1$ can drive an optimum IB that
possesses half-filled character and a suitable sub-bandgap from the valence band
or conduction band. We further show that Ti-doped CuAlSe2 is the most promising
candidate for IB materials since the resulting IB in it is half filled and extra
absorption peaks occur in the optical spectrum, accompanied by a largely
enhanced light absorption intensity. The result offers an understanding of the IB
induced by transition metals in CuAlSe2 and is significant for fabricating the
related IB materials.

The development of summarization research has been significantly hampered by
the costly acquisition of reference summaries. This paper proposes an effective
way to automatically collect large-scale news-related multi-document
summaries with reference to social media's reactions. We utilize two types of
social labels in tweets, i.e., hashtags and hyperlinks. Hashtags are used to
cluster documents into different topic sets. Also, a tweet with a hyperlink
often highlights certain key points of the corresponding document. We
synthesize a linked document cluster to form a reference summary which can
cover most key points. To this end, we adopt the ROUGE metrics to measure the
coverage ratio, and develop an Integer Linear Programming solution to discover
the sentence set reaching the upper bound of ROUGE. Since we allow summary
sentences to be selected from both documents and high-quality tweets, the
generated reference summaries could be abstractive. Both informativeness and
readability of the collected summaries are verified by manual judgment. In
addition, we train a Support Vector Regression summarizer on DUC generic
multi-document summarization benchmarks. With the collected data as an extra
training resource, the performance of the summarizer improves considerably on
all the test sets. We release this dataset for further research.

Distributed word representations are very useful for capturing semantic
information and have been successfully applied in a variety of NLP tasks,
especially on English. In this work, we develop two
component-enhanced Chinese character embedding models and their bigram
extensions. Distinguished from English word embeddings, our models explore the
compositions of Chinese characters, which often serve as semantic indicators
inherently. The evaluations on both word similarity and text classification
demonstrate the effectiveness of our models.

Typical dimensionality reduction (DR) methods are often data-oriented,
focusing on directly reducing the number of random variables (features) while
retaining the maximal variations in the high-dimensional data. In unsupervised
situations, one of the main limitations of these methods lies in their
dependency on the scale of data features. This paper aims to address the
problem from a new perspective and considers model-oriented dimensionality
reduction in parameter spaces of binary multivariate distributions.
Specifically, we propose a general parameter reduction criterion, called the
Confident-Information-First (CIF) principle, to maximally preserve confident
parameters and rule out less confident parameters. Formally, the confidence of
each parameter can be assessed by its contribution to the expected Fisher
information distance within the geometric manifold over the neighbourhood of
the underlying real distribution.
We then revisit Boltzmann machines (BM) from a model selection perspective
and theoretically show that both the fully visible BM (VBM) and the BM with
hidden units can be derived from the general binary multivariate distribution
using the CIF principle. This can help us uncover and formalize the essential
parts of the target density that BM aims to capture and the non-essential parts
that BM should discard. Guided by the theoretical analysis, we develop a
sample-specific CIF for model selection of BM that is adaptive to the observed
samples. The method is studied in a series of density estimation experiments
and is shown to be effective in terms of estimation accuracy.

This paper addresses the topic of robust Bayesian compressed sensing over
finite fields. For stationary and ergodic sources, it provides asymptotic (with
the size of the vector to estimate) necessary and sufficient conditions on the
number of required measurements to achieve vanishing reconstruction error, in
the presence of sensing and communication noise. In all considered cases, the
necessary and sufficient conditions asymptotically coincide. Conditions on the
sparsity of the sensing matrix are established in the presence of communication
noise. Several previously published results are generalized and extended.

Typical dimensionality reduction methods focus on directly reducing the
number of random variables while retaining maximal variations in the data. In
this paper, we consider the dimensionality reduction in parameter spaces of
binary multivariate distributions. We propose a general
Confident-Information-First (CIF) principle to maximally preserve parameters
with confident estimates and rule out unreliable or noisy parameters. Formally,
the confidence of a parameter can be assessed by its Fisher information, which
establishes a connection with the inverse variance of any unbiased estimate for
the parameter via the Cram\'{e}r-Rao bound. We then revisit Boltzmann machines
(BM) and theoretically show that both single-layer BM without hidden units
(SBM) and restricted BM (RBM) can be solidly derived using the CIF principle.
This can not only help us uncover and formalize the essential parts of the
target density that SBM and RBM capture, but also suggest that the deep neural
network consisting of several layers of RBM can be seen as the layer-wise
application of CIF. Guided by the theoretical analysis, we develop a
sample-specific CIF-based contrastive divergence (CD-CIF) algorithm for SBM and
a CIF-based iterative projection procedure (IP) for RBM. Both CD-CIF and IP are
studied in a series of density estimation experiments.
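
The link between Fisher information and the variance of an unbiased estimate can be made concrete with the simplest binary case. The sketch below is a standard textbook illustration, not part of the CIF method itself: for n independent Bernoulli(p) samples the Fisher information about p is n / (p (1 - p)), and the Cramér-Rao bound states that any unbiased estimator of p has variance at least the inverse of that quantity. The function names are hypothetical.

```python
def bernoulli_fisher_information(p, n=1):
    """Fisher information about p carried by n i.i.d. Bernoulli(p) samples."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return n / (p * (1.0 - p))


def cramer_rao_bound(p, n=1):
    """Lower bound on the variance of any unbiased estimator of p."""
    return 1.0 / bernoulli_fisher_information(p, n)


# With 100 samples at p = 0.5, no unbiased estimate of p can have
# variance below p * (1 - p) / n.
info = bernoulli_fisher_information(0.5, n=100)
bound = cramer_rao_bound(0.5, n=100)
```

A parameter with high Fisher information therefore admits a low-variance, confident estimate, which is the sense in which confidence is assessed in the text above.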

We show that thermally evaporated lead (Pb) preserves the electronic
properties of organic monolayers on Si and the surface passivation of the Si
surface itself. The obtained current-voltage characteristics are in accordance
with results from the well-established hanging mercury drop method and preserve
both the molecule-induced dipolar effect and length-attenuation of current. We
rationalize our findings by the lack of interaction between the Pb and the Si
substrate. Our method is fast, scalable, compatible with standard semiconductor
processing, and can help to spur the large-scale utilization of silicon-organic
hybrid electronics.

The mechanism of solid-state electron transport (ETp) via a monolayer of
immobilized Azurin (Az) was examined by conducting probe atomic force
microscopy (CP-AFM), both as a function of temperature (248-373 K) and of
applied tip force (6-12 nN). By varying both temperature and force in CP-AFM,
we find that the ETp mechanism can alter with a change in the force applied via
the tip to the proteins. As the applied force increases, ETp via Az changes
from temperature-independent to thermally activated at high temperatures. This
is in contrast to the Cu-depleted form of Az (apo-Az), where increasing the
applied force causes only small quantitative effects that fit with a decrease
in electrode spacing. At low force, ETp via holo-Az is temperature-independent,
whereas ETp via apo-Az is thermally activated. This observation agrees with
macroscopic-scale measurements, thus confirming that the difference in ETp
dependence on temperature between holo- and apo-Az is an inherent one that may
reflect a difference in rigidity between the two forms. An important
implication of these results, which depend on CP-AFM measurements over a
significant temperature range, is that for ETp measurements on floppy systems,
such as proteins, the stress applied to the sample should be kept constant or,
at least, controlled during measurement.

In density estimation tasks, the maximum entropy model (Maxent) can effectively
use reliable prior information via certain constraints, i.e., linear
constraints without empirical parameters. However, reliable prior information
is often insufficient, and the selection of uncertain constraints becomes
necessary but poses considerable implementation complexity. Improper setting of
uncertain constraints can result in overfitting or underfitting. To solve this
problem, a generalization of Maxent, under the Tsallis entropy framework, is
proposed. The proposed method introduces a convex quadratic constraint for the
correction of (expected) Tsallis entropy bias (TEB). Specifically, we
demonstrate that the expected Tsallis entropy of sampling distributions is
smaller than the Tsallis entropy of the underlying real distribution. This
expected entropy reduction is exactly the (expected) TEB, which can be
expressed by a closed-form formula and act as a consistent and unbiased
correction. TEB indicates that the entropy of a specific sampling distribution
should be increased accordingly. This entails a quantitative reinterpretation
of the Maxent principle. By compensating for TEB while forcing the
resulting distribution to be close to the sampling distribution, our
generalized TEBC Maxent can be expected to alleviate overfitting and
underfitting. We also present a connection between TEB and the Lidstone estimator.
As a result, the TEB-Lidstone estimator is developed by analytically identifying
the rate of probability correction in Lidstone. Extensive empirical evaluation
shows promising performance of both TEBC Maxent and TEB-Lidstone in comparison
with various state-of-the-art density estimation methods.
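
For readers unfamiliar with the two ingredients, the sketch below shows the standard Lidstone (additive) estimator and the Tsallis entropy. It does not reproduce the TEB-Lidstone estimator: here the smoothing rate alpha is a free parameter chosen by hand, whereas the abstract describes identifying the correction rate analytically. The function names and toy counts are assumptions for illustration.

```python
def lidstone_estimate(counts, alpha):
    """Additive smoothing: p_i = (c_i + alpha) / (N + alpha * K).

    counts: dict mapping outcome -> observed count (K outcomes, N total).
    alpha: non-negative smoothing rate (hand-picked in this sketch).
    """
    total = sum(counts.values())
    k = len(counts)
    return {x: (c + alpha) / (total + alpha * k) for x, c in counts.items()}


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1), for q != 1."""
    if q == 1:
        raise ValueError("q = 1 is the Shannon limit and is not handled here")
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


# Toy counts (assumed values): one outcome was never observed.
toy_counts = {"a": 3, "b": 1, "c": 0}
empirical = lidstone_estimate(toy_counts, alpha=0.0)
smoothed = lidstone_estimate(toy_counts, alpha=1.0)
entropy_empirical = tsallis_entropy(empirical.values(), q=2)
entropy_smoothed = tsallis_entropy(smoothed.values(), q=2)
```

In this toy case the smoothed distribution has higher Tsallis entropy than the raw sampling distribution, in line with the statement that the entropy of a sampling distribution should be increased.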