• ### meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting(1706.06197)

March 11, 2019 cs.AI, cs.CV, cs.CL, cs.LG
We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp
• ### Tag-Enhanced Tree-Structured Neural Networks for Implicit Discourse Relation Classification(1803.01165)

March 3, 2018 cs.CL
Identifying implicit discourse relations between text spans is a challenging task because it requires understanding the meaning of the text. To tackle this task, recent studies have tried several deep learning methods but few of them exploited the syntactic information. In this work, we explore the idea of incorporating syntactic parse tree into neural networks. Specifically, we employ the Tree-LSTM model and Tree-GRU model, which are based on the tree structure, to encode the arguments in a relation. Moreover, we further leverage the constituent tags to control the semantic composition process in these tree-structured neural networks. Experimental results show that our method achieves state-of-the-art performance on PDTB corpus.
• ### Interactive Attention Networks for Aspect-Level Sentiment Classification(1709.00893)

Sept. 4, 2017 cs.AI, cs.CL
Aspect-level sentiment classification aims at identifying the sentiment polarity of specific target in its context. Previous approaches have realized the importance of targets in sentiment classification and developed various methods with the goal of precisely modeling their contexts via generating target-specific representations. However, these studies always ignore the separate modeling of targets. In this paper, we argue that both targets and contexts deserve special treatment and need to be learned their own representations via interactive learning. Then, we propose the interactive attention networks (IAN) to interactively learn attentions in the contexts and targets, and generate the representations for targets and contexts separately. With this design, the IAN model can well represent a target and its collocative context, which is helpful to sentiment classification. Experimental results on SemEval 2014 Datasets demonstrate the effectiveness of our model.
• ### Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization(1706.02459)

June 8, 2017 cs.CL
Current Chinese social media text summarization models are based on an encoder-decoder framework. Although its generated summaries are similar to source texts literally, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. Besides, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms baseline systems on a social media corpus.
• ### Towards Label Imbalance in Multi-label Classification with Many Labels(1604.01304)

April 5, 2016 cs.LG
In multi-label classification, an instance may be associated with a set of labels simultaneously. Recently, the research on multi-label classification has largely shifted its focus to the other end of the spectrum where the number of labels is assumed to be extremely large. The existing works focus on how to design scalable algorithms that offer fast training procedures and have a small memory footprint. However they ignore and even compound another challenge - the label imbalance problem. To address this drawback, we propose a novel Representation-based Multi-label Learning with Sampling (RMLS) approach. To the best of our knowledge, we are the first to tackle the imbalance problem in multi-label classification with many labels. Our experimentations with real-world datasets demonstrate the effectiveness of the proposed approach.
• ### A Dependency-Based Neural Network for Relation Classification(1507.04646)

July 16, 2015 cs.NE, cs.CL, cs.LG
Previous research on relation classification has verified the effectiveness of using dependency shortest paths or subtrees. In this paper, we further explore how to make full use of the combination of these dependency information. We first propose a new structure, termed augmented dependency path (ADP), which is composed of the shortest dependency path between two entities and the subtrees attached to the shortest path. To exploit the semantic representation behind the ADP structure, we develop dependency-based neural networks (DepNN): a recursive neural network designed to model the subtrees, and a convolutional neural network to capture the most important features on the shortest path. Experiments on the SemEval-2010 dataset show that our proposed method achieves state-of-art results.