• Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03% of training set size) we can improve generalization performance over several baselines.
  • Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.
  • Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.
  • With the development of high-resolution optical imaging systems, manufacturing lenses with ultra-large apertures has become increasingly difficult for traditional optical telescope systems. By modulating the true thermal light from each point of the object into spatial pseudo-thermal light with a spatial random phase modulator, we propose a lensless Wiener-Khinchin telescope based on high-order spatial autocorrelation of the light field. It can acquire information about the object from the second-order spatial intensity autocorrelation of the light field in a single-shot measurement. The field of view and resolution of the lensless Wiener-Khinchin telescope are quantitatively characterized, analyzed, and compared with experimental results. As a new lensless imaging method, the lensless Wiener-Khinchin telescope can be used in applications such as astronomical observation and X-ray imaging.
  • In this paper, we propose a noise-robust bottleneck feature representation generated by an adversarial network (AN). The AN includes two cascade-connected networks, an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN, and the output of the EN is used as the noise-robust feature. The EN and DN are trained in turn: when training the DN, noise types are used as the training labels, and when training the EN, all labels are set to the same value, i.e., the clean speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and compare it with MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve the speaker verification performance under the clean condition.
  • Neural networks have been successfully applied in applications with a large amount of labeled data. However, rapid generalization to new concepts from small training data, while preserving performance on previously learned ones, still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. When evaluated on the Omniglot and Mini-ImageNet benchmarks, our MetaNet models achieve near human-level performance and outperform the baseline approaches by up to 6% accuracy. We demonstrate several appealing properties of MetaNet relating to generalization and continual learning.
  • Background: Electronic health record (EHR) notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Objective: The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. Methods: We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views for term importance: patient use of medical concepts, document-level term salience, word-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and compared it with three benchmark unsupervised ensemble ranking methods. Results: FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. It outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter. Conclusions: FIT can automatically identify EHR terms important to patients and may help develop personalized interventions to improve quality of care. By using unsupervised learning as well as a robust and flexible framework for information fusion, FIT can be readily applied to other domains and applications.
  • Hypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to the language comprehension task using Neural Semantic Encoders (NSE). Our NSE models achieve state-of-the-art results, showing an absolute improvement of 1.2% to 2.6% in accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets.
  • Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, recursive networks have the advantage of explicitly modeling the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on a syntactic tree. In this paper, we introduce a robust, syntactic parsing-independent tree-structured model, Neural Tree Indexers (NTI), that provides a middle ground between sequential RNNs and syntactic tree-based recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. An attention mechanism can then be applied to both the structure and the node function. We implemented and evaluated a binary-tree model of NTI, showing that the model achieved state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.
  • With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attacks. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of the FBNN is band-limited and sorted by frequency, similar to a normal filter bank. Unlike a manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained with the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.
  • We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders (NSE). NSE is equipped with a novel memory update rule and has a variable-sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation, where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.
  • Objective: Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, medical jargon, which abounds in EHR notes, has been shown to be a barrier for patient EHR comprehension. Existing knowledge bases that link medical jargon to lay terms or definitions play an important role in alleviating this problem but have low coverage of medical jargon in EHRs. We developed a data-driven approach that mines EHRs to identify and rank medical jargon based on its importance to patients, to support the building of EHR-centric lay language resources. Methods: We developed an innovative adapted distant supervision (ADS) model based on support vector machines to rank medical jargon from EHRs. For distant supervision, we utilized the open-access, collaborative consumer health vocabulary, a large, publicly available resource that links lay terms to medical jargon. We explored both knowledge-based features from the Unified Medical Language System and distributed word representations learned from unlabeled large corpora. We evaluated the ADS model using physician-identified important medical terms. Results: Our ADS model significantly surpassed two state-of-the-art automatic term recognition methods, TF*IDF and C-Value, yielding 0.810 ROC-AUC versus 0.710 and 0.667, respectively. Our model identified 10K important medical jargon terms after ranking over 100K candidate terms mined from over 7,500 EHR narratives. Conclusion: Our work is an important step towards enriching lexical resources that link medical jargon to lay terms/definitions to support patient EHR comprehension. The identified medical jargon terms and their rankings are available upon request.
  • Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related or not. However, biomedical knowledge discovery is hypothesis-driven. The most related articles may not be the ones with the highest text similarities. In this study, we first develop an innovative crowd-sourcing approach to build an expert-annotated document-ranking corpus. Using this corpus as the gold standard, we then evaluate the approaches of using text similarity to rank the relatedness of articles. Finally, we develop and evaluate a new supervised model to automatically rank related scientific articles. Our results show that authors' rankings differ significantly from rankings by text-similarity-based models. By training a learning-to-rank model on a subset of the annotated corpus, we found that the best supervised learning-to-rank model (SVM-Rank) significantly surpassed state-of-the-art baseline systems.
  • Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance of a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
  • Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work, we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
  • Sequence labeling for extraction of medical events and their attributes from unstructured text in Electronic Health Record (EHR) notes is a key step towards semantic understanding of EHRs. It has important applications in health informatics, including pharmacovigilance and drug surveillance. The state-of-the-art supervised machine learning models in this domain are based on Conditional Random Fields (CRFs) with features calculated from fixed context windows. In this application, we explored various recurrent neural network frameworks and showed that they significantly outperformed the CRF models.
  • Biomedical information extraction (BioIE) is important to many applications, including clinical decision support, integrative biology, and pharmacovigilance, and therefore it has been an active research area. Unlike existing reviews that take a holistic view of BioIE, this review focuses mainly on recent advances in learning-based approaches, systematically summarizing them into different aspects of methodological development. In addition, we dive into open information extraction and deep learning, two emerging and influential techniques, and envision the next generation of BioIE.
  • Knowledge gained through X-ray crystallography fostered structural determination of materials and greatly facilitated the development of modern science and technology in the past century. Atomic details of sample structures are achievable by X-ray crystallography; however, it can only be applied to crystalline structures. Imaging techniques based on X-ray coherent diffraction or zone plates are capable of resolving the internal structure of non-crystalline materials at the nanoscale, but achieving atomic resolution is still a challenge. Here we demonstrate a novel lensless Fourier-transform ghost imaging method with pseudo-thermal hard X-rays by measuring the second-order intensity correlation function of the light. We show that a high-resolution Fourier-transform diffraction pattern of a complex amplitude sample can be obtained in the Fresnel region and that the amplitude and phase distributions of a sample in the spatial domain can be retrieved successfully. The method of lensless X-ray Fourier-transform ghost imaging extends X-ray crystallography to non-crystalline samples, and its spatial resolution is limited only by the wavelength of the X-ray, thus atomic resolution should be routinely obtainable. Since a highly coherent X-ray source is not required, in contrast to conventional X-ray coherent diffraction imaging, the method can be implemented with laboratory X-ray sources, and it also provides a potential solution for lensless diffraction imaging with fermions, such as neutrons and electrons, where an intense coherent source is usually not available.
  • Attribute reduction is one of the most important topics in rough set theory. Heuristic attribute reduction algorithms have been presented to solve the attribute reduction problem. It is generally known that fitness functions play a key role in developing heuristic attribute reduction algorithms. The monotonicity of fitness functions can guarantee the validity of heuristic attribute reduction algorithms. In the probabilistic rough set model, distribution reducts can ensure that the decision rules derived from the reducts are compatible with those derived from the original decision table. However, there are few studies on developing heuristic attribute reduction algorithms for finding distribution reducts. This is partly because no monotonic fitness functions have been available for designing heuristic attribute reduction algorithms in the probabilistic rough set model. The main objective of this paper is to develop heuristic attribute reduction algorithms for finding distribution reducts in the probabilistic rough set model. First, two monotonic fitness functions are constructed, from which equivalent definitions of distribution reducts can be obtained. Second, two modified monotonic fitness functions are proposed to evaluate the significance of attributes more effectively. On this basis, two heuristic attribute reduction algorithms for finding distribution reducts are developed, based on the addition-deletion method and the deletion method. In particular, the monotonicity of the fitness functions guarantees the rationality of the proposed heuristic attribute reduction algorithms. Results of experimental analysis are included to quantify the effectiveness of the proposed fitness functions and distribution reducts.
  • This paper develops and evaluates the performance of an advanced multiple access protocol for transmission of the full complement of multimedia signals, consisting of various combinations of voice, video, data, text and images, over wireless networks. The protocol is called Advanced Multiple Access Protocol for Multimedia Transmission (AMAPMT) and is to be used in the Data Link Layer of the protocol stack. The principle of operation of the protocol is presented in a number of logical flow charts. The protocol grants permission to transmit to a source on the basis of a priority scheme that takes into account, in this order, a time-to-live (TTL) parameter of all the transactions, selectable priorities assigned to all the sources, and relevant channel state information (CSI). Performance of the protocol is evaluated in terms of quality of service parameters such as packet loss ratio (PLR), mean packet transfer delay (MPTD) and throughput. The evaluation is carried out using a simulation model built with the OPNET simulation software package. Results obtained under various traffic loads, with constant distributions and various mean arrival rates and transaction sizes, show that performance is better when this priority scheme is used than when it is not. The results for AMAPMT are compared with those of the best currently available multiple access protocol, called Adaptive Request Channel Multiple Access (ARCMA). The AMAPMT protocol outperforms the ARCMA protocol.
  • The architecture of biological networks has been reported to exhibit a high level of modularity, and to some extent, topological modules of networks overlap with known functional modules. However, how the modular topology of the molecular network affects the evolution of its member proteins remains unclear. In this work, the functional and evolutionary modularity of the Homo sapiens (H. sapiens) metabolic network was investigated from a topological point of view. Network decomposition shows that the metabolic network is organized in a highly modular core-periphery way, in which the core modules are tightly linked together and perform basic metabolic functions, whereas the periphery modules interact with only a few modules and accomplish relatively independent and specialized functions. Moreover, over half of the modules exhibit co-evolutionary features and belong to specific evolutionary ages. Peripheral modules tend to evolve more cohesively and faster than core modules do. The correlation between functional, evolutionary and topological modularity suggests that the evolutionary history and functional requirements of metabolic systems have been imprinted in the architecture of metabolic networks. Such systems-level analysis could demonstrate how the evolution of genes may be placed in a genome-scale network context, giving a novel perspective on molecular evolution.
  • Complex networks have been applied to model numerous interactive nonlinear systems in the real world. Knowledge about network topology is crucial for understanding the function, performance and evolution of complex systems. In the last few years, many network metrics and models have been proposed to illuminate the network topology, dynamics and evolution. Since these network metrics and models derive from a wide range of studies, a systematic study is required to investigate the correlations between them. The present paper explores the effect of degree correlation on the other network metrics through studying an ensemble of graphs where the degree sequence (set of degrees) is fixed. We show that to some extent, the characteristic path length, clustering coefficient, modular extent and robustness of networks are directly influenced by the degree correlation.
  • The exploration of the structural topology and the organizing principles of genome-based large-scale metabolic networks is essential for studying possible relations between structure and functionality of metabolic networks. Topological analysis of graph models has often been applied to study the structural characteristics of complex metabolic networks. In this work, metabolic networks of 75 organisms were investigated from a topological point of view. Network decomposition of three microbes (Escherichia coli, Aeropyrum pernix and Saccharomyces cerevisiae) shows that almost all of the sub-networks exhibit a highly modularized bow-tie topological pattern similar to that of the global metabolic networks. Moreover, these small bow-ties are hierarchically nested into larger ones and collectively integrated into a large metabolic network, and important features of this modularity are not observed in the randomly shuffled network. In addition, such a bow-tie pattern appears to be present in certain chemically isolated functional modules and spatially separated modules, including carbohydrate metabolism, cytosol and mitochondrion, respectively. The highly modularized bow-tie pattern is present at different levels and scales, and in different chemical and spatial modules of metabolic networks, which is likely the result of the evolutionary process rather than a random accident. Identification and analysis of such a pattern is helpful for understanding the design principles of metabolic networks and facilitates their modelling.
  • One of the main tasks of post-genomic informatics is to systematically investigate all molecules and their interactions within a living cell so as to understand how these molecules and the interactions between them relate to the function of the organism. Networks are an appropriate abstract description of all kinds of interactions. In the past few years, great progress has been made in developing the theory of complex networks to reveal the organizing principles that govern the formation and evolution of various complex biological, technological and social networks. This paper reviews the accomplishments in constructing genome-based metabolic networks and describes how the theory of complex networks is applied to analyze metabolic networks.
  • The implications of the $f_1(1285)-f_1(1420)$ mixing for the $K_1(^3P_1)-K_1(^1P_1)$ mixing angle are investigated. Based on the $f_1(1285)-f_1(1420)$ mixing angle of $\sim 50^\circ$ suggested by the analysis of a substantial body of data concerning the $f_1(1420)$ and $f_1(1285)$, the masses of the $K_1(^3P_1)$ and $K_1(^1P_1)$ are determined to be $\sim 1307.35\pm 0.63$ MeV and $1370.03\pm 9.69$ MeV, respectively, which suggests that the $K_1(^3P_1)-K_1(^1P_1)$ mixing angle is about $\pm (59.55\pm 2.81)^\circ$. Also, it is found that the mass of the $h^\prime_1(^1P_1)$ (mostly $s\bar{s}$) state is about $1495.18\pm 8.82$ MeV. Comparison of the predicted results with the available experimental information on the $h_1(1380)$ shows that, without further confirmation of the $h_1(1380)$, its assignment as the $s\bar{s}$ member of the $^1P_1$ meson nonet may be premature.
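
For the last entry, the link between the unmixed $K_1$ masses and the mixing angle can be sketched with the usual two-state mixing picture. This is an illustration under assumptions the abstract does not state: the physical states are taken to be the $K_1(1270)$ and $K_1(1400)$, the unmixed states are written $K_{1A}\equiv K_1(^3P_1)$ and $K_{1B}\equiv K_1(^1P_1)$, the mass-squared matrix is taken to be diagonal in the physical basis, and the sign convention for the angle $\theta_K$ is one common choice among several.

$$
|K_1(1270)\rangle = \sin\theta_K\,|K_{1A}\rangle + \cos\theta_K\,|K_{1B}\rangle ,\qquad
|K_1(1400)\rangle = \cos\theta_K\,|K_{1A}\rangle - \sin\theta_K\,|K_{1B}\rangle
$$

$$
m^2_{K_{1A}} = m^2_{K_1(1270)}\sin^2\theta_K + m^2_{K_1(1400)}\cos^2\theta_K ,\qquad
m^2_{K_{1B}} = m^2_{K_1(1270)}\cos^2\theta_K + m^2_{K_1(1400)}\sin^2\theta_K
$$

$$
m^2_{K_{1A}} + m^2_{K_{1B}} = m^2_{K_1(1270)} + m^2_{K_1(1400)} ,\qquad
\cos 2\theta_K = \frac{m^2_{K_{1A}} - m^2_{K_{1B}}}{m^2_{K_1(1400)} - m^2_{K_1(1270)}}
$$

In this picture only $\cos 2\theta_K$ is fixed by the masses, so the angle is determined up to its sign, which is consistent with the $\pm$ in the quoted result.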