• ### Non-reciprocal Light-harvesting Nanoantennae Made by Nature(1702.06671)

Most of our current understanding of the mechanisms of photosynthesis comes from spectroscopy. However, the classical definition of a radio antenna can be extended to the optical regime to discuss the function of light-harvesting antennae. Building on our previously proposed loop-antenna model, we provide several further physical explanations that take into account the non-reciprocal properties of bacterial light harvesters. We explain the function of the non-heme iron at the reaction center and give reasons why each module of the light harvester is composed of one carotenoid, two short $\alpha$-helical polypeptides, and three bacteriochlorophylls. We also explain the toroidal shape of the light harvester, the upper bound on its characteristic length, the functional role of the observed long-lasting spectroscopic signal, and the observed photon antibunching. Based on these analyses, we examine two mechanisms that the radiation-resistant bacterium *Deinococcus radiodurans* might use, as well as the non-reciprocity of the archaeon *Haloquadratum walsbyi*. The physical lessons involved are useful for designing artificial light harvesters, optical sensors, wireless power chargers, passive super-Planckian heat radiators, photocatalytic hydrogen generators, and radiation-protective cloaks. In particular, they can predict which kinds of particles should be used to split sunlight into photovoltaically and thermally useful ranges to enhance the efficiency of solar cells.
• ### Memetic Algorithms for Ligand Expulsion from Protein Cavities(1507.00150)

March 20, 2019 physics.chem-ph, q-bio.BM
Ligand diffusion through proteins is a fundamental process governing biological signaling and enzymatic catalysis. The complex topology of protein tunnels makes it difficult to compute ligand escape pathways with standard molecular dynamics (MD) simulations. Here, two novel methods for searching for ligand exit pathways and exploring cavities are proposed: memory random acceleration MD (mRAMD) and memetic algorithms (MA). In mRAMD, exit pathways are found by means of a non-Markovian bias introduced to optimize the unbinding force. In MA, hybrid learning protocols are exploited to predict optimal ligand exit paths. The methods are tested on three proteins with tunnels of increasing complexity: the M2 muscarinic receptor, nitrile hydratase, and cytochrome P450cam. In these cases, the proposed methods outperform the standard techniques currently used to find ligand egress pathways. The proposed approach is general and applicable to the accelerated transport of an object through a network of protein tunnels.
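To make the MA idea concrete, here is a minimal memetic-algorithm sketch: a genetic algorithm (tournament selection, blend crossover, Gaussian mutation) in which every offspring is refined by a short hill-climbing local search before selection. It is not the authors' protocol. The Rastrigin function stands in, hypothetically, for a ligand-exit path cost, and all parameter values are illustrative assumptions.

```python
import math
import random

def rastrigin(x):
    """Multimodal toy objective (hypothetical stand-in for a ligand-exit path cost)."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def local_search(x, f, step=0.05, iters=50, rng=random):
    """Stochastic hill climbing: the individual-learning ('meme') step."""
    best, f_best = list(x), f(x)
    for _ in range(iters):
        cand = [xi + rng.gauss(0.0, step) for xi in best]
        f_cand = f(cand)
        if f_cand < f_best:          # accept only improvements
            best, f_best = cand, f_cand
    return best, f_best

def memetic(f, dim=2, pop_size=20, generations=40, bounds=(-5.12, 5.12), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scored = [local_search(p, f, rng=rng) for p in pop]   # list of (x, f(x))
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Tournament selection (size 3) of two parents
            pa = min(rng.sample(scored, 3), key=lambda s: s[1])[0]
            pb = min(rng.sample(scored, 3), key=lambda s: s[1])[0]
            # Blend crossover plus Gaussian mutation, then local refinement
            w = rng.random()
            child = [w * a + (1 - w) * b + rng.gauss(0.0, 0.3) for a, b in zip(pa, pb)]
            children.append(local_search(child, f, rng=rng))
        # Elitist replacement: keep the best pop_size individuals overall
        scored = sorted(scored + children, key=lambda s: s[1])[:pop_size]
    return scored[0]

best_x, best_f = memetic(rastrigin)
print(best_x, best_f)
```

Swapping `rastrigin` for a real path-cost function and `local_search` for an MD-based refinement step would be the natural, but hypothetical, adaptation.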
• ### Prediction of Structures and Interactions from Genome Information(1709.08021)

Oct. 13, 2018 q-bio.BM
Predicting three-dimensional residue-residue contacts from evolutionary information in protein sequences was already attempted in the early 1990s. However, the contact prediction accuracies of methods evaluated in CASP experiments before CASP11 remained quite low, typically with $<20$% true positives. Recently, contact prediction has improved to the point that an accurate three-dimensional model of a large protein can be generated from predicted contacts. This improvement was achieved by disentangling direct from indirect correlations in amino acid covariations, or cosubstitutions, between sites during protein evolution. Here, we review statistical methods for extracting causative correlations and various approaches to describing protein structures, complexes, and flexibility based on predicted contacts.
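A three-site toy model shows why raw covariation mixes direct and indirect couplings: if site 1 couples only to site 2 and site 2 only to site 3, sites 1 and 3 are still correlated. Inverting the covariance matrix, the idea underlying covariance-inversion contact predictors such as PSICOV and mean-field DCA, removes this indirect term. The sketch below uses continuous Gaussian variables instead of amino acid alignments, which is a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Direct couplings only along the chain 1 -> 2 -> 3 (no direct 1-3 link)
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)
x3 = 0.8 * x2 + 0.6 * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)             # raw correlations: 1-3 is about 0.64 (indirect)
prec = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix (inverse covariance)
d = np.sqrt(np.diag(prec))
partial = -prec / np.outer(d, d)                # partial correlations
np.fill_diagonal(partial, 1.0)

print(np.round(corr, 2))
print(np.round(partial, 2))                     # 1-3 entry is close to zero
```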
• ### Proton tunneling in hydrogen bonds and its implications in an induced-fit model of enzyme catalysis(1703.00789)

The role of proton tunneling in biological catalysis is investigated here within the frameworks of quantum information theory and thermodynamics. We consider the quantum correlations generated through two hydrogen bonds between a substrate and a prototypical enzyme that first catalyzes the tautomerization of the substrate and then moves on to a subsequent catalysis, and we discuss how the enzyme can derive its catalytic potency from these correlations. In particular, we show that classical changes induced in the binding site of the enzyme spread the quantum correlations among all four hydrogen-bonded atoms, thanks to the directionality of hydrogen bonds. If the enzyme rapidly returns to its initial state after the binding stage, the substrate ends up in a new transition state corresponding to a quantum superposition. Open quantum system dynamics can then naturally drive the reaction forward, from the major tautomeric form to the minor tautomeric form, without any additional catalytic activity. We find that in this scenario the enzyme lowers the activation energy so much that no energy barrier is left in the tautomerization, even if the quantum correlations decay quickly.
• ### Nonequilibrium Energetics of Molecular Motor Kinesin(1704.05302)

July 10, 2018 physics.bio-ph, q-bio.BM
The nonequilibrium energetics of the single-molecule translational motor kinesin was investigated by measuring heat dissipation from the violation of the fluctuation-response relation of a probe attached to the motor, using optical tweezers. The sum of the dissipation and the work did not amount to the input free-energy change, indicating that a large hidden dissipation exists. Possible sources of the hidden dissipation were explored by analyzing the Langevin dynamics of the probe, which incorporates a two-state Markov stepper as a kinesin model. We conclude that internal dissipation is dominant.
• ### REinforcement learning based Adaptive samPling: REAPing Rewards by Exploring Protein Conformational Landscapes(1710.00495)

One of the key limitations of molecular dynamics (MD) simulations is that sampling protein conformational landscapes becomes computationally intractable for large systems or long timescales. To overcome this bottleneck, we present the REinforcement learning based Adaptive samPling (REAP) algorithm, which aims to sample conformational space efficiently by learning the relative importance of each reaction coordinate as it samples the landscape. To achieve this, the algorithm uses concepts from reinforcement learning, a subfield of machine learning: it rewards sampling along important degrees of freedom and disregards those that do not facilitate exploration or exploitation. We demonstrate the effectiveness of REAP by comparing its sampling with long continuous MD simulations and least-counts adaptive sampling on two model landscapes (L-shaped and circular) and on realistic systems, namely alanine dipeptide and Src kinase. In all four systems, REAP consistently explores conformational space faster than the other two methods when comparing the expected values of the landscape discovered in a given amount of time. The key advantage of REAP is its on-the-fly estimation of the importance of collective variables, which makes it particularly useful for systems with limited structural information.
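The reward idea can be sketched as a weighted, standardized distance of each candidate state (for example, a least-populated cluster center) from the candidates' mean along each collective variable, with the weights adjusted within a trust region to maximize the total reward. The sketch below follows that idea under our own assumptions: the function names, the step bound `delta`, and the greedy solver for the resulting linear program are illustrative and are not taken from the REAP code.

```python
import numpy as np

def standardized_spread(centers):
    """|x - mean| / std for each candidate and collective variable (CV)."""
    sigma = centers.std(axis=0)
    sigma[sigma == 0] = 1.0                      # guard against constant CVs
    return np.abs(centers - centers.mean(axis=0)) / sigma

def reap_rewards(centers, weights):
    """Reward per candidate: weighted sum of standardized distances."""
    return standardized_spread(centers) @ weights

def update_weights(centers, weights, delta=0.05):
    """Maximize total reward with sum(w) = 1 and |w - w_old| <= delta.
    The objective is linear in w, so filling the best CVs first is optimal."""
    gain = standardized_spread(centers).sum(axis=0)   # d(total reward)/d(w_k)
    lo = np.clip(weights - delta, 0.0, 1.0)
    hi = np.clip(weights + delta, 0.0, 1.0)
    w = lo.copy()
    budget = 1.0 - w.sum()
    for k in np.argsort(-gain):                  # most useful CVs first
        add = min(hi[k] - w[k], budget)
        w[k] += add
        budget -= add
    return w

# Hypothetical round: 10 least-populated cluster centers described by 2 CVs
rng = np.random.default_rng(1)
centers = rng.normal(size=(10, 2)) * np.array([1.0, 0.1])
w = update_weights(centers, np.array([0.5, 0.5]))
seeds = np.argsort(-reap_rewards(centers, w))[:3]   # restart MD from these states
print(w, seeds)
```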
• ### Eigenvector Centrality Distribution for Characterization of Protein Allosteric Pathways(1706.02327)

June 25, 2018 cond-mat.soft, q-bio.BM
Determining the principal energy pathways for allosteric communication in biomolecules, which arise from thermal motion, remains challenging due to the intrinsic complexity of the systems involved. Graph theory provides a way to make sense of such complexity: allosteric proteins can be represented as networks of amino acids. In this work, we establish the eigenvector centrality metric, computed from the mutual information, as a means of elucidating the allosteric mechanism that regulates the enzymatic activity of proteins. Moreover, we propose a strategy to characterize the range of the physical interactions that underlie the allosteric process. In particular, the well-known enzyme imidazole glycerol phosphate synthase (IGPS) is used to test the proposed methodology. The eigenvector centrality measure successfully describes the allosteric pathways of IGPS and allows us to pinpoint key amino acids in terms of their relevance to the momentum transfer process. The resulting insight can be used to refine the control of IGPS activity, widening the scope for its engineering. Furthermore, we propose a new centrality metric that quantifies the relevance of the surroundings of each residue. In addition, the proposed technique is validated against experimental solution NMR measurements, yielding fully consistent results. Overall, the methodologies proposed in the present work constitute a powerful and cost-effective strategy for gaining insight into the allosteric mechanisms of proteins.
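Eigenvector centrality gives each residue a score proportional to the scores of its neighbours weighted by edge strength, which is the leading eigenvector of the weight matrix. The sketch below assumes that a symmetric, non-negative residue-residue mutual information matrix with a zeroed diagonal has already been computed (for instance from MD trajectories); the four-residue star network in the usage lines is hypothetical.

```python
import numpy as np

def eigenvector_centrality(M):
    """Leading eigenvector of a symmetric, non-negative weight matrix,
    scaled so that the most central residue scores 1."""
    vals, vecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    v = np.abs(vecs[:, np.argmax(vals)])  # Perron vector can be taken non-negative
    return v / v.max()

# Hypothetical mutual information: residue 0 shares information with residues 1-3
M = np.zeros((4, 4))
M[0, 1:] = M[1:, 0] = 0.5
c = eigenvector_centrality(M)
print(c)   # residue 0 scores 1; the others about 0.577 (= 1/sqrt(3))
```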
• ### All-atom simulations reveal how single point mutations promote serpin misfolding(1707.05019)

Protein misfolding is implicated in many diseases, including the serpinopathies. For the canonical inhibitory serpin $\alpha_1$-antitrypsin (A1AT), mutations can result in protein deficiencies leading to lung disease, and misfolded mutants can accumulate in hepatocytes, leading to liver disease. Using all-atom simulations based on the recently developed Bias Functional algorithm, we elucidate how wild-type A1AT folds and how the disease-associated S (Glu264Val) and Z (Glu342Lys) mutations lead to misfolding. The deleterious Z mutation disrupts folding at an early stage, while the relatively benign S mutant shows minor late-stage misfolding. A number of suppressor mutations ameliorate the effects of the Z mutation, and simulations of these mutants help to elucidate the relative roles of steric clashes and electrostatic interactions in Z misfolding. These results demonstrate a striking correlation between atomistic events and disease severity and shed light on the mechanisms that drive chains away from their correct folding routes.
• ### Relevance of the speed and direction of pulling in simple modular proteins(1707.03882)

A theoretical analysis of the unfolding pathway of simple modular proteins in length-controlled pulling experiments is put forward. Within this framework, we predict which module is the first to unfold in a chain of identical units, emphasizing the range of pulling speeds over which we expect our theory to hold. These theoretical predictions are checked by steered molecular dynamics of a simple construct, namely a chain composed of two coiled-coil motifs, where anisotropic features are revealed. These simulations also allow us to estimate the range of pulling velocities in which our theoretical approach is valid.
• ### Mapping energy transport networks in proteins(1805.03715)

The response of proteins to chemical reactions or impulsive excitation occurring within the molecule has fascinated chemists for decades. In recent years, ultrafast X-ray studies have provided ever more detailed information about the evolution of protein structural change following ligand photolysis, and time-resolved IR and Raman techniques, for example, have provided detailed pictures of the nature and rate of energy transport in peptides and proteins, including recent advances in identifying transport through individual amino acids of several heme proteins. Computational tools to locate energy transport pathways in proteins have also been advancing. Energy transport pathways in proteins have for some time been identified by molecular dynamics (MD) simulations, and more recent efforts have focused on developing coarse-graining approaches, some of which exploit analogies to thermal transport in other molecular materials. With the identification of pathways in proteins and protein complexes, network analysis has been applied to locate residues that control protein dynamics and possibly allostery, where chemical reactions at one binding site mediate reactions at distant sites of the protein. In this chapter, we review approaches for computationally locating energy transport networks in proteins. We first present the background on energy and thermal transport in the condensed phase and in macromolecules that underlies these approaches, and then describe the approaches themselves. We also illustrate, with several examples, the application of the computational methods for locating energy transport networks and simulating energy dynamics in proteins.
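Once a residue network has been built, candidate energy transport pathways can be located with a shortest-path search. In the sketch below, each edge cost is taken as the inverse of a hypothetical residue-to-residue energy transfer rate, so the cheapest path is the fastest route. This cost choice and the network itself are illustrative assumptions, not data from the chapter.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a residue network; graph[u] = {v: cost}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]                       # raises KeyError if target is unreachable
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

# Hypothetical transfer rates (ps^-1) between residues; cost = 1 / rate
rates = {("heme", "A"): 2.0, ("A", "B"): 0.5, ("heme", "C"): 0.4,
         ("C", "B"): 1.5, ("B", "solvent"): 1.0}
graph = {}
for (u, v), k in rates.items():
    graph.setdefault(u, {})[v] = 1.0 / k
    graph.setdefault(v, {})[u] = 1.0 / k

path, cost = shortest_path(graph, "heme", "solvent")
print(path, cost)   # ['heme', 'A', 'B', 'solvent'] 3.5
```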
• ### Transfer-matrix calculations of the effects of tension and torque constraints on DNA-protein interactions(1802.01437)

May 9, 2018 q-bio.BM
The organization and maintenance of chromosomal DNA in living cells depend strongly on interactions between DNA and a plethora of DNA-binding proteins. Single-molecule studies show that the formation of nucleoprotein complexes on DNA by such proteins is frequently subject to force and torque constraints applied to the DNA. Although existing experimental techniques can exert these types of mechanical constraints on individual DNA molecules, their exact effects on the regulation of DNA-protein interactions are still not completely understood, owing to the lack of systematic theoretical methods able to efficiently interpret complex experimental observations. To fill this gap, we have developed a general theoretical framework based on transfer-matrix calculations that can accurately describe the behaviour of DNA-protein interactions under force and torque constraints. Potential applications of this approach are demonstrated by predicting how these constraints affect the DNA-binding properties of different types of architectural proteins. The results provide important insights into the potential physiological functions of mechanical forces in chromosomal DNA organization by architectural proteins, as well as into single-DNA manipulation studies of DNA-protein interactions.
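The core of a transfer-matrix calculation can be illustrated with a one-dimensional lattice on which each site is either free or protein-bound. The sketch below uses single-site binding with nearest-neighbour cooperativity $\omega$, obtains the partition function $Z$ from a matrix power, and computes the bound fraction as $\theta = N^{-1}\,\partial \ln Z/\partial \ln K$. The force dependence $K(f) = K_0 e^{-f\,\Delta L/k_BT}$, for a protein that shortens the DNA by $\Delta L$ on binding, is a hypothetical simplification; the framework described above treats tension and torque in far more detail.

```python
import numpy as np

def partition_function(K, omega, N):
    """Lattice of N sites, each free (0) or bound (1).
    T[a, b] = K**b * omega**(a*b): binding weight K, neighbour cooperativity omega."""
    T = np.array([[1.0, K], [1.0, K * omega]])
    start = np.array([1.0, 0.0])        # virtual free site before the first lattice site
    return start @ np.linalg.matrix_power(T, N) @ np.ones(2)

def occupancy(K, omega, N, h=1e-6):
    """Bound fraction theta = (1/N) d ln Z / d ln K, by central difference."""
    lnZ = lambda lnK: np.log(partition_function(np.exp(lnK), omega, N))
    lnK = np.log(K)
    return (lnZ(lnK + h) - lnZ(lnK - h)) / (2 * h * N)

def K_under_force(K0, force_pN, dL_nm, kT_pN_nm=4.1):
    """Hypothetical: each binding event shortens the DNA by dL_nm and so costs
    f * dL of work against the applied tension (kT is about 4.1 pN nm at room temperature)."""
    return K0 * np.exp(-force_pN * dL_nm / kT_pN_nm)

print(occupancy(2.0, 1.0, 50))   # non-cooperative check: K / (1 + K), about 0.667
print(occupancy(K_under_force(2.0, 5.0, 10.0), 5.0, 50))
```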
• ### Coarse-Grained Simulation of DNA using LAMMPS(1802.07145)

During the last decade, coarse-grained nucleotide models have emerged that allow us to simulate DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained, sequence-specific model that captures the hybridisation transition of DNA and many structural properties of single- and double-stranded DNA. oxDNA was previously available only as standalone software, but has now been implemented in the popular LAMMPS molecular dynamics code. This article describes the new implementation and analyses its parallel performance. Practical applications are presented that focus on single-stranded DNA, an area of research that has so far been under-investigated. The LAMMPS implementation of oxDNA significantly lowers the entry barrier for using the oxDNA model and facilitates future code development and interfacing with existing LAMMPS functionality, as well as with other coarse-grained and atomistic DNA models.
• ### Genome packaging within icosahedral capsids and large-scale segmentation in viral genomic sequences(1803.09489)

May 6, 2018 q-bio.QM, q-bio.BM
The assembly and maturation of viruses with icosahedral capsids must be coordinated with icosahedral symmetry. Icosahedral symmetry also imposes restrictions on the cooperative specific interactions between genomic RNA/DNA and coat proteins, which should be reflected in a quasi-regular segmentation of viral genomic sequences. Combining discrete direct and double Fourier transforms, we studied quasi-regular large-scale segmentation in the genomic sequences of different ssRNA, ssDNA, and dsDNA viruses. The particular representatives included satellite tobacco mosaic virus; the satellite tobacco necrosis virus strains STNV-C, STNV-1, and STNV-2; Escherichia phages MS2, phiX174, alpha3, and HK97; and Simian virus 40. In all of these genomes, we found significant quasi-regular segmentation of the genomic sequences related to virion assembly and genome packaging within the icosahedral capsid. We also found good correspondence between our results and available cryo-electron microscopy data on capsid structures and genome packaging in these viruses. Fourier analysis of genomic sequences provides additional insight into the mechanisms of hierarchical genome packaging and may be used to verify the concepts of 3-fold or 5-fold intermediates in virion assembly. The results of sequence analysis should be taken into account when choosing models and interpreting data. They may also be helpful for the development of antiviral drugs.
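The basic operation behind such an analysis is a discrete Fourier transform of a numerical track derived from the sequence. The sketch below maps a sequence to a binary purine (A/G) indicator, which is one possible encoding and an assumption here, and reads off the period of the strongest spectral peak. It is a single DFT, not the authors' combination of direct and double Fourier transforms, and the test sequence is synthetic.

```python
import numpy as np

def purine_spectrum(seq):
    """Power spectrum of a binary purine (A/G) indicator track."""
    x = np.array([1.0 if b in "AG" else 0.0 for b in seq.upper()])
    x -= x.mean()                          # remove the zero-frequency component
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))        # cycles per nucleotide
    return freqs, power

def dominant_period(seq):
    """Period (in nucleotides) of the strongest non-zero-frequency peak."""
    freqs, power = purine_spectrum(seq)
    k = np.argmax(power[1:]) + 1           # skip k = 0
    return 1.0 / freqs[k]

# Synthetic "genome": alternating 250-nt purine-rich and pyrimidine-rich blocks
seq = ("AG" * 125 + "CT" * 125) * 10       # 5000 nt, large-scale period 500 nt
print(dominant_period(seq))                # about 500
```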
• ### Protein Folding Optimization using Differential Evolution Extended with Local Search and Component Reinitialization(1710.07031)

May 6, 2018 cs.AI, cs.NE, q-bio.BM
This paper presents a novel Differential Evolution algorithm for protein folding optimization, applied to the three-dimensional AB off-lattice model. The proposed algorithm includes two new mechanisms. First, a local search is used to improve convergence speed and to reduce the runtime complexity of the energy calculation; for this purpose, a local movement is introduced within the local search. Because the resulting evolutionary algorithm converges quickly, once it is trapped in a local optimum or has located a relatively good solution, it has difficulty locating a better similar solution, i.e., one that differs from the good solution in only a few components. Second, a component reinitialization method is designed to mitigate this problem. Both new mechanisms and the proposed algorithm were analyzed on well-known amino acid sequences that are frequently used in the literature. Experimental results show that the new mechanisms improve the efficiency of our algorithm and that the proposed algorithm is superior to other state-of-the-art algorithms. It obtained a hit ratio of 100% for sequences of up to 18 monomers within a budget of $10^{11}$ solution evaluations. New best-known solutions were obtained for most of the sequences. The existence of symmetric best-known solutions is also demonstrated in the paper.
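Plain DE/rand/1/bin, the kind of baseline such an algorithm builds on, can be sketched as follows. This assumed sketch omits the paper's local search and component reinitialization and uses a toy sphere objective instead of the AB off-lattice energy. `F`, `CR`, and the population size are illustrative choices.

```python
import numpy as np

def differential_evolution(f, dim, bounds, pop_size=30, F=0.5, CR=0.9,
                           generations=300, seed=0):
    """DE/rand/1/bin minimization within box bounds (lo, hi)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # rand/1 mutation: three distinct individuals, none equal to i
            idx = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover; one forced index guarantees a mutant component
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:          # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = np.argmin(fit)
    return pop[best], fit[best]

x_best, f_best = differential_evolution(lambda x: float(np.sum(x ** 2)), dim=5, bounds=(-5.0, 5.0))
print(f_best)
```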
• ### Optimizing Native Ion Mobility Q-TOF in Helium and Nitrogen for Very Fragile Noncovalent Interactions(1805.01735)

May 4, 2018 q-bio.QM, q-bio.BM
The meaningful comparison of ion mobility (IM) results and collision cross section (CCS) values across different platforms is a prerequisite for using CCS for identification or structural assignment. The amount of internal energy imparted to the ions before the ion mobility cell is a source of experimental variation. Here we investigated the effects of virtually all tuning parameters of the Agilent 6560 IM-Q-TOF on the arrival time distributions of ubiquitin$^{7+}$, and found conditions under which the native state prevails. We discuss the effects of solvent evaporation conditions in the source, of the entire pre-IM DC voltage gradient, and of the funnel RF amplitudes, and we also report on ubiquitin$^{7+}$ conformations in different solvents, including native supercharging conditions. Collision-induced unfolding (CIU) can be conveniently provoked in two distinct regions: behind the source capillary (by changing the fragmentor voltage) and in the trapping funnel (by changing the trap entrance grid delta voltage). The softness of the instrumental conditions was then optimized with the benchmark DNA G-quadruplex $[(\mathrm{dG_4T_4G_4})_2\cdot(\mathrm{NH_4^+})_3-8\mathrm{H}]^{5-}$, for which ion activation results in ammonia loss. To reduce the ion internal energy and obtain the intact 3-NH$_4^+$ complex, we reduced the post-IM voltage gradient, but this lowered the IM resolving power because of increased diffusion behind the drift tube. The article thus describes the various trade-offs between ion activation, ion transmission, and ion mobility performance for native MS of very fragile structures.
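Since comparable CCS values are the goal here, it may help to recall how a drift-tube CCS follows from a measured drift time via the low-field Mason-Schamp equation, $\Omega = \frac{3ze}{16N}\sqrt{\frac{2\pi}{\mu k_B T}}\,\frac{1}{K}$, with mobility $K = L^2/(t_d V)$ and gas number density $N = p/(k_B T)$. The sketch below assumes that $t_d$ is the time spent in the drift region only (in practice the dead time outside it must be removed, for example by measuring at several drift voltages) and takes SI inputs. The example numbers are hypothetical, not instrument settings from the paper.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
K_B = 1.380649e-23             # Boltzmann constant, J/K
AMU = 1.66053906660e-27        # atomic mass unit, kg

def ccs_drift_tube(t_drift_s, length_m, voltage_V, z, ion_mass_da, gas_mass_da,
                   pressure_Pa, temperature_K):
    """Collision cross section in Å^2 from the Mason-Schamp equation,
    assuming t_drift_s excludes any time spent outside the drift region."""
    K = length_m ** 2 / (t_drift_s * voltage_V)              # mobility, m^2 V^-1 s^-1
    N = pressure_Pa / (K_B * temperature_K)                  # gas number density, m^-3
    mu = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * AMU  # reduced mass, kg
    omega = (3 * z * E_CHARGE / (16 * N)) \
        * math.sqrt(2 * math.pi / (mu * K_B * temperature_K)) / K
    return omega * 1e20                                      # m^2 -> Å^2

# Hypothetical inputs: a 7+ ion of about 8.6 kDa in helium
print(ccs_drift_tube(0.02, 0.78, 1500.0, 7, 8565.0, 4.0026, 530.0, 300.0))
```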
• ### Environmentally controlled curvature of single collagen proteins(1803.03392)

May 4, 2018 physics.bio-ph, q-bio.BM
The predominant structural protein in vertebrates is collagen, which plays a key role in extracellular matrix and connective tissue mechanics. Despite its prevalence and physical importance in biology, the mechanical properties of molecular collagen are far from established. The flexibility of its triple helix is unresolved, with descriptions from different experimental techniques ranging from flexible to semirigid. Furthermore, it is unknown how collagen type (homo- vs. heterotrimeric) and source (tissue-derived vs. recombinant) influence flexibility. Using SmarTrace, a chain-tracing algorithm we devised, we performed a statistical analysis of collagen conformations collected with atomic force microscopy (AFM) to determine the protein's mechanical properties. Our results show that types I, II, and III collagen (the key fibrillar varieties) exhibit very similar molecular flexibilities. However, collagen conformations are strongly modulated by salt, transitioning from compact to extended as the KCl concentration increases, at both neutral and acidic pH. While analysis with a standard worm-like chain model suggests that the persistence length of collagen can attain almost any value within the literature range, closer inspection reveals that this modulation of collagen's conformational behavior is due not to changes in flexibility but to the induction of curvature (either intrinsic or induced by interactions with the mica surface). By modifying standard polymer theory to include innate curvature, we show that collagen behaves as an equilibrated curved worm-like chain (cWLC) in two dimensions. Analysis within the cWLC model shows that collagen's curvature depends strongly on pH and salt, while its persistence length does not. Thus, we find that triple-helical collagen is well described as semiflexible, irrespective of source, type, pH, and salt environment.
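In two dimensions, the tangent angle $\theta(s)$ of an equilibrated worm-like chain with persistence length $p$ has variance $s/p$; adding an intrinsic curvature $\kappa_0$ shifts its mean to $\kappa_0 s$, so that for Gaussian fluctuations

$$\langle \cos\theta(s) \rangle = \cos(\kappa_0 s)\, e^{-s/(2p)}.$$

Setting $\kappa_0 = 0$ recovers the standard 2D result $e^{-s/(2p)}$, which is why fitting a plain WLC to curved chains can return an apparent persistence length that depends on curvature. The sketch below checks this formula against simulated chains; this minimal curvature model and all parameter values are our assumptions, not the SmarTrace analysis.

```python
import numpy as np

def tangent_correlation_cwlc(s, p, kappa0):
    """<cos theta(s)> for an equilibrated 2D worm-like chain with intrinsic
    curvature kappa0 (rad/nm) and persistence length p (nm)."""
    return np.cos(kappa0 * s) * np.exp(-s / (2 * p))

def simulate_cwlc(n_chains, n_seg, ds, p, kappa0, seed=0):
    """Tangent angles of 2D chains: constant turning kappa0*ds per segment plus
    Gaussian bending fluctuations with variance ds/p; returns <cos theta(s)>."""
    rng = np.random.default_rng(seed)
    steps = kappa0 * ds + np.sqrt(ds / p) * rng.standard_normal((n_chains, n_seg))
    theta = np.cumsum(steps, axis=1)          # angle relative to the first tangent
    return np.cos(theta).mean(axis=0)         # at s = ds, 2 ds, ..., n_seg ds

# Hypothetical parameters: 5 nm segments, p = 90 nm, kappa0 = 0.01 rad/nm
ds, n_seg, p, k0 = 5.0, 40, 90.0, 0.01
s = ds * np.arange(1, n_seg + 1)
sim = simulate_cwlc(5000, n_seg, ds, p, k0)
print(np.max(np.abs(sim - tangent_correlation_cwlc(s, p, k0))))   # small sampling error
```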
• ### Green function of correlated genes in a minimal mechanical model of protein evolution(1801.03681)

The function of proteins arises from cooperative interactions and rearrangements of their amino acids, which exhibit large-scale dynamical modes. Long-range correlations have also been revealed in protein sequences, motivating the search for physical links between the observed genetic and dynamic cooperativity. Here we outline a simplified theory of proteins that relates sequence correlations to physical interactions and to the emergence of mechanical function. Our protein is modeled as a strongly coupled amino acid network whose interactions and motions are captured by the mechanical propagator, the Green function. The propagator describes how the gene determines the connectivity of the amino acids and thereby the transmission of forces. Mutations introduce localized perturbations to the propagator that scatter the force field. The emergence of function is manifested by a topological transition when a band of such perturbations divides the protein into subdomains. We find that epistasis (the interaction among mutations in the gene) is related to the nonlinearity of the Green function, which can be interpreted as a sum over multiple scattering paths. We apply this mechanical framework to simulations of protein evolution and observe long-range epistasis that facilitates collective functional modes.
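For a network of springs, the mechanical propagator is the pseudo-inverse of the stiffness matrix, so the displacement response to a force pattern $f$ is $u = Gf$. The sketch below uses a scalar spring chain instead of a full three-dimensional amino acid network, which is a simplifying assumption, and models a mutation, hypothetically, as the softening of one spring: a localized perturbation that changes the end-to-end response.

```python
import numpy as np

def laplacian(n, springs):
    """Stiffness (graph Laplacian) matrix of a scalar spring network.
    springs: list of (i, j, k) with spring constant k between nodes i and j."""
    L = np.zeros((n, n))
    for i, j, k in springs:
        L[i, i] += k
        L[j, j] += k
        L[i, j] -= k
        L[j, i] -= k
    return L

def green_function(L):
    """Propagator G = L^+; the pseudo-inverse projects out rigid translation."""
    return np.linalg.pinv(L)

chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
mutant = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0)]   # hypothetical softening
f = np.array([1.0, 0.0, 0.0, 0.0, -1.0])    # force dipole on the chain ends
for springs in (chain, mutant):
    u = green_function(laplacian(5, springs)) @ f
    print(u[0] - u[4])   # about 4 for the wild type, about 13 for the mutant
```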
• ### netgwas: An R Package for Network-Based Genome-Wide Association Studies(1710.01236)

Graphical models are powerful tools for modeling and making statistical inferences about complex associations among variables in multivariate data. In this paper we introduce the R package netgwas, which is based on undirected graphical models and accomplishes three important and interrelated goals in genetics: constructing linkage maps, reconstructing linkage disequilibrium (LD) networks from multi-loci genotype data, and detecting high-dimensional genotype-phenotype networks. Unlike other software, the netgwas package handles species with any chromosome copy number in a unified way. It implements recent improvements in both linkage map construction (Behrouzi and Wit, 2018) and the reconstruction of conditional independence networks for non-Gaussian continuous data, discrete data, and mixed discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets, for example genotype data and genotype-phenotype data, routinely occur in genetics and genomics. We demonstrate the value of the package's functionality by applying it to various multivariate example datasets taken from the literature. In particular, we show that our package allows a more realistic analysis of data, as it adjusts for the effect of all other variables while performing pairwise associations. This feature controls for spurious associations between variables that can arise with classical multiple-testing approaches. This paper includes a brief overview of the statistical methods implemented in the package, and its main body explains how to use the package. The package uses a parallelization strategy on multi-core processors to speed up computations for large datasets. In addition, it contains several functions for simulation and visualization. The netgwas package is freely available at https://cran.r-project.org/web/packages/netgwas
• ### Visualizing mitochondrial FoF1-ATP synthase as the target of the immunomodulatory drug Bz-423(1804.11081)

April 30, 2018 q-bio.SC, q-bio.BM
Targeting the mitochondrial enzyme FoF1-ATP synthase and modulating its catalytic activities with small molecules is a promising new approach for the treatment of autoimmune diseases. The immunomodulatory compound Bz-423 is one such drug: it binds to the OSCP subunit of the mitochondrial FoF1-ATP synthase and induces apoptosis via increased production of reactive oxygen species in coupled, actively respiring mitochondria. Here we review the experimental progress in revealing the binding of Bz-423 to its mitochondrial target and discuss how subunit rotation of FoF1-ATP synthase is affected by Bz-423. Briefly, we report how Förster resonance energy transfer (FRET) can be employed to colocalize the enzyme and fluorescently tagged Bz-423 within the mitochondria of living cells with nanometer resolution.
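The nanometer sensitivity of FRET comes from the steep distance dependence of the transfer efficiency, $E = 1/(1 + (r/R_0)^6)$, where $R_0$ is the Förster radius of the dye pair. A minimal sketch of this relation and its inverse follows; the distances used are hypothetical.

```python
def fret_efficiency(r_nm, r0_nm):
    """Förster transfer efficiency for donor-acceptor distance r and Förster radius R0."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

def distance_from_efficiency(E, r0_nm):
    """Invert E = 1 / (1 + (r/R0)^6) to recover the donor-acceptor distance."""
    return r0_nm * (1.0 / E - 1.0) ** (1.0 / 6.0)

# Hypothetical dye pair with R0 = 5 nm: efficiency drops steeply between 3 and 8 nm
for r in (3.0, 5.0, 8.0):
    print(r, fret_efficiency(r, 5.0))
```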
• ### Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges(1804.10647)

April 27, 2018 q-bio.BM
Advanced mathematics, such as multiscale weighted colored graphs and element-specific persistent homology, and machine learning, including deep neural networks, were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R Grand Challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 (GC2) focused on pose prediction, binding affinity ranking, and free energy prediction for Farnesoid X receptor ligands. Our models obtained first place in absolute free energy prediction for free energy Set 1 in Stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered the most difficult challenge so far. It has 5 subchallenges involving Cathepsin S and five kinase targets, namely VEGFR2, JAK2, p38-$\alpha$, TIE2, and ABL1. There are a total of 26 official competitive tasks in GC3. Our predictions were ranked 1st in 10 of the 26 official competitive tasks.
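Persistent homology summarizes a point cloud, such as the atoms of one element type near a ligand, by the scales at which topological features appear and disappear. The simplest case, 0-dimensional persistence of a Vietoris-Rips filtration, reduces to Kruskal's minimum-spanning-tree construction, as in the sketch below. This is only an assumed illustration of the element-specific idea; the authors' features and the multiscale weighted colored graphs are not reproduced here, and the atom coordinates are hypothetical.

```python
import itertools
import math

def h0_barcode(points):
    """0-dimensional persistence of a Vietoris-Rips filtration: every component is
    born at scale 0 and dies at the edge length that merges it (Kruskal)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(p, q), i, j)
                   for (i, p), (j, q) in itertools.combinations(enumerate(points), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge merges two components
            parent[ri] = rj
            deaths.append(d)
    return deaths                           # one component never dies (omitted)

# Hypothetical element-specific use: keep only the carbon atoms
atoms = [("C", (0.0, 0.0, 0.0)), ("N", (1.4, 0.0, 0.0)),
         ("C", (1.0, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0))]
carbon = [xyz for element, xyz in atoms if element == "C"]
print(h0_barcode(carbon))   # [1.0, 2.0]
```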
• ### An implementation of the maximum-caliber principle by replica-averaged time-resolved restrained simulations(1802.06560)

April 24, 2018 cond-mat.stat-mech, q-bio.BM
Inferential methods can be used to integrate experimental information and molecular simulations. The maximum entropy principle provides a framework for using equilibrium experimental data, and it has been shown that replica-averaged simulations, restrained using a static potential, are a practical and powerful implementation of this principle. Here we show that replica-averaged simulations restrained using a time-dependent potential are equivalent to the principle of maximum caliber, the dynamic version of the principle of maximum entropy, and thus may allow time-resolved data to be integrated into molecular dynamics simulations. We provide an analytical proof of the equivalence as well as a computational validation using simple models and synthetic data. Some limitations and possible solutions are also discussed.
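In the static case, the maximum entropy solution for a single averaged observable $s$ takes the form $w_i \propto \exp(-\lambda s_i)$, with $\lambda$ chosen so that the reweighted average matches the experimental value; as noted above, replica-averaged restrained simulations are one implementation of this principle. The sketch below finds $\lambda$ by bisection for a fixed set of samples. It illustrates only the static maximum entropy principle, not the time-dependent (maximum caliber) scheme of the paper, and the bracket for $\lambda$ is an assumption.

```python
import numpy as np

def maxent_reweight(s, target, lam_lo=-50.0, lam_hi=50.0, tol=1e-10):
    """Maximum-entropy reweighting of samples s so that the weighted mean equals
    target: w_i ∝ exp(-lam * s_i), with lam found by bisection."""
    s = np.asarray(s, dtype=float)

    def weights(lam):
        a = -lam * s
        w = np.exp(a - a.max())            # shift exponent for numerical stability
        return w / w.sum()

    def mean(lam):
        return weights(lam) @ s            # decreases monotonically with lam

    lo, hi = lam_lo, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid                       # mean too high: increase lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, weights(lam)

rng = np.random.default_rng(0)
s = rng.standard_normal(10_000)            # hypothetical per-frame observable
lam, w = maxent_reweight(s, target=0.3)    # hypothetical experimental average
print(lam, w @ s)
```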
• ### Binding Pathway of Opiates to $\mu$ Opioid Receptors Revealed by Unsupervised Machine Learning(1804.08206)

April 23, 2018 q-bio.QM, q-bio.BM
Many important analgesics relieve pain by binding to the $\mu$-opioid receptor ($\mu$OR), which makes the $\mu$OR one of the most clinically relevant proteins of the G protein-coupled receptor (GPCR) family. Despite previous studies on the activation pathways of GPCRs, the mechanism of opiate binding and the selectivity of the $\mu$OR are largely unknown. We performed extensive molecular dynamics (MD) simulations and analysis to find the selective allosteric binding sites of the $\mu$OR and the path opiates take to reach the orthosteric site. In this study, we predicted that the allosteric site is responsible for the attraction and selection of opiates. Using Markov state models and machine learning, we traced the pathway of opiates binding to the orthosteric site, the main binding pocket. Our results have important implications for designing novel analgesics.
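A Markov state model starts from simulation frames assigned to discrete states; transition probabilities at a chosen lag time are then estimated by counting. The minimal sketch below uses a simple non-reversible count-based estimator and a hypothetical three-state trajectory (0 = solvent, 1 = allosteric vestibule, 2 = orthosteric pocket). Production analyses usually add state clustering, reversibility constraints, and lag-time validation.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag=1):
    """Row-stochastic transition matrix from transitions i -> j separated by `lag`
    frames (every state must occur at least once as a starting state)."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1
    return C / C.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

dtraj = [0, 0, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2]   # hypothetical discretized trajectory
T = estimate_msm(dtraj, 3)
pi = stationary_distribution(T)
print(np.round(T, 3))
print(np.round(pi, 3))   # the solvent state is transient here, so pi[0] is 0
```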
• ### Smartphone-based point-of-care lipid blood test performance evaluation compared with a clinical diagnostic laboratory method(1804.07387)

April 19, 2018 q-bio.QM, q-bio.BM
Managing blood lipid levels is important for the treatment and prevention of diabetes, cardiovascular disease, and obesity. An easy-to-use, portable lipid blood test will encourage more frequent testing by patients and at-risk populations. We used smartphone systems, which are already familiar to many people. Because smartphones can be carried everywhere, blood lipids can be measured easily and frequently. We compared the results of these lipid tests with those of existing clinical diagnostic laboratory methods. We found that the smartphone-based point-of-care lipid blood tests are as accurate as hospital-grade laboratory tests. Our system will be useful for people who need to manage their blood lipid levels, motivating them to track and control their behavior.
• ### Deep transfer learning in the assessment of the quality of protein models(1804.06281)

April 17, 2018 q-bio.BM
MOTIVATION: Proteins fold into complex structures that are crucial for their biological functions. Experimental determination of protein structures is costly and therefore limited to a small fraction of all known proteins. Hence, computational structure prediction methods are necessary for modelling the vast majority of proteins. In most structure prediction pipelines, the last step is to select the best available model and to estimate its accuracy. This model quality estimation problem has been growing in importance during the last decade, and progress is believed to be important for large-scale modelling of proteins. The current generation of model quality estimation programs performs well at separating incorrect from good models, but fails to consistently identify the best possible model. State-of-the-art model quality assessment methods use a combination of features that describe a model and the agreement of the model with features predicted from the protein sequence. RESULTS: We first introduce a deep neural network architecture that predicts model quality using significantly fewer input features than state-of-the-art methods. We then propose a methodology for training the deep network that leverages the comparative structure of the problem. We also show that transfer learning can be applied using databases of known protein structures. We demonstrate the viability of this approach by reaching state-of-the-art performance using only a reduced set of input features and a coarse description of the models. AVAILABILITY: The code will be freely available for download at github.com/ElofssonLab/ProQ4.
• ### Classifying Antimicrobial and Multifunctional Peptides with Bayesian Network Models(1804.06327)

April 17, 2018 stat.AP, q-bio.BM, stat.ML
Bayesian network models are finding success in characterizing enzyme-catalyzed reactions and slow conformational changes, in predicting enzyme inhibition, and in genomics. In this work, we apply them to the statistical modeling of peptides, simultaneously identifying amino acid sequence motifs and using a motif-based model to clarify the role motifs may play in antimicrobial activity. We construct models of increasing sophistication, demonstrating how chemical knowledge of a peptide system can be embedded without re-deriving the model-fitting equations after the model structure changes. These models are used to construct classifiers that perform well (94% accuracy, Matthews correlation coefficient of 0.87) at predicting antimicrobial activity in peptides, while being built from interpretable parameters. We demonstrate the use of these models to identify peptides that are potentially both antimicrobial and antifouling, and show that the background distribution of amino acids could play a greater role in activity than sequence motifs do. This provides an advance in the type of peptide activity modeling that can be done and in the ease with which models can be constructed.