• Understanding how genetic changes allow emerging virus strains to escape the protection afforded by vaccination is vital for the maintenance of effective vaccines. In the current work, we use structural and phylogenetic differences between pairs of virus strains to identify important antigenic sites on the surface of the influenza A(H1N1) virus through the prediction of haemagglutination inhibition (HI) assay measurements, which are pairwise measures of the antigenic similarity of virus strains. We propose a sparse hierarchical Bayesian model that can deal with the pairwise structure and inherent experimental variability in the H1N1 data through the introduction of latent variables. The latent variables represent the underlying HI assay measurement of any given pair of virus strains and help account for the fact that, across repeated HI assay measurements of the same pair of virus strains, the difference in the viral sequence remains the same. By accurately representing the structure of the H1N1 data, the model is able to select virus sites which are antigenic, while its latent structure achieves the computational efficiency required to deal with large virus sequence data, as typically available for the influenza virus. In addition to the latent variable model, we also propose a new method, the block integrated Widely Applicable Information Criterion (biWAIC), for selecting between competing models. We show how this allows us to effectively select the random effects when used with the proposed model and apply both methods to an A(H1N1) dataset.
  • Determining phenotype from genetic data is a fundamental challenge. Influenza A viruses undergo rapid antigenic drift, and identification of emerging antigenic variants is critical to the vaccine selection process. Using former seasonal influenza A(H1N1) viruses, hemagglutinin sequence and corresponding antigenic data were analyzed in combination with 3-D structural information. We attributed variation in hemagglutination inhibition to individual amino acid substitutions and quantified their antigenic impact, validating a subset experimentally using reverse genetics. Substitutions identified as low-impact were shown to be a critical component of influenza antigenic evolution; including these alongside the high-impact substitutions that are usually the focus doubled the accuracy of predicting antigenic phenotypes of emerging viruses from genotype. The ability to quantify the phenotypic impact of specific amino acid substitutions should help refine techniques that predict the fitness and evolutionary success of variant viruses, leading to stronger theoretical foundations for selection of candidate vaccine viruses.