Introduction to Bioinformatics - COIN81


Influenza Exercise

Introduction: The purpose of this exercise is to introduce students to phylogenetics through a story-centered curriculum. The exercise described herein is an actual activity being conducted right now, following the emergence of influenza A from birds (avian influenza, or bird flu) into pigs, where it is adapting to a mammalian host from which it could become transmissible to humans. The Spanish flu pandemic of 1918-1919 killed as many as 50 million people in 6 months, and many at the World Health Organization (WHO) and the Centers for Disease Control (CDC) feel the world is not prepared for what may emerge as a global influenza pandemic. Pandemics arise every 25 to 30 years, with major outbreaks in 1957 and 1968, and lesser ones in 1975.

Wild migratory birds carry all known strains of flu and are adapted to them. In 1994, a series of mutations in an avian influenza, H5N1, led to a lethal strain that killed millions of birds in Viet Nam in 1995, and later in China. Domestic birds, especially chickens, were infected, and a campaign of culling birds was undertaken in an attempt to eradicate the flu. The flu was not eradicated; it later appeared in other migratory birds in China and then infected domestic ducks. The feared spread of influenza from birds to pigs was later confirmed to have occurred in Indonesia, and possibly in Viet Nam and China. At first, infected pigs did appear to get sick, but they may now have adapted to the avian influenza.

In preparing for this exercise, you will need to spend a little time reading about influenza and the progression of the various strains from birds to pigs, whose tracheas contain receptors for both avian and human flu strains. You will need to understand the concept of reassortment, in which RNA gene segments from one influenza strain move and 'combine' with those of another flu strain. This is what allows influenza to change rapidly and, given the multiple receptor types in the pig trachea, allows avian influenza to adapt and later emerge as a human-transmissible influenza.
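Because the influenza A genome is split into eight RNA segments, reassortment during a co-infection can be pictured as each progeny virus drawing every segment whole from one parent or the other. The sketch below simply counts those combinations for two parents labeled "avian" and "human" (illustrative labels only); it assumes any mix is possible and says nothing about which reassortants are actually viable.

```python
from itertools import product

# The eight RNA segments of the influenza A genome.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]

def reassortants(parent_a: str, parent_b: str) -> list:
    """Enumerate every genotype a co-infection of two parents could produce,
    assuming each segment is inherited whole from one parent or the other."""
    genotypes = []
    for choice in product((parent_a, parent_b), repeat=len(SEGMENTS)):
        genotypes.append(dict(zip(SEGMENTS, choice)))
    return genotypes

if __name__ == "__main__":
    all_types = reassortants("avian", "human")
    print(len(all_types))  # 2**8 combinations, including both parental genotypes
    # Genotypes with an avian HA but every other segment from the human parent
    avian_ha_only = [g for g in all_types
                     if g["HA"] == "avian"
                     and all(g[s] == "human" for s in SEGMENTS if s != "HA")]
    print(avian_ha_only)
```

The avian-HA example connects to the point above: a virus carrying a bird-derived surface antigen on an otherwise mammalian-adapted background is one that human immune systems have not seen.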

Scenario: You are working for the World Health Organization (WHO) in summer 2005. You have access to avian and human flu sequences from 1918, 1934, 1957, 1968, and 1997, and a basic phylogenetics tool for performing multiple sequence alignments. Your job is to analyze the current, real-time influenza sequence data from Asia (1997 AI), compare it to both phylogenetic and epidemiological data from 1918, and realistically assess the current and future threat of an influenza pandemic in humans. If needed, you may specify what additional data you need to predict when and where bird flu might enter the human population, and how best to contain a possible pandemic. You have only 12 weeks to complete your initial assignment and make both a presentation and formal recommendations to the United Nations and the World Health Organization.

Learning objectives and tasks:

  1. Learn epidemiology and biology of influenza
  2. Search and review 1918 influenza bioinformatics literature
  3. Learn why the influenza pandemic of 1918 was so deadly
  4. Perform Multiple Sequence Alignment of HIV and SARS
  5. Perform phylogenetic analysis of HIV mutation and SARS
  6. From literature, determine what flu sequences to compare
  7. Get 1918 and other flu sequences from NCBI (and other sources)
  8. Format flu sequences into a single organizing text document
  9. Upload flu sequence document to Biology Workbench and run a Multiple Sequence Alignment (MSA)
  10. Make intuitive judgments about outbreaks of flu and antigenic drift and shift
  11. Predict from all the above tasks if Avian Influenza (AI) will be a pandemic
  12. Prepare a plan to determine what animals should be followed for avian flu infection

Epidemiology and biology of influenza

Whether influenza or phylogenetics is the central purpose of the exercise for you, you will need to understand the basic epidemiology (nature of the epidemic) of influenza. The most important thing to understand is how influenza moves from birds to pigs to people, and the role that pigs (swine) play as a 'mixing vessel' for influenza. Because pigs have receptors in their trachea for binding both avian and human flu, RNA from avian flu can be transferred to mammalian-adapted flu in a process called 'reassortment'. Reassortment allows the pathogenic avian flu to 'adapt' to a mammalian host (in both pathogenicity and transmissibility), which leads to the emergence of a human flu with antigens (from bird flu) to which humans have no natural immunity. This is the basic condition that leads to a highly pathogenic pandemic, and it is actually happening right now!

Look at the influenza resources at NCBI, http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html. You can explore the flu genomes from there, focusing on influenza A virus and data from 1918, 1934, 1957, and 1968. More importantly, take time to understand the Hemagglutinin (HA) and Neuraminidase (NA) genes, which are used to describe each virus as HXNY (e.g. H5N1). You should also look at the Influenza Sequence Database at http://flu.lanl.gov/, which lets you search and compare various flu strains and their sequences (http://flu.lanl.gov/search/) and do alignments without having to download and format sequence files. I recommend that you search this database and do alignments *after* you have done manual alignments at Biology Workbench. Only then will you appreciate the power of these custom Web portals (databases and bioinformatics tools). Learning to use these Web portals within a story-centered curriculum is important for both faculty and students learning bioinformatics.
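Influenza A strain names conventionally give type/host (for non-human isolates)/place/isolate number/year, followed by the HXNY subtype in parentheses, for example A/duck/Example/1/2005(H5N1). When you collect sequences from these portals, a small parser like the sketch below can pull the subtype out of each name so you can group your files. The example names are hypothetical, and the exact layout of definition lines in NCBI or LANL downloads may differ, so check a few before relying on it.

```python
import re
from typing import Optional, Tuple

# Matches a subtype such as H5N1 or H13N6 written in parentheses,
# as in "A/duck/Example/1/2005(H5N1)".
SUBTYPE_RE = re.compile(r"\(H(\d{1,2})N(\d{1,2})\)")

def parse_subtype(strain_name: str) -> Optional[Tuple[int, int]]:
    """Return (HA number, NA number) from a strain name, or None if absent."""
    match = SUBTYPE_RE.search(strain_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

if __name__ == "__main__":
    names = [
        "A/duck/Example/1/2005(H5N1)",   # hypothetical avian isolate
        "A/Example City/2/1968(H3N2)",   # hypothetical human isolate
        "B/Example/3/2001",              # influenza B: no HXNY subtype
    ]
    for name in names:
        print(name, parse_subtype(name))
```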

Search and review 1918 influenza bioinformatics literature

Start your investigation of influenza with Google, paying particular attention to articles that contain the words 1918, Spanish flu, and pandemic, and then focus your search on phylogenetics, sequence alignment, and pathogenicity. You will eventually compare H1N1 from 1918 with H5N1 from 1997, but for now, exploring these articles is the main effort. You'll eventually be looking through NCBI PubMed, with its links to free full-text articles, and then focusing on PNAS. The latter articles are usually free, and I have downloaded about 20 articles in total for this work.
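When you move from Google to PubMed, the same keyword strategy can be written as a Boolean query: any of the topic words AND any of the method words. The sketch below builds such a query and a search URL for NCBI's E-utilities ESearch service (db, term, and retmax are ESearch parameters). It only constructs the URL; you can paste the result into a browser or fetch it with urllib.request.urlopen. The grouping of terms is an assumption about how you might structure the search, not a prescribed query.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (returns XML listing matching PubMed IDs).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_query(topic_terms, method_terms):
    """Combine two groups of terms: (any topic term) AND (any method term)."""
    def group(terms):
        # Quote multi-word phrases so PubMed treats each as a single phrase.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"
    return f"{group(topic_terms)} AND {group(method_terms)}"

def esearch_url(query, max_results=20):
    """Build a PubMed ESearch URL for `query`, returning up to `max_results` IDs."""
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)

if __name__ == "__main__":
    query = build_query(
        ["1918", "Spanish flu", "pandemic"],
        ["phylogenetics", "sequence alignment", "pathogenicity"],
    )
    print(query)
    print(esearch_url(query))
```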

Why was the pandemic of 1918 so deadly?

Now you will focus on understanding why the flu of 1918 was highly pathogenic. Some articles describe DNA microarray work that looks at gene expression in lung tissue infected with H1N1. Many young and previously healthy adults succumbed quickly as their lungs filled with fluid as a result of cytokines released by the infected cells. The pathogenicity of H5N1, which emerged in 1995, was attributed to the insertion of two basic residues (Arg and Lys) and a Glu-to-Lys mutation in the Hemagglutinin gene. Can you find that article? (Hint: you'll need to look at the CDC website, in the section on flu: http://www.cdc.gov/ncidod/eid/vol4no3/webster.htm) Plan on spending a few hours on this single activity alone. Also read Enhanced virulence of influenza A viruses with the haemagglutinin of the 1918 pandemic virus at this link.
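The H5N1 finding above hinges on extra basic residues (Arg = R, Lys = K) in the hemagglutinin. A quick way to spot such clusters in a protein sequence is to scan for runs of consecutive R/K, as in the sketch below. The minimum run length of 4 is an arbitrary assumption, histidine (also basic) is deliberately ignored, and the two test fragments are made-up toy strings rather than real HA sequences; a real analysis would locate the HA cleavage site by alignment against a reference.

```python
import re

def basic_runs(protein: str, min_length: int = 4):
    """Return (start, run) for every stretch of at least `min_length`
    consecutive basic residues (R or K) in a protein sequence.
    `start` is a 0-based index into `protein`."""
    pattern = re.compile(f"[RK]{{{min_length},}}")  # e.g. "[RK]{4,}"
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]

if __name__ == "__main__":
    toy_many_basic = "PQGERRKKRG"   # toy fragment with a run of basic residues
    toy_few_basic = "PQIESRGLFG"    # toy fragment with a single R
    print(basic_runs(toy_many_basic))
    print(basic_runs(toy_few_basic))
```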

Perform MSA and phylogenetics of HIV and retrovirus comparison

Before diving into the deep end of the pool, you'll need to warm up with the phylogenetics exercises that lead into this work. Make sure to work through the problems in Phylogenetics.htm, as these are basic skills you'll need to master.

Perform MSA and phylogenetics of SARS / coronavirus comparison

As you work through the exercise above, pay attention to the work required to do the SARS comparison, especially the comparison of the viral proteins from SARS with those of other coronaviruses. This exercise is an actual recreation of what the CDC did in May 2003. Finding, downloading, and formatting the protein sequences for that comparison can take as much as four hours, but it is an extremely important skill.

Flu data

You will need to download 32 neuraminidase (NA) nucleotide and 32 NA protein sequences from NCBI. You'll need to format them into a text document that can be read into Biology Workbench. Look *carefully* at the files I have prepared (HIV mutation, Retrovirus, SARS, and the flu data). Over 90% of the work in this exercise is downloading sequences and putting them into a proper text format.
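Since most of the effort goes into formatting, it can help to check a file before uploading it. The sketch below flags the common FASTA problems: sequence text before the first '>' header, a header with no sequence under it, and unexpected characters. The two alphabets are simplifying assumptions; real GenBank records can contain other IUPAC ambiguity codes (such as R or Y in nucleotide data), so extend NUCLEOTIDE if your files use them.

```python
import string

NUCLEOTIDE = set("ACGTUN-")
PROTEIN = set(string.ascii_uppercase) | {"-", "*"}

def check_fasta(text: str, alphabet: set) -> list:
    """Return a list of human-readable problems found in FASTA text.
    An empty list means every record has a '>' header line followed by
    at least one sequence line using only letters from `alphabet`."""
    problems = []
    header = None
    residues = 0
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            if header is not None and residues == 0:
                problems.append(f"record '{header}' has no sequence")
            header, residues = line[1:].strip(), 0
        elif header is None:
            problems.append(f"line {number}: sequence before any '>' header")
        else:
            bad = set(line.upper()) - alphabet
            if bad:
                problems.append(f"line {number}: unexpected characters {sorted(bad)}")
            residues += len(line)
    if header is None:
        problems.append("no '>' header lines found")
    elif residues == 0:
        problems.append(f"record '{header}' has no sequence")
    return problems
```

For example, check_fasta(open("Neuraminidase_master.txt").read(), PROTEIN) should return an empty list for a clean protein file.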

Finding flu sequences

There are a few papers that discuss alignments of the HA and NA genes. One in particular is Characterization of the 1918 'Spanish' influenza virus neuraminidase gene by Ann H. Reid, Thomas G. Fanning, Thomas A. Janczewski, and Jeffery K. Taubenberger. Find that article by Googling the title, restricting the search to PDF files. Plan to read this paper, scanning it first, but at some point very carefully. Refer to Table 1, which lists the accession numbers of the files you will need to hunt down. I have put these in an Excel file to help you keep track of them. Note that there are 32 nucleotide and 32 protein sequences to download, which can take several hours. Try downloading at least three of each type, saving each in FASTA format only, with a .txt file extension (XYZ_nucleotide_FASTA.txt and XYZ_protein_FASTA.txt). If you get tired of all that, I have done this for the nucleotide, protein, and properly formatted protein files; the last of these contains the master file for phylogenetics.
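Once you have several files named in the XYZ_protein_FASTA.txt pattern, they need to be combined into one document for upload. The sketch below concatenates every matching file in a folder into a single master file, in file-name order, and refuses any file that does not start with a '>' header. The folder layout and the output name NA_protein_master.txt are hypothetical; the demonstration uses two toy files in a temporary directory.

```python
import tempfile
from pathlib import Path

def merge_fasta_files(folder: str, pattern: str, output: str) -> int:
    """Concatenate every FASTA file in `folder` whose name matches `pattern`
    into `folder/output`, in file-name order, and return how many were merged."""
    out_path = Path(folder, output)
    paths = sorted(p for p in Path(folder).glob(pattern) if p != out_path)
    chunks = []
    for path in paths:
        text = path.read_text().strip()
        if not text.startswith(">"):
            raise ValueError(f"{path.name} does not start with a '>' FASTA header")
        chunks.append(text)
    out_path.write_text("\n".join(chunks) + "\n")
    return len(paths)

if __name__ == "__main__":
    # Demonstration with two toy files in a throwaway directory.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "B_protein_FASTA.txt").write_text(">B toy\nMNP\n")
        Path(folder, "A_protein_FASTA.txt").write_text(">A toy\nMKV\n")
        count = merge_fasta_files(folder, "*_protein_FASTA.txt", "NA_protein_master.txt")
        print(count)
        print(Path(folder, "NA_protein_master.txt").read_text())
```

Keeping nucleotide and protein files in separate merges (two patterns) avoids mixing alphabets in one alignment.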

Formatting flu sequences

As mentioned above, finding, downloading, and formatting sequences can take hours of tedious work. See the Neuraminidase_master.txt file. In this file, the entries are in reverse (inverted, Z-A) alphanumeric order, which allows Biology Workbench to display them in A-Z (1-x) alphanumeric order. You may also use the H1-H15_AlignAA_fasta.txt file.
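Putting entries in inverted order by hand is error-prone, so here is a sketch that parses FASTA text, sorts the records Z-A by header, and writes them back out with sequence lines wrapped at 60 columns. The 60-column width is a common convention chosen here as an assumption. Note that this is plain string ordering: uppercase sorts before lowercase and "10" sorts before "9", so pad numbers (e.g. 09) in your headers if that matters.

```python
def read_fasta(text: str) -> list:
    """Parse FASTA text into (header, sequence) tuples."""
    records = []
    header, lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(lines)))
            header, lines = line[1:].strip(), []
        elif line and header is not None:
            lines.append(line)
    if header is not None:
        records.append((header, "".join(lines)))
    return records

def write_fasta(records, width: int = 60) -> str:
    """Format (header, sequence) tuples as FASTA, wrapping at `width` columns."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"

def reverse_sorted_fasta(text: str) -> str:
    """Reorder records Z-A by header, the inverted order used for upload."""
    records = read_fasta(text)
    records.sort(key=lambda rec: rec[0], reverse=True)
    return write_fasta(records)

if __name__ == "__main__":
    toy = ">A_1918\nMNP\n>C_1997\nMKV\n>B_1934\nMAA\n"  # toy records
    print(reverse_sorted_fasta(toy))
```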

Upload flu sequences to Biology Workbench

After you have prepared your files for uploading to Biology Workbench, you'll need to make sure that you have an account open there. To set up an account (instant and painless) click this link. Once you have finished that, enter the site from this link. To learn to use Biology Workbench, take some time to read this tutorial. From here you will open a session called flu (or influenza) and upload the text file from above.

Perform MSA and phylogenetics on avian influenza

After the file is uploaded, check all the sequences. Now you're ready to perform the multiple sequence alignment with ClustalW. When building the dendrogram (phylogenetic tree), select rooted tree only. Press submit, cross your fingers, and in a few seconds you'll have the phylogenetics of the flu sequences. Take some time to compare this to the article by Reid et al., and you'll notice some embellishment on the part of graphic artists.
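To see roughly what happens between an alignment and a rooted tree, the sketch below computes p-distances (the fraction of differing sites, skipping gap columns) between aligned sequences and joins the closest clusters with UPGMA, a simple method that always yields a rooted tree. UPGMA is swapped in here for teaching purposes; it is not necessarily the method Biology Workbench's ClustalW uses, and it assumes a roughly constant mutation rate across lineages. The three sequences in the usage example are toys.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in sites) / len(sites)

def upgma(aligned: dict) -> str:
    """Build a rooted UPGMA tree from {name: aligned_sequence}; return Newick."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    names = list(aligned)
    # cluster id -> (Newick text, number of leaves, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {(i, j): p_distance(aligned[names[i]], aligned[names[j]])
            for i, j in combinations(range(len(names)), 2)}

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    next_id = len(names)
    while len(clusters) > 1:
        # Join the two closest clusters; the new node sits at half their distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        height = d(a, b) / 2
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)
        # Distance from the new cluster is the size-weighted average.
        for c in clusters:
            dist[(next_id, c)] = (size_a * d(a, c) + size_b * d(b, c)) / (size_a + size_b)
        clusters[next_id] = (f"({text_a}:{height - h_a:.4f},{text_b}:{height - h_b:.4f})",
                             size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"

if __name__ == "__main__":
    toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTAAAAAA"}
    print(upgma(toy))  # (C:0.2250,(A:0.0500,B:0.0500):0.1750);
```

The Newick string can be pasted into most tree viewers to compare its shape with the ClustalW dendrogram.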

NS1 Motif analysis

In this exercise we will examine the NS-1 gene, using full coding sequence data sets, to determine the identity of its last four amino acids. Research has indicated that this C-terminal PL (PDZ ligand) motif can be used to determine the host / source of an influenza strain (Large-Scale Sequence Analysis of Avian Influenza Isolates, Science, Vol. 311, 17 March 2006, http://www.sciencemag.org/). In this research, analysis of the NS-1 amino acid coding sequence showed the following motifs associated with avian, human, and swine influenzas.
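Extracting the last four amino acids from many NS-1 proteins is easy to automate. The sketch below takes the C-terminal four residues of each sequence (dropping a trailing '*' stop symbol) and groups strain names by motif; deciding which motif corresponds to which host is left to the table in the Science paper. The strain names and sequences in the usage example are toy strings, and the code assumes full-length NS-1 proteins, since a truncated sequence would yield a meaningless "motif".

```python
from collections import defaultdict

def c_terminal_motif(protein: str, length: int = 4) -> str:
    """Return the last `length` residues, ignoring a trailing stop '*'."""
    seq = protein.strip().upper().rstrip("*")
    if len(seq) < length:
        raise ValueError("sequence shorter than motif length")
    return seq[-length:]

def group_by_motif(ns1_proteins: dict) -> dict:
    """Map each C-terminal motif to the sorted list of strains that carry it."""
    groups = defaultdict(list)
    for strain, protein in ns1_proteins.items():
        groups[c_terminal_motif(protein)].append(strain)
    return {motif: sorted(strains) for motif, strains in groups.items()}

if __name__ == "__main__":
    toy = {
        "strain1": "MDSNTVSSFQESEV",
        "strain2": "MDPNTVSSFQRSKV",
        "strain3": "MDSNTVSSFQESEV*",
    }
    print(group_by_motif(toy))
```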

Can H5N1 infect humans through a different subtype?

As early as 2000, geneticists and epidemiologists determined that the influenza A subtype H9N2 could infect humans. Sequence analysis showed that although the HA and NA genes were distinct from those of circulating H5N1, the six internal genes were genetically similar, indeed virtually identical, to those of H5N1. The concern is that a human infection by a virus with HA and NA genes from H9N2 and internal genes from H5N1 allows H5N1 genes to move into the human host. This may be very close to what happened prior to 1918 with the four avian genes that entered H1N1. Please read Avian-to-human transmission of H9N2 subtype influenza A viruses: Relationship between H9N2 and H5N1 human isolates. Consider how the phylogenetic alignments and the sequence and motif analyses you have performed are used in the work described in this paper.
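A claim like "the six internal genes were virtually identical" rests on per-gene percent identity between aligned sequences. The sketch below computes identity over ungapped columns for each internal gene (PB2, PB1, PA, NP, M, NS) present in both isolates and flags those above a threshold. The 98% cut-off is an arbitrary assumption for "near identical", and each gene pair must already be aligned, for example with ClustalW.

```python
INTERNAL_GENES = ["PB2", "PB1", "PA", "NP", "M", "NS"]

def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns (excluding gap columns) with the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def compare_internal_genes(isolate_a: dict, isolate_b: dict, threshold: float = 98.0) -> dict:
    """For each internal gene present in both isolates, return
    (percent identity rounded to 0.1, whether it meets `threshold`)."""
    report = {}
    for gene in INTERNAL_GENES:
        if gene in isolate_a and gene in isolate_b:
            pid = percent_identity(isolate_a[gene], isolate_b[gene])
            report[gene] = (round(pid, 1), pid >= threshold)
    return report

if __name__ == "__main__":
    # Toy aligned fragments standing in for two isolates' internal genes.
    h9n2 = {"PB2": "ATGGAAAGAATA", "NP": "ATGGCGTCTCAA"}
    h5n1 = {"PB2": "ATGGAAAGAATA", "NP": "ATGGCGTCGCAA"}
    print(compare_internal_genes(h9n2, h5n1))
```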

Recombinomics and subtypes

After you've read all of the material above, we'll take a look at a very novel, and very controversial, theory called recombinomics. Briefly, it is a theory (and observation) that rapid viral evolution occurs through the 'intelligent sharing and recombination of alleles' present in a distributed network of viral strains. For influenza A, this distributed network consists of the 25 HA/NA subtypes, which can co-infect animals and circulate as avian, swine, and human influenza. The theory attempts to identify alleles, such as PB2 E627K, that have been observed in some of the human outbreaks of avian influenza. Recombinomics differs from reassortment in that recombinomics moves individual alleles, through RNA's ability to become mobile inside a large viral colony, while reassortment moves an entire gene, as is hypothesized for Spanish Influenza. Recombination is proposed to explain the pandemics that occur every 30 years or so, in which HA and NA take on a new character that humans have no immune protection from. It might be possible that a combination of reassortment and recombinomics explains both the high infection rate and lethality of influenzas such as that of 1918.
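A label such as PB2 E627K means glutamate (E) at position 627 replaced by lysine (K). The sketch below reads the residue at a 1-based position and reports whether a sequence carries the original residue, the substituted one, or neither. It assumes the sequence is numbered like the reference (starting at the initiator Met, with no insertions or deletions); otherwise align it to a reference first. The test sequence in the usage example is synthetic.

```python
def residue_at(protein: str, position: int) -> str:
    """Return the amino acid at a 1-based position, the numbering used
    in substitution labels such as E627K."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} outside sequence of length {len(protein)}")
    return protein[position - 1].upper()

def check_substitution(protein: str, label: str) -> str:
    """Interpret a label like 'E627K' and return 'substituted', 'original',
    or 'other' depending on the residue found at that position."""
    original, position, substituted = label[0], int(label[1:-1]), label[-1]
    found = residue_at(protein, position)
    if found == substituted:
        return "substituted"
    if found == original:
        return "original"
    return "other"

if __name__ == "__main__":
    # Synthetic sequence with K placed at position 627.
    synthetic = "M" + "A" * 625 + "K" + "A" * 132
    print(residue_at(synthetic, 627), check_substitution(synthetic, "E627K"))
```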

Pandemic predictions and prevention

So what does this all mean? I strongly recommend that you read the articles Focus on the Flu and Avian flu special: Is this our best shot?, as well as the special May segment in Nature on Avian Influenza. Think about the emergence of a pandemic: does it need to move from bird to pig to human? What is our ability to follow the rates of mutation using sequence alignment, phylogenetics, and structure prediction tools? How critical is it to monitor and develop a database of all circulating avian (and swine and human) influenza sequences throughout the world, and to use advanced data mining tools to follow the molecular evolution of the virus?
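One simple way to estimate a rate of mutation from dated sequences is root-to-tip regression: plot each isolate's genetic distance from a common reference (for example its root-to-tip distance in a rooted tree, or more crudely its distance from the earliest sequence) against the year it was sampled, and fit a straight line. The slope is a rough substitution rate, and the point where the line reaches zero distance estimates when the reference lineage began. The sketch below does an ordinary least-squares fit; it assumes roughly clock-like evolution, and the usage data are synthetic numbers constructed to fit a line exactly.

```python
def root_to_tip_rate(years, distances):
    """Least-squares fit of genetic distance against sampling year.

    years:     sampling year of each isolate
    distances: distance of each isolate from a common reference
    Returns (rate, root_year): rate is the slope in distance units per year;
    root_year is where the fitted line crosses zero (None if rate is 0).
    """
    n = len(years)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (year, distance) pairs")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("sampling years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_year = -intercept / rate if rate != 0 else None
    return rate, root_year

if __name__ == "__main__":
    years = [1918, 1934, 1957, 1968]
    synthetic = [0.001 * (y - 1918) for y in years]  # made-up, perfectly linear
    print(root_to_tip_rate(years, synthetic))
```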

What would you do?

So if you were in charge of the CDC or WHO, or worked at the UN, what would you do to monitor and prepare for a potential influenza pandemic? What scientific tools would you use to monitor it, what medical resources would you need, and, from a societal standpoint, what would be your reaction to an outbreak of avian influenza that was human transmissible?


This lesson is copyrighted using an Educational Common License, and may be used freely without restriction for academic purposes.


Copyright © 2009 - 2010 Robert D. Cormia - March 31, 2008