
Introduction to Bioinformatics - COIN81
Phylogenetics
Outcomes - You will learn
how to pose questions about phylogenetic relationships, including the analysis of
viral mutation and adaptation to a host (HIV) and the evolution and phylogenetic
relations of HIV, SIV, HTLV, STLV, Prions, and SARS. We'll do three different
exercises in this assignment:
- HIV mutation study - following
15 HIV seropositive IV drug users at six-month intervals, looking at viral
diversity and divergence, CD4 counts, and progression of the disease in each
individual. This is a complex multivariate topic, but easy to explore with
Biology Workbench. You might want to familiarize yourself with the BioQUEST
PowerPoint presentation about this HIV mutation
study. To really understand it, you will need to read the Markham paper.
- Viral envelope evolution study
- we will compare the envelope protein in four retroviruses (HIV, SIV, HTLV,
and STLV), a key feature for cell attachment, to trace the relationships of
retroviruses. We'll apply the same approach to study Prions. A big component
of this exercise is using NCBI whole genome resources to efficiently extract
and compile sequences for each protein in complex viruses.
- Applying the same techniques to
SARS, we'll attempt a comparison of various Coronaviruses to SARS, recreating
the early efforts of the CDC and WHO in trying to figure out where SARS came
from, and approaches to treat SARS based on existing treatments for Coronaviruses.
As above, much of this effort entails finding genomic sequences from NCBI
and compiling documents for protein comparisons.
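The HIV mutation exercise refers to viral diversity and divergence. As a rough sketch, assuming diversity means the mean pairwise difference among clones from one visit, and divergence means the mean difference between a later visit's clones and the baseline clones (check the exact definitions in the Markham paper), the two measures could be computed from aligned sequences like this. The function names and the toy sequences are made up for illustration and are not part of the course data.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions that differ, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def diversity(clones):
    """Mean pairwise p-distance among clones from a single visit."""
    dists = [p_distance(a, b) for a, b in combinations(clones, 2)]
    return sum(dists) / len(dists) if dists else 0.0


def divergence(baseline_clones, visit_clones):
    """Mean p-distance between every baseline clone and every visit clone."""
    dists = [p_distance(b, v) for b in baseline_clones for v in visit_clones]
    return sum(dists) / len(dists) if dists else 0.0


# Toy aligned sequences (hypothetical, not from the study data).
visit_1 = ["ACGTACGT", "ACGTACGA"]
visit_2 = ["ACGAACGA", "ACTAACGA", "ACGAACGT"]

print("diversity, visit 1:", round(diversity(visit_1), 3))
print("diversity, visit 2:", round(diversity(visit_2), 3))
print("divergence, visit 1 to 2:", round(divergence(visit_1, visit_2), 3))
```

Both measures here use the simple proportion of differing sites. Biology Workbench and published studies may apply corrected distance models, so expect the numbers to differ somewhat.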
Plan to spend two to four hours on
any of these activities if you are starting from scratch. If you are using pre-built
documents of sequences, you can focus on the phylogenetic relationships using
Biology Workbench, but you'll miss out on the critical step of hunting down
and annotating protein sequences. Try to remember that most of science is collecting
and preparing data for more rigorous analysis, including bioinformatics.
If you have not used Biology Workbench,
you'll need a quick demonstration of it, or plan to spend some quality time
reading a good tutorial.
You'll need to have three or four
sets of data for this assignment. These include:
- HIV mutation study - 15 files
that contain the nucleotide sequence for HIV-1 env (envelope) proteins from
viral clones detected in each individual at six-month intervals for the duration
of the study. The zipped archive (or
folder) will have files that can be opened with Notepad, but may have no file
extension, or a .txt file extension. Open them with Notepad to view the contents
of each file; every sequence begins with a FASTA header.
- HIV envelope comparison - this
is a single text file which contains
eight sequences. Open it in Notepad to view the contents.
- Prion protein comparison - this
is a collection of Prion precursor proteins from a dozen organisms. You can add
to this document as an exercise by searching
for Prions in NCBI or Swiss-Prot. This is a great skill if you want to apply
the same approach to other protein comparisons across organisms.
- SARS and coronavirus - Approach
this with care: either construct your data very carefully,
or start with the corona_sequence_master.txt
document. This file contains sequences from the replicase, spike glycoprotein,
membrane (or matrix) and nucleocapsid proteins for several coronaviruses,
including Avian infectious bronchitis virus (IBV), Bovine coronavirus (BCoV),
Human coronavirus 229E (HCoV-229E), Murine hepatitis virus (MHV), Porcine
epidemic diarrhea virus (PEDV), SARS coronavirus (HCoV), Transmissible gastroenteritis
virus (TGEV), and more recently, SARS GZ01. There are a half dozen more coronaviruses
that have been compared by CDC and WHO, and this is a very active area of research. Many
of the proteins are labeled as "putative" because they have not been empirically
characterized - instead they are predicted from open reading frames (ORFs). Compiling
these sequences is very time-consuming, and requires a lot of patience and organization.
You can use the master document,
or compilations in the replicase_1A.txt, spike_glycoprotein.txt,
membrane_glycoprotein.txt, or nucleocapsid.txt
files.
- Myoglobin - Hemoglobin comparisons
for mammals, fish, and birds. Try looking at the Myo-Hemo-Proteins.txt
file and the zipped archive.
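Before uploading a file to Biology Workbench, it can help to confirm that it holds the sequences you expect. The sketch below assumes plain FASTA text (a header line starting with `>` followed by one or more sequence lines). The function name `read_fasta` and the sample records are hypothetical and are not taken from the course files.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up sample records, for illustration only.
sample = """>clone_A example header
ATGAGAGTGA
AGGGGATC
>clone_B example header
ATGAGAGTGAAGGAGATC
"""

for header, seq in read_fasta(sample):
    print(header, len(seq))
```

To check a real file, pass its contents instead of the sample text, for example `read_fasta(open("corona_sequence_master.txt").read())`, and compare the number of records and their lengths with what you expect.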
Okay - so now you're ready! Set up
your free account at Biology Workbench, and follow the directions below. If
you can watch a demo, this will be a lot easier. Write down your user name and
password, and find out if your instructor has set up a 'class demo account'
that you can use as practice.
Log into your Biology Workbench account
and follow the steps below. If it doesn't work, back up, and try again.
- Select "session tools"
- Click on "new session"
- Click on "run"
- Enter the appropriate description
for your session (HIV mutation, HIV evolution, Prion comparisons, SARS phylogeny,
etc.)
- Select nucleic tools for the HIV
mutation study, or protein tools for the HIV env, Prion, or SARS studies
- Select "add new sequences"
- Click on "run"
- Click on "browse"
- Look for the "HIV data by
subject" folder
- Open one of the subjects (subject
1, subject 15, etc)
- Click on "open"
- Click on "upload"
- Click on "save"
- Repeat this process until you
have uploaded all the data
- Select (using the checkbox) each
of the visits for a given subject that you wish to compare
- Select "ClustalW"
- In ClustalW, make sure to check
the box for "rooted tree" (dendrogram)
- Click on "run"
- Click on "submit"
- You'll get a multiple sequence
alignment and a dendrogram
- You can right click on the dendrogram
to save it as an image, or export it as a PostScript file (.ps)
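To get a feel for how a rooted tree can be derived from pairwise sequence differences, here is a minimal sketch of UPGMA clustering on a small distance matrix. UPGMA is a simpler method swapped in for illustration. It is not a reproduction of how ClustalW builds its tree, and the visit labels and distances below are invented.

```python
def upgma(labels, dist):
    """Cluster labels into a rooted tree and return it in Newick format.

    labels: list of unique names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    Branch lengths are left out to keep the sketch short.
    """
    # Map each cluster's Newick string to the number of leaves it contains.
    clusters = {name: 1 for name in labels}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair; ties are broken alphabetically.
        a, b = sorted(min(d, key=lambda p: (d[p], sorted(p))))
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # The distance from the merged cluster to every other cluster is the
        # size-weighted average of the distances from its two parts.
        for other in clusters:
            d_a = d.pop(frozenset({a, other}))
            d_b = d.pop(frozenset({b, other}))
            d[frozenset({merged, other})] = (
                (size_a * d_a + size_b * d_b) / (size_a + size_b)
            )
        del d[frozenset({a, b})]
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Invented pairwise distances between four visits of one subject.
labels = ["visit1", "visit2", "visit3", "visit4"]
dist = {
    frozenset({"visit1", "visit2"}): 0.02,
    frozenset({"visit1", "visit3"}): 0.06,
    frozenset({"visit1", "visit4"}): 0.10,
    frozenset({"visit2", "visit3"}): 0.05,
    frozenset({"visit2", "visit4"}): 0.09,
    frozenset({"visit3", "visit4"}): 0.04,
}

print(upgma(labels, dist))  # prints ((visit1,visit2),(visit3,visit4));
```

The printed string is in Newick format, which most tree viewers can read. Sequences that group together in the nested parentheses are the ones that sit on neighboring branches of a dendrogram.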
Your saved dendrograms might look
like:
Good luck! RDC
rdcormia@earthlink.net
Copyright © 2008 - 2009 Robert D. Cormia -
March 31, 2008