Introduction to Bioinformatics - COIN81


Assignment Five - Genomic Tools and Databases

In this final assignment you'll explore specialized genome and pathway resources by visiting the ten sites listed below.

NCBI - Genome Resources at NCBI. Search for reference sequences for plants, animals, bacteria, and viruses.

dbSNP - Database of SNP Variation. This is cross-referenced to LocusLink.

TIGR - The Institute for Genomic Research.

ENSEMBL - ENSEMBL Genome Browser

UCSC - Genome Bioinformatics

GeneCards - Database of human genes, their products and their involvement in diseases

BRENDA - The Comprehensive Enzyme Information System

ExPASy Biochemical Pathways - A resource for finding enzyme entries and pathway information

KEGG - GenomeNet at Kyoto University and Kyoto Encyclopedia of Genes and Genomes

E-Cell System - an object-oriented software suite for modeling, simulation, and analysis of large-scale complex systems such as biological cells

Assignment deliverables - At each of the ten sites above you'll do a few directed searches; try to follow these up with searches on your own target protein. Your deliverable will be a Word document (or Web page) showing what you did, what you saw, and what you learned.

NCBI Genome resources - From the main NCBI page (or the Genome page), select 'Genomes' in the pull-down menu and search for SARS. What entry do you get? Follow that link to the SARS genome. In the left-hand column, follow the links to the protein view and the coding region view. From the coding region view you can see PID entries for each (putative) protein. Save that data in both FASTA protein and FASTA nucleotide formats. This skill will come in handy when you compare similar proteins across different organisms, as you did with HIV in Assignment Four.
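Once the coding regions are saved in FASTA format, it helps to read them programmatically, for example to compare protein lengths across organisms. Below is a minimal FASTA reader in Python, assuming each record is a '>' header line followed by one or more sequence lines; the same reader works for protein and nucleotide files. `parse_fasta` and the sample records are hypothetical illustrations, not NCBI output.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    A header line starts with '>'; the sequence is every following
    non-blank line joined together, up to the next header.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


# Hypothetical two-record example; these are NOT real SARS sequences.
sample = """>protein_A hypothetical example
MKTAYIAK
QRQISFVK
>protein_B hypothetical example
MSDNGPQ
"""
for name, seq in parse_fasta(sample):
    print(f"{name}: {len(seq)} residues")
```

For the sample above this prints 16 residues for protein_A (two 8-letter lines joined) and 7 for protein_B.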

dbSNP - The SNP variation database is a central resource for research activities, including haplotyping projects. When you search SNPs directly for a gene name such as COMT, you get a very large number of rs entries (rs numbers are dbSNP reference SNP accessions). Instead, use LocusLink to search for COMT and follow the variation link. Do this for at least three genes. Look on the right-hand side of the page for the legend that explains SNP entries. If you have time, research the haplotyping microarray projects currently under way. These 'whole genome' analyses may become common in the future.

TIGR - TIGR is a genome-centric portal with links to content and resources ranging from unpublished genomes to microarray resources. TIGR is involved in many projects central to the human genome and to genomes important in agriculture, including rice. Take some time to explore the TIGR genome resources, including the section on microarray resources. If you are interested in bioinformatics, note that microarray technology is a critical tool for discovery.

ENSEMBL Genome browser - Ensembl is a joint project between EMBL-EBI and the Sanger Institute to develop a software system that produces and maintains automatic annotation of eukaryotic genomes. The Genome Browser is a powerful tool for searching genes, proteins, peptides, sequences, diseases, markers, mRNA, SNPs, UniGene, and many other resources. It is similar to NCBI Entrez but focuses on EMBL and EBI data. Using your target gene or protein, search each of the databases offered on the front page. Use the browser view to navigate the human genome (chromosomes). Compare the results with those you get from NCBI (human). How would you compare and contrast the two?

UCSC - Genome Bioinformatics contains the reference sequences for the human and C. elegans genomes and working drafts for the mouse, rat, C. briggsae, and SARS genomes. Try the browser gateway to the human, mouse, rat, C. elegans, C. briggsae, and, most recently, SARS genome resources. Display the navigation map of the SARS genome and scroll down the page to see how other coronaviruses compare to SARS. You should also try BLAT. Use the COMT_FASTA nucleotide sequence to BLAT the human genome. What did you get? Try a BLAT search of the mouse database with a human gene. Also try a BLAT search with a protein sequence. Try your own gene (or protein), or a short segment of your gene. How is BLAT different from BLAST? Read about the design and operation of BLAT for fast sequence alignment.
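One key difference is where the index lives: BLAT builds an index of the genome's non-overlapping k-mers once, then looks up every overlapping k-mer of the query, while BLAST builds its lookup structure from the query and scans the database. The Python sketch below illustrates only that seeding idea under a strong simplifying assumption (exact k-mer matches, a toy genome); real BLAT also tolerates mismatches, can require multiple nearby hits, and extends seeds into full alignments. The function names and sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Index the genome's NON-overlapping k-mers (positions 0, k, 2k, ...)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def find_hits(query: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    """Look up EVERY overlapping k-mer of the query; return (query_pos, genome_pos) seeds."""
    hits: List[Tuple[int, int]] = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits


def best_diagonal(hits: List[Tuple[int, int]]) -> Optional[int]:
    """Return the most common offset (genome_pos - query_pos), i.e. the likely
    genome position where the query starts, or None if there are no seeds."""
    if not hits:
        return None
    offsets = Counter(gpos - qpos for qpos, gpos in hits)
    return offsets.most_common(1)[0][0]


# Toy example with k = 4; the query is genome[6:14].
genome = "AAAACCCCGGGGTTTTACGT"
query = "CCGGGGTT"
index = build_index(genome, k=4)
seeds = find_hits(query, index, k=4)
print(seeds, best_diagonal(seeds))
```

Indexing only non-overlapping k-mers keeps the genome index small, and an exact match of length at least 2k - 1 is still guaranteed to contain one complete indexed k-mer. Here the single seed is query position 2 ("GGGG") at genome position 8, which places the query at genome position 6.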

GeneCards - A database of human genes, their products, and their involvement in diseases. It can help you rapidly find information about a gene, including variation and expression data. Try a text search, or the other search options the site offers.

See what GeneCards has to offer for researching your target of interest. What information did you find here that you didn't find elsewhere?

BRENDA - The Comprehensive Enzyme Information System requires an account. If you are working from *inside* the FHDA network, you have full access without logging in. If not, you need to set up a student / faculty account. This can be tricky, so don't lose time here if your account isn't ready. Try finding the entry for EC 2.1.1.6: catechol O-methyltransferase. BRENDA contains an immense amount of cross-referenced information, including links to reactions and pathways. What did you learn here about your protein (if it's an enzyme)?
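An EC number such as 2.1.1.6 is hierarchical: each dot-separated field narrows the classification (class 2 is transferases, 2.1 transfers one-carbon groups, 2.1.1 is the methyltransferases, and 2.1.1.6 is catechol O-methyltransferase). The small Python sketch below splits an EC number into those levels; the name table is hand-entered for the COMT lineage only and is not a substitute for BRENDA or ExPASy, and `ec_lineage` is a hypothetical helper.

```python
import re
from typing import List

# Accepts "2.1.1.6" or "EC 2.1.1.6"; partial numbers such as "2.1.1.-" are rejected.
EC_PATTERN = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+)\.(\d+)\.(\d+)$")

# Hand-entered names for the COMT lineage only; not a complete EC table.
EC_NAMES = {
    "2": "Transferases",
    "2.1": "Transferring one-carbon groups",
    "2.1.1": "Methyltransferases",
    "2.1.1.6": "Catechol O-methyltransferase",
}


def ec_lineage(ec: str) -> List[str]:
    """Return the class, subclass, sub-subclass, and full EC number, broadest first."""
    match = EC_PATTERN.match(ec.strip())
    if match is None:
        raise ValueError(f"not a complete four-part EC number: {ec!r}")
    parts = match.groups()
    return [".".join(parts[:depth]) for depth in range(1, 5)]


for prefix in ec_lineage("EC 2.1.1.6"):
    print(prefix, "-", EC_NAMES.get(prefix, "(not in local table)"))
```

The same prefixes are what you move up and down through when you browse from an enzyme entry to its broader class in BRENDA, ExPASy, or KEGG.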

ExPASy Biochemical Pathways - This resource for finding enzyme entries and pathway information is a quick way to get enzyme information linked into the ExPASy protein database. Try a search with COMT. What two results did you get? Follow the link into the NiceZyme view. What links are available from there to other bioinformatics sites? Try a search from that page into NCBI PubMed and report interesting findings. Finally, follow the link to the pathway map for COMT (or your target). Things get pretty busy from here on out.

KEGG - GenomeNet at Kyoto University and the Kyoto Encyclopedia of Genes and Genomes. KEGG is probably the best resource for pathway information, and it is extensively linked into a comprehensive database. Navigating KEGG can be trying, so explore a few of its links first to get oriented. Then try your gene, protein, enzyme, etc., and find a pathway. Using COMT (EC 2.1.1.6), you should get this link for the enzyme entry: http://www.genome.jp/dbget-bin/www_bget?ec:2.1.1.6 (study this entry carefully, as it has other useful links within KEGG), and this link for the pathway: http://www.genome.jp/dbget-bin/show_pathway?hsa00350+2.1.1.6 . Take some time to explore the pathway map. You should be able to navigate back and forth between the enzyme entry and the pathway map. KEGG also provides pathway data in an XML format, KGML (KEGG Markup Language).
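Because KGML is plain XML, Python's standard library is enough to pull out a pathway's title and its entries. The sketch below parses a hand-made miniature in the general shape of a KGML file (a `pathway` root holding `entry` elements, each with a `graphics` child); the pathway name matches the hsa00350 map cited above, but the entry names are invented placeholders, and a real KGML file from KEGG contains many more elements and attributes. `summarize_kgml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hand-made miniature in the general shape of a KGML file. The pathway
# name/title follow the hsa00350 map; the entry names (GENE_A, CPD_X)
# are invented placeholders, not real KEGG identifiers.
SAMPLE_KGML = """<pathway name="path:hsa00350" org="hsa" number="00350" title="Tyrosine metabolism">
  <entry id="1" name="hsa:GENE_A" type="gene">
    <graphics name="COMT" type="rectangle"/>
  </entry>
  <entry id="2" name="cpd:CPD_X" type="compound">
    <graphics name="example compound" type="circle"/>
  </entry>
</pathway>"""


def summarize_kgml(kgml_text: str) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Return the pathway title and a (type, name, label) tuple per entry."""
    root = ET.fromstring(kgml_text)
    entries: List[Tuple[str, str, str]] = []
    for entry in root.findall("entry"):
        graphics = entry.find("graphics")
        label = graphics.get("name", "") if graphics is not None else ""
        entries.append((entry.get("type", ""), entry.get("name", ""), label))
    return root.get("title", ""), entries


title, entries = summarize_kgml(SAMPLE_KGML)
print(title)
for entry_type, name, label in entries:
    print(f"{entry_type:10} {name:12} {label}")
```

Separating genes from compounds by the `type` attribute is a simple way to list the enzymes on a map before following their links back to the enzyme entries.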

E-Cell System - an object-oriented software suite for modeling, simulation, and analysis of large-scale complex systems such as biological cells. Check out this site, and if you are interested in XML and systems biology, look at CellML and the Systems Biology Markup Language (SBML).

Read Single Nucleotide Polymorphisms and Linkage Disequilibrium Mapping for more information on SNPs; it includes a section on BRCA1 and BRCA2 mutations and other good explanations of SNPs.
