Introduction to Bioinformatics - COIN81


COIN 81 Week Three

We will spend an intense week running BLAST searches on unknown sequences, following the best alignment 'hits' into GenBank, and saving the genomic and proteomic information in various text file formats. There is a lot of practice, along with some exploration of different BLAST techniques. You'll gain both experience and confidence in assignment two, and the practice will also shorten the time you need to learn how to navigate NCBI's databases.

Our lesson will start with BLAST, and you will practice your searches from this link: BLAST.htm. Make sure to spend at least three to four hours BLASTing the sequences and following your hits into GenBank. From there you'll save a GenBank record as a flatfile (text format) and then save the sequence in FASTA format. Spend time in NCBI reading about FASTA as well.
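To make the flatfile-to-FASTA step concrete, here is a minimal Python sketch of how the sequence in a saved GenBank flatfile could be rewritten as FASTA. It is an illustration under stated assumptions, not NCBI's own converter: the function name genbank_to_fasta, the accession XX000001 and the tiny record itself are all invented for the example, and the parser only reads the DEFINITION, VERSION and ORIGIN sections of a single record.

```python
# A toy GenBank flatfile record. The accession, definition and sequence
# are made up for illustration; a real record has many more sections.
EXAMPLE_RECORD = """\
LOCUS       XX000001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence,
            partial record.
ACCESSION   XX000001
VERSION     XX000001.1
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""


def genbank_to_fasta(flatfile_text, width=70):
    """Convert the text of one GenBank flatfile record to a FASTA string."""
    accession = ""
    definition_parts = []
    sequence_parts = []
    section = None  # which section of the record we are currently inside

    for line in flatfile_text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if section == "ORIGIN":
            # Sequence lines look like "        1 gatcctccat atacaacggt";
            # keep the letter blocks and drop the leading position number.
            sequence_parts.extend(tok for tok in line.split() if tok.isalpha())
        elif line.startswith("DEFINITION"):
            section = "DEFINITION"
            definition_parts.append(line[len("DEFINITION"):].strip())
        elif line.startswith("VERSION"):
            section = "VERSION"
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
        elif line.startswith("ORIGIN"):
            section = "ORIGIN"
        elif section == "DEFINITION" and line.startswith(" "):
            definition_parts.append(line.strip())  # continuation line
        else:
            section = None  # some other section we do not need here

    header = ">" + accession + " " + " ".join(definition_parts)
    sequence = "".join(sequence_parts)
    # FASTA is a '>' header line followed by the sequence on wrapped lines.
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + wrapped) + "\n"


print(genbank_to_fasta(EXAMPLE_RECORD, width=10))
```

In practice you can simply choose the FASTA display option on the GenBank record page; the sketch is only meant to show how little of the flatfile actually ends up in the FASTA file.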

Study the ten links below; they will help you understand what GenBank is and how it serves as a central repository for genomic, proteomic, structure, SNP, and 'tagged' site (EST and STS) data.

BLAST - Basic Local Alignment Search Tool

GenBank - GenBank Overview

Entrez - Now that you have explored NCBI a bit, explore GenBank using the Entrez browsing tool.

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research 2002 Jan 1;30(1):17-20). There were approximately 22,617,000,000 bases in 18,197,000 sequence records as of August 2002 (see GenBank growth statistics). As an example, you may view the record for a Saccharomyces cerevisiae gene. The complete release notes for the current version of GenBank are available from NCBI. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.

March 22, 2009

Sequin - A client-side submission tool for uploading sequences into GenBank and NCBI.

Entrez Nucleotide - Genomic side of GenBank

Structure - Proteomic side of GenBank containing structural data

EST - Expressed Sequence Tags database

STS - Sequence Tagged Sites database

Searching GenBank - A free-text approach to searching GenBank
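
For the curious, a free-text GenBank search can also be written as a URL. The Python sketch below only builds such a URL for NCBI's E-utilities esearch service and does not contact NCBI. The helper name build_search_url, the example search term and the retmax value are assumptions chosen for illustration, so check the E-utilities documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities "esearch" service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, db="nuccore", retmax=5):
    """Return an esearch URL for a free-text query (no request is sent)."""
    params = {
        "db": db,          # Entrez database to search; nuccore holds nucleotide records
        "term": term,      # the free-text query, optionally with [Field] tags
        "retmax": retmax,  # maximum number of record IDs to return
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical query: nucleotide records from baker's yeast.
print(build_search_url("Saccharomyces cerevisiae[Organism]"))
```

Pasting the printed URL into a browser should return an XML list of matching record IDs, which is the same kind of result the Entrez search box produces behind the scenes.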