Genome Annotation Markup Elements (GAME)
Motivation
The motivation for GAME is a desire to provide a syntax,
together with some simple tools, that will facilitate the exchange of genomic
annotations. It will enable genome centres, model organism databases, and
individual researchers to clearly specify the conclusions they have drawn from
their analyses of primary sequence data and to share these XML descriptions with
one another. GAME was developed to let the Drosophila Genome Project
coordinate its efforts with Celera, a collaboration that required a stable
and expressive interchange format.
GAME and GFF
GAME complements the existing work that has been done with
GFF (Gene-Finding Features).
GFF is largely targeted at standardizing the output of gene prediction software.
GAME covers these types of sequence descriptions but extends beyond
them to include curated results as well. Tools will be added to convert between
the two syntaxes, although some information is likely to be lost when
converting from GAME to GFF.
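To make the information-loss point concrete, here is a minimal Python sketch assuming a hypothetical feature record; `Feature`, `to_gff_line` and all field names are illustrative and belong neither to GAME nor to any GFF library. It writes the nine tab-separated GFF columns (sequence name, source, feature type, start, end, score, strand, frame, attributes). The feature's supporting evidence and curator comment have no dedicated column, so this sketch simply drops them.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical GAME-style feature: an interval plus its supporting evidence."""
    seq_id: str
    source: str
    feature_type: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int
    strand: str  # "+", "-" or "."
    name: str
    evidence: list[str] = field(default_factory=list)  # IDs of supporting results
    curator_comment: str = ""


def to_gff_line(f: Feature) -> str:
    """Render a feature as one tab-separated, nine-column GFF line.

    Only the columns GFF defines are filled; `evidence` and
    `curator_comment` are deliberately not written out.
    """
    columns = [
        f.seq_id,
        f.source,
        f.feature_type,
        str(f.start),
        str(f.end),
        ".",                 # score: none for a curated feature in this sketch
        f.strand,
        ".",                 # frame: not tracked in this sketch
        f'name "{f.name}"',  # GFF2-style tag/value attribute (tag is hypothetical)
    ]
    return "\t".join(columns)


feat = Feature("2L", "curation", "gene", 100, 500, "+", "abc",
               evidence=["blastx:1"], curator_comment="checked by hand")
line = to_gff_line(feat)
```

A real converter could pack some of this detail into the free-text attribute column, but other GFF tools would not necessarily interpret it, so curated information is still effectively lost going from GAME to GFF.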
GAME and EMBL/GenBank formats
GAME does not aim to be a replacement for the established flat
file formats from GenBank or EMBL, or the ASN.1 model of the GenBank database.
GAME aims to be an interchange format for annotations which can make the
necessary distinctions to allow a full interchange of data between genome
centres. The flat file formats are focused on archival storage of the DNA
sequence as submitted, and the ASN.1 model provides a rich object model for
manipulation of these sequences with the NCBI toolkit. Conversion tools for the
information the formats have in common are in development, but there is no
one-to-one mapping between a GAME document and the GenBank/EMBL/DDBJ formats.
GAME and the CORBA LSR standards
Ewan Birney has started a FAQ about GAME; please read
it if you are unfamiliar with the project. The CORBA Life Science Research (LSR)
group is considering a standard for biosequence analysis. This standard overlaps
considerably with GAME in the information that both define, but because CORBA
focuses on interfaces, and therefore on method definitions, while XML focuses on
data definitions, the two standards are largely orthogonal. We believe that GAME
is a sensible data-oriented view of a number of interfaces defined in the LSR
standard, and we hope to define a mapping between the two standards soon.
Plans
Since this effort has just begun there is a consider
Definitions
- Annotations are quite broadly defined in GAME as a collection of
features found on an associated set of sequences.
- Features in turn are conclusions describing intervals on different
sequences. Features are supported by analytical evidence.
- Analyses are either computational assessments or biological
experiments carried out on a sequence. They consist of results that
describe some characteristic of an interval on a sequence.
- Sequences, of course, are the molecules that are the subject of
study.
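The four definitions can be pictured as a small object model. The Python sketch below is one possible reading of them, not GAME's actual schema: the class names, the fields, and the 1-based inclusive interval convention are all assumptions made for illustration. The `add_feature` check encodes the idea that an annotation's features lie on its associated set of sequences.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sequence:
    """A molecule under study (hypothetical minimal representation)."""
    seq_id: str
    residues: str


@dataclass(frozen=True)
class Interval:
    """A span on a named sequence; 1-based and inclusive by assumption."""
    seq_id: str
    start: int
    end: int


@dataclass
class Result:
    """One analysis result describing some characteristic of an interval."""
    interval: Interval
    description: str


@dataclass
class Analysis:
    """A computational assessment or biological experiment on a sequence."""
    name: str
    kind: str  # "computational" or "experimental" (labels are assumptions)
    results: list[Result] = field(default_factory=list)


@dataclass
class Feature:
    """A conclusion describing intervals on one or more sequences, backed by evidence."""
    name: str
    intervals: list[Interval]
    evidence: list[Result] = field(default_factory=list)


@dataclass
class Annotation:
    """A collection of features found on an associated set of sequences."""
    sequences: dict[str, Sequence]
    features: list[Feature] = field(default_factory=list)

    def add_feature(self, feature: Feature) -> None:
        # Reject features whose intervals fall outside the associated sequences.
        for iv in feature.intervals:
            if iv.seq_id not in self.sequences:
                raise ValueError(f"unknown sequence {iv.seq_id!r}")
            seq_len = len(self.sequences[iv.seq_id].residues)
            if not 1 <= iv.start <= iv.end <= seq_len:
                raise ValueError(f"interval {iv} is out of range")
        self.features.append(feature)


# Usage: one computational result supporting one curated feature.
seq = Sequence("contig1", "ATGAAACCCGGGTTTTAG")  # 18 residues
blast = Analysis("blastx", "computational")
hit = Result(Interval("contig1", 1, 18), "similarity to a known protein")
blast.results.append(hit)

ann = Annotation({"contig1": seq})
ann.add_feature(Feature("gene1", [Interval("contig1", 1, 18)], evidence=[hit]))
```

Keeping evidence as references to `Result` objects, rather than copying their contents into the feature, mirrors the definitions: a feature is a conclusion, and the analyses that support it remain separate entities.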
These basic entities and their relationships are described
below.
