Question: What Conventions, If Any, Exist For Encoding Extra Information In The GFF3 Ninth Column?
Michael Barton (Akron, Ohio, United States) wrote 8.5 years ago:

The GFF3 format is well specified for describing sequence location and type in the first eight columns. The ninth column is left for specifying any remaining information. I would like to use GFF3 to encode the data typically produced by a genome annotator.

How should I go about this? Are there any conventions for encoding information such as protein product, EC number, and description?

Tags: annotation, gff
Scott Cain wrote 8.5 years ago:

There are no conventions beyond those in the GFF3 spec, though I would note that the reserved tag "Note" has historically been used for descriptions in GBrowse, so if you're going to be using GBrowse, it makes sense to put descriptions there. Generally, I suggest encoding the information in whatever way makes it easiest to use in your downstream application. Of course, you may not always know what that is, but that's my best advice.
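
As a rough illustration of that advice, here is a minimal Python sketch that builds a column 9 string with the description in the reserved "Note" tag. The tag names `product` and `ec_number` are hypothetical application-specific choices (lower-case, which the spec leaves free for applications), not an established convention, and the feature values are made up for the example.

```python
def escape_value(text):
    """Percent-encode the characters that are special inside column 9."""
    out = []
    for ch in text:
        # ';', '=', '&' and ',' delimit attributes; '%' starts an escape;
        # control characters (including tab and newline) must be encoded too.
        if ch in ";=&,%" or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def format_attributes(attributes):
    """Join tag=value pairs with ';'; multiple values for a tag with ','."""
    parts = []
    for tag, values in attributes.items():
        if isinstance(values, str):
            values = [values]
        encoded = ",".join(escape_value(v) for v in values)
        parts.append(escape_value(tag) + "=" + encoded)
    return ";".join(parts)


# Hypothetical annotation for one gene (illustrative values only).
column9 = format_attributes({
    "ID": "gene0001",
    "Note": "Putative kinase; weak similarity",  # description, as GBrowse expects
    "product": "serine/threonine kinase",        # assumed custom tag
    "ec_number": "2.7.11.1",                     # assumed custom tag
})
print(column9)
# ID=gene0001;Note=Putative kinase%3B weak similarity;product=serine/threonine kinase;ec_number=2.7.11.1
```

The semicolon inside the description is written as %3B, so it cannot be mistaken for an attribute separator.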


Thanks for the suggestion. Are there any common variants of GFF which include this type of information?

Reply written 8.5 years ago by Michael Barton

You mean that have protein information? Not that I can recall, though I can tell you that the GFF3 at NCBI has EC_number tags that violate the GFF3 spec (at least, they did the last time I checked). What is the end use that you have in mind?

Reply written 8.5 years ago by Scott Cain

Encoding additional genome annotation data in GFF3. Things like product and description.

Reply written 8.5 years ago by Michael Barton

Yes, but I meant, why do you want to do that? Who is it for, and what will they be using the GFF file for?

Reply written 8.5 years ago by Scott Cain
Neilfws (Sydney, Australia) wrote 8.5 years ago:

You'll see from the spec that there are a few conventions for column 9: key-value pairs, "reserved" keys, characters that require escapes and some conventions regarding case.

Apart from that, the key-value pairs are whatever you wish. This is in large part because GFF3 files are used by GBrowse. Track display in GBrowse often relies on small Perl subroutines in the config file, which read customised attributes from column 9.
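
To make those conventions concrete, here is a minimal Python sketch of reading column 9, assuming the rules as the spec states them: `tag=value` pairs separated by semicolons, multiple values separated by commas, percent-escapes for special characters, and tags starting with an upper-case letter being reserved. The function names and the sample line are made up for illustration; this is not how GBrowse itself parses the column.

```python
from urllib.parse import unquote

# Tags that the GFF3 spec defines with a fixed meaning.
RESERVED_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap", "Derives_from",
    "Note", "Dbxref", "Ontology_term", "Is_circular",
}


def parse_attributes(column9):
    """Return {tag: [values]} for one column 9 string."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        tag, sep, raw = field.partition("=")
        if not sep:
            raise ValueError("attribute without '=': " + repr(field))
        # Split on ',' before decoding, so an escaped %2C stays inside a value.
        values = [unquote(v) for v in raw.split(",")]
        attributes.setdefault(unquote(tag), []).extend(values)
    return attributes


def classify_tag(tag):
    """Apply the case convention: upper-case initial means reserved."""
    if tag in RESERVED_TAGS:
        return "reserved"
    if tag[:1].isupper():
        return "reserved for future use"
    return "custom"


# Made-up example line; 'product' is a hypothetical custom tag.
line = "ID=gene0001;Parent=mrna1,mrna2;Note=Putative kinase%3B weak similarity;product=kinase"
for tag, values in parse_attributes(line).items():
    print(tag, classify_tag(tag), values)
```

A display callback of the kind described above would then simply look up a custom key such as `product` in the returned dictionary.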

You might want to look at the BioPerl script bp_genbank2gff3.pl, which goes some way towards "standardising" the conversion of GenBank format to GFF3.
