How to produce GFF3 files?
13.0 years ago
Dejian ★ 1.3k

Dear all,

We are sequencing an animal genome and the produced GFF file is version 2. However, I learned that GFF2 is now deprecated and GFF3 is a better choice. So I want a GFF3 file for the genome. Since I am not in charge of GFF production, I want to ask some questions here before submitting my request to the research group.

My questions are: (1) How is a GFF3 file produced? Is it difficult to get GFF3? Does it take much effort? (2) What software is available that supports GFF3? GMOD mentions that Apollo, Chado, CMap and GBrowse support GFF3. Software based on Chado, such as Artemis and ACT, also supports it. Any more?

Many thanks!


The answer also depends on just how much and what type of information is contained in your files.


Perhaps edit the question to make it clearer; "How is a GFF3 file produced?" is not the same as "How do I make GFF3 from the data that I have?"

13.0 years ago
Neilfws 49k

GFF3 files are generated either by:

  • conversion from another format using an existing software library (e.g. BioPerl's bp_genbank2gff3.pl utility)
  • writing your own code to parse suitable input data and write out GFF3

Is it difficult to convert GFF2 to GFF3? GMOD describes it as problematic. It should not be too difficult if you use appropriate input data and can write scripts to parse and rewrite text files.

Does it take much effort? Some of the fields are relatively easy to generate from other files (chromosome names, start/end positions); others are a little more difficult - for example, GFF3 should use accepted Sequence Ontology terms. A good starting point for input data is something like the UCSC genome browser MySQL tables.
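As a rough illustration of the "easy" fields, here is a minimal Python sketch that formats already-parsed records as GFF3 lines. The function names (`format_gff3_line`, `to_gff3`) and the example record are made up for illustration. The nine tab-separated columns, the `##gff-version 3` header and the percent-encoding of reserved characters in the attributes column follow the GFF3 specification. The sketch does not check that the type column is an accepted Sequence Ontology term; that part still needs manual care.

```python
from urllib.parse import quote


def format_gff3_line(seqid, source, feature_type, start, end,
                     score=".", strand=".", phase=".", attributes=None):
    """Return one GFF3 feature line (nine tab-separated columns)."""
    if int(start) > int(end):
        raise ValueError("GFF3 requires start <= end")
    attrs = attributes or {}
    # Percent-encode characters such as ';', '=' and ',' that have a
    # special meaning inside column 9; spaces are left as they are.
    attr_text = ";".join(
        f"{key}={quote(str(value), safe=' :/')}" for key, value in attrs.items()
    ) or "."
    columns = [seqid, source, feature_type, str(start), str(end),
               str(score), strand, str(phase), attr_text]
    return "\t".join(columns)


def to_gff3(records):
    """Join a version header and one line per record into GFF3 text."""
    lines = ["##gff-version 3"]
    lines.extend(format_gff3_line(**record) for record in records)
    return "\n".join(lines) + "\n"


# Hypothetical record; real values would come from your own annotation data.
example = [{
    "seqid": "chr1", "source": "example", "feature_type": "gene",
    "start": 100, "end": 200, "strand": "+",
    "attributes": {"ID": "gene0001", "Name": "abc;def"},
}]
print(to_gff3(example), end="")
```

Keeping the formatting in one small function means the escaping rules live in a single place, whatever the input source turns out to be.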

You have listed most of the commonly used software packages. Some others include the Integrative Genomics Viewer (IGV), SAMMate and Tablet.


The first method would work fine if you had a file in GenBank format; it doesn't have to be in the GenBank database :-)


Since the genome is newly sequenced, the first method won't work. Probably the most urgent work is to annotate the genomic sequences through querying different databases and then parse the annotation files to extract suitable fields to build the GFF3 file. Refer to this post.


That's great! But the GenBank flat file format does not seem easy to produce locally. Are there any tools that help produce GenBank format files?


Well, I'm not suggesting that you generate GenBank first, just that GenBank->GFF is one way to make GFF3. Since you state that you already have GFF2, I'd suggest that is the sensible starting point.


Thank you. I'll try to convert my GFF2 to GFF3. Your reply is pretty helpful.

13.0 years ago
Scott Cain ▴ 770

Typically creating GFF3 is not that hard; several gene prediction programs will create it automatically, and MAKER, an easy to set up gene annotation pipeline (http://gmod.org/wiki/MAKER) will produce GFF3 for all of its outputs. Since you're starting a new genome, that is definitely something I would suggest investigating.

Two more items that might be of use:

  1. The GFF3 specification, with several examples of what proper GFF3 should look like:

http://www.sequenceontology.org/gff3.shtml

  2. A GFF3 validator:

http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
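Before sending a file to a validator, a quick local sanity check can catch the most basic problems. The following is only a sketch under my own assumptions, not a replacement for a real validator: the function name `check_gff3_lines` is made up, and it only tests the version header, the column count, the coordinates, the strand and the phase. It knows nothing about Sequence Ontology terms or parent/child relationships.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def check_gff3_lines(lines):
    """Return (line_number, message) tuples for basic structural problems."""
    lines = list(lines)
    problems = []
    if not lines or not lines[0].startswith("##gff-version 3"):
        problems.append((1, "first line should be '##gff-version 3'"))
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue  # blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(cols)}"))
            continue
        start, end, strand, phase = cols[3], cols[4], cols[6], cols[7]
        if not (start.isdigit() and end.isdigit()):
            problems.append((number, "start and end must be positive integers"))
        elif int(start) > int(end):
            problems.append((number, "start is greater than end"))
        if strand not in VALID_STRANDS:
            problems.append((number, f"unexpected strand '{strand}'"))
        if phase not in VALID_PHASES:
            problems.append((number, f"unexpected phase '{phase}'"))
    return problems


# Usage with a small in-memory example; a file handle works the same way.
sample = [
    "##gff-version 3\n",
    "chr1\texample\tgene\t300\t200\t.\t+\t.\tID=gene0001\n",
]
for number, message in check_gff3_lines(sample):
    print(f"line {number}: {message}")
```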

In terms of converting GFF2 to GFF3, it is problematic to solve in a general sense: it is hard to make a tool that will take any GFF2 and reliably convert it to GFF3, because of the crazy variability in what people call GFF2 (the specification was very loose). However, for a given GFF2 file, converting it to GFF3 can be fairly easy if you have even a relatively small amount of programming ability. For some common formats, it's even fairly easy to find converters. For example, I know of a converter that works quite well for JGI GFF2.
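To give an idea of what such a file-specific conversion might look like, here is a minimal Python sketch. It assumes one particular GFF2 flavour, where column 9 holds `tag "value"` pairs separated by semicolons; other GFF2 files will need different parsing. The names `convert_attributes`, `convert_gff2_text` and `tag_map` are hypothetical, and the `{"Gene": "ID"}` mapping in the usage example is an assumption about which tag identifies a feature. Unmapped tags get a lowercase first letter because GFF3 reserves attribute tags that start with an uppercase letter. Feature types and parent/child relationships are left untouched here and usually need their own file-specific handling.

```python
import re
from urllib.parse import quote

# Matches GFF2-style pairs such as: Gene "abc" ; Note "some text" ; Rank 3
_GFF2_PAIR = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*(?:;|$)')


def convert_attributes(gff2_attrs, tag_map=None):
    """Turn 'tag "value" ; tag value' text into GFF3 'tag=value;tag=value'."""
    tag_map = tag_map or {}
    pairs = []
    for match in _GFF2_PAIR.finditer(gff2_attrs):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # Use the caller's mapping if there is one; otherwise lowercase the
        # first letter so the tag stays out of GFF3's reserved namespace.
        new_key = tag_map.get(key, key[0].lower() + key[1:])
        pairs.append(f"{new_key}={quote(value, safe=' :/')}")
    return ";".join(pairs) if pairs else "."


def convert_gff2_text(text, tag_map=None):
    """Convert GFF2 text to GFF3 text, rewriting only header and column 9."""
    out = ["##gff-version 3"]
    for line in text.splitlines():
        if not line.strip() or line.startswith("##gff-version"):
            continue  # drop blank lines and the old version header
        if line.startswith("#"):
            out.append(line)  # keep other comments as they are
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns: {line!r}")
        attrs = cols[8] if len(cols) > 8 else ""
        out.append("\t".join(cols[:8] + [convert_attributes(attrs, tag_map)]))
    return "\n".join(out) + "\n"


gff2 = ('##gff-version 2\n'
        'chr1\tpred\tgene\t100\t200\t.\t+\t.\tGene "abc" ; Note "some text"\n')
print(convert_gff2_text(gff2, tag_map={"Gene": "ID"}), end="")
```

Running the result through a GFF3 validator afterwards is still worthwhile, since this sketch only changes the syntax of the file.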


MAKER is really a good pipeline. But I'm not sure whether it can meet our needs, since its authors describe MAKER as ideal for smaller projects[1] while our genome is really large, around 7G. Still, I'd like to look into it; I've downloaded the package.

[1] http://genome.cshlp.org/content/18/1/188.full#sec-13


I think the authors of MAKER were making the point there that it will work nicely for small projects as well. For larger projects, MAKER is still good, you'll probably just want to have a cluster to do the analysis rather than running it on a laptop.

13.0 years ago

You can find the information you need (just the how-to, not a software tool to do it) on GMOD's wiki too, at the end of the GFF2 format description.


Yes, I saw it. But "Converting a file from GFF2 to GFF3 format is problematic" and "GMOD does not endorse (or disparage) any particular converter." I just want to get a GFF3 version, not necessarily one converted from GFF2. Possibly the title is a bit misleading. I will change it. Thank you all the same.
