Question: How to Produce GFF3 Files?
Dejian (United States), 7.4 years ago, wrote:

Dear all,

We are sequencing an animal genome and the produced GFF file is version 2. However, I learned that GFF2 is now deprecated and GFF3 is a better choice. So I want a GFF3 file for the genome. Since I am not in charge of GFF production, I want to ask some questions here before submitting my request to the research group.

My questions are: (1) How is a GFF3 file produced? Is it difficult to get GFF3? Does it take much effort? (2) What software supports GFF3? GMOD mentions that Apollo, Chado, CMap and GBrowse support GFF3. Software based on Chado, such as Artemis and ACT, also supports it. Are there any others?

Many thanks!

Tags: gff, conversion (question modified 7.4 years ago by Scott Cain)

The answer also depends on just how much and what type of information is contained in your files.

(reply by Istvan Albert, 7.4 years ago)

Perhaps edit the question to make it clearer; "How is GFF3 file produced?" is not the same as "how do I make GFF3 from the data that I have."

(reply by Neilfws, 7.4 years ago)

Neilfws (Sydney, Australia), 7.4 years ago, wrote:

GFF3 files are generated either by:

  • conversion from another format using an existing software library (e.g. BioPerl's bp_genbank2gff3.pl utility)
  • writing your own code to parse suitable input data and write out GFF3
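For the second route, the writing half might look like the following minimal Python sketch. It assumes you have already parsed your input into simple records; the `Feature` class and the function names are hypothetical, chosen only for illustration. It emits the `##gff-version 3` header, writes the nine tab-separated columns in the order the spec defines, and percent-encodes the characters GFF3 reserves in the attribute column.

```python
from dataclasses import dataclass, field

# Characters with reserved meaning in GFF3 column 9, mapped to their percent-encodings.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
             "\t": "%09", "\n": "%0A", "\r": "%0D"}


def escape(value: str) -> str:
    """Percent-encode characters that would otherwise break the attribute column."""
    return "".join(_RESERVED.get(ch, ch) for ch in value)


@dataclass
class Feature:
    seqid: str
    source: str
    type: str      # ideally a Sequence Ontology term: "gene", "mRNA", "exon", ...
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    strand: str = "."
    score: str = "."
    phase: str = "."
    attributes: dict = field(default_factory=dict)


def format_attributes(attrs: dict) -> str:
    parts = []
    for tag, value in attrs.items():
        # A list becomes a comma-separated multi-value attribute, e.g. Parent=t1,t2.
        values = value if isinstance(value, (list, tuple)) else [value]
        parts.append(f"{tag}=" + ",".join(escape(str(v)) for v in values))
    return ";".join(parts) or "."


def to_gff3_line(f: Feature) -> str:
    # Column order: seqid, source, type, start, end, score, strand, phase, attributes.
    cols = [f.seqid, f.source, f.type, str(f.start), str(f.end),
            f.score, f.strand, f.phase, format_attributes(f.attributes)]
    return "\t".join(cols)


def write_gff3(features) -> str:
    lines = ["##gff-version 3"] + [to_gff3_line(f) for f in features]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Feature("scaffold_1", "my_pipeline", "gene", 1000, 5000, "+",
                attributes={"ID": "gene0001", "Name": "abc;1"}),
        Feature("scaffold_1", "my_pipeline", "mRNA", 1000, 5000, "+",
                attributes={"ID": "mRNA0001", "Parent": "gene0001"}),
    ]
    print(write_gff3(demo), end="")
```

The parsing half depends entirely on what your input looks like, which is why it is left out here.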

Is it difficult to convert GFF2 to GFF3? GMOD describes it as problematic. It should not be too difficult, though, if you use appropriate input data and can write scripts to parse and rewrite text files.
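As a rough illustration of that kind of script, here is a sketch that rewrites only the GFF2 group column (`tag "value"; tag "value"`) into GFF3's `tag=value;tag=value` syntax. It assumes one common GFF2 style with whitespace-separated, optionally double-quoted values; other dialects will need adjustments. It does not build the ID/Parent hierarchy, map feature types to Sequence Ontology terms, or touch header lines (the `##gff-version` line would still need changing to 3). All function names are hypothetical.

```python
# Characters with reserved meaning in GFF3 column 9, mapped to percent-encodings.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
             "\t": "%09", "\n": "%0A", "\r": "%0D"}


def escape(value: str) -> str:
    return "".join(_RESERVED.get(ch, ch) for ch in value)


def split_outside_quotes(text: str, sep: str = ";") -> list:
    """Split on sep, ignoring separators inside double-quoted values."""
    parts, buf, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return [p.strip() for p in parts if p.strip()]


def parse_gff2_attributes(col9: str) -> dict:
    """Parse a GFF2 group column such as: gene_id "g1"; note "a;b"."""
    attrs = {}
    for part in split_outside_quotes(col9):
        pieces = part.split(None, 1)
        tag = pieces[0]
        value = pieces[1].strip() if len(pieces) > 1 else ""
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        attrs.setdefault(tag, []).append(value)
    return attrs


def gff2_to_gff3_attributes(col9: str) -> str:
    pairs = []
    for tag, values in parse_gff2_attributes(col9).items():
        values = [v for v in values if v]  # valueless "flag" tags are dropped in this sketch
        if values:
            pairs.append(f"{tag}=" + ",".join(escape(v) for v in values))
    return ";".join(pairs) or "."


def convert_line(line: str) -> str:
    """Rewrite one GFF2 feature line; columns 1-8 are copied unchanged."""
    cols = line.rstrip("\n").split("\t")
    group = cols[8] if len(cols) > 8 else ""  # GFF2 allows the group column to be absent
    return "\t".join(cols[:8] + [gff2_to_gff3_attributes(group)])
```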

Does it take much effort? Some of the fields are relatively easy to generate from other files (chromosome names, start/end positions); others are a little more difficult. For example, GFF3 should use accepted Sequence Ontology terms in the type column. A good starting point for input data is something like the UCSC Genome Browser MySQL tables.
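To show what turning a UCSC table into GFF3 involves, here is a sketch that converts one row in the basic 10-column genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) into an `mRNA` line plus child `exon` lines. The input layout is an assumption: tables such as refGene carry an extra leading `bin` column and extra trailing columns that would need dropping first. The key detail is the coordinate shift, since UCSC uses 0-based, half-open intervals while GFF3 uses 1-based, closed ones. CDS features are omitted for brevity, and transcript names are used as IDs even though they may not be unique in real tables.

```python
def genepred_to_gff3(line: str, source: str = "UCSC") -> list:
    """Convert one basic 10-column genePred row into GFF3 mRNA + exon lines."""
    f = line.rstrip("\n").split("\t")
    name, chrom, strand = f[0], f[1], f[2]
    tx_start, tx_end = int(f[3]), int(f[4])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    exon_starts = [int(x) for x in f[8].split(",") if x]
    exon_ends = [int(x) for x in f[9].split(",") if x]

    def row(ftype, start0, end, attrs):
        # UCSC starts are 0-based and ends are exclusive; GFF3 is 1-based and
        # inclusive, so only the start needs +1.
        return "\t".join([chrom, source, ftype, str(start0 + 1), str(end),
                          ".", strand, ".", attrs])

    out = [row("mRNA", tx_start, tx_end, f"ID={name}")]
    for i, (s, e) in enumerate(zip(exon_starts, exon_ends), start=1):
        out.append(row("exon", s, e, f"ID={name}.exon{i};Parent={name}"))
    return out
```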

You have listed most of the commonly used software packages. Some others include the Integrative Genomics Viewer (IGV), SAMMate and Tablet.

The first method would work fine if you had a file in GenBank format; it doesn't have to be in the GenBank database :-)

(reply by Neilfws, 7.4 years ago)

Since the genome is newly sequenced, the first method won't work. Probably the most urgent work is to annotate the genomic sequences through querying different databases and then parse the annotation files to extract suitable fields to build the GFF3 file. Refer to http://biostar.stackexchange.com/questions/2494/gff3-fasta-to-genbank-augustus-training-set .

(reply by Dejian, 7.4 years ago)
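For the annotate-then-parse route, here is a sketch of one such parsing step: turning BLAST tabular output (the standard 12-column `-outfmt 6` layout, with the genomic sequence as the query, as in a blastx search) into GFF3 `match` lines carrying a `Target` attribute. It assumes blastx-style output, where minus-frame hits have qstart > qend. The source label, the `hit<N>` ID scheme and the use of the bit score in the score column are illustrative choices, not requirements; a more specific SO type such as `protein_match` may suit protein hits better.

```python
def blast_tab_to_gff3(line: str, hit_number: int, source: str = "blastx",
                      ftype: str = "match") -> str:
    """Convert one 12-column BLAST tabular row (genome as query) into a GFF3 line."""
    cols = line.rstrip("\n").split("\t")
    qseqid, sseqid = cols[0], cols[1]
    qstart, qend, sstart, send = (int(x) for x in cols[6:10])
    bitscore = cols[11].strip()
    # Assumed blastx-style output: a reversed query range marks a minus-strand hit.
    strand = "+" if qstart <= qend else "-"
    start, end = min(qstart, qend), max(qstart, qend)
    # Target records the matched region on the subject sequence.
    target = f"{sseqid} {min(sstart, send)} {max(sstart, send)}"
    attrs = f"ID=hit{hit_number};Target={target}"
    return "\t".join([qseqid, source, ftype, str(start), str(end),
                      bitscore, strand, ".", attrs])
```

In practice you would print `##gff-version 3` once and then call the function on each non-comment line of the BLAST output, numbering hits as you go.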

That's great! But the GenBank flat file format does not seem easy to produce locally. Are there any tools that make it easier to produce GenBank-format files?

(reply by Dejian, 7.4 years ago)

Well, I'm not suggesting that you generate GenBank first, just that GenBank-to-GFF3 conversion is one way to make GFF3. Since you state that you already have GFF2, I'd suggest that is the sensible starting point.

(reply by Neilfws, 7.4 years ago)

Thank you. I'll try to convert my GFF2 to GFF3. Your reply is pretty helpful.

(reply by Dejian, 7.4 years ago)

Scott Cain, 7.4 years ago, wrote:

Typically, creating GFF3 is not that hard: several gene prediction programs will create it automatically, and MAKER, an easy-to-set-up gene annotation pipeline (http://gmod.org/wiki/MAKER), will produce GFF3 for all of its outputs. Since you're starting a new genome, that is definitely something I would suggest investigating.

Two more items that might be of use:

  1. The GFF3 specification, with several examples of what proper GFF3 should look like:

http://www.sequenceontology.org/gff3.shtml

  2. A GFF3 validator:

http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
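The online validator is the authoritative check. Purely to illustrate what such a check looks at, here is a rough local sketch (a hypothetical `check_gff3` function, not the modENCODE tool) that tests a handful of the spec's rules: the version header, nine tab-separated columns, 1-based start <= end, legal strand and phase values (CDS lines need a phase), `tag=value` attributes, and `Parent` values that refer to an existing `ID`. It ignores much of the spec, including ontology checks and escaping rules.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {".", "0", "1", "2"}


def check_gff3(text: str) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("##gff-version 3"):
        problems.append("line 1: missing '##gff-version 3' header")
    ids, parent_refs = set(), []
    for n, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 tab-separated columns, got {len(cols)}")
            continue
        _, _, ftype, start, end, _, strand, phase, attrs = cols
        if not (start.isdigit() and end.isdigit()) or not 1 <= int(start) <= int(end):
            problems.append(f"line {n}: bad coordinates {start}..{end}")
        if strand not in VALID_STRANDS:
            problems.append(f"line {n}: bad strand {strand!r}")
        if phase not in VALID_PHASES or (ftype == "CDS" and phase == "."):
            problems.append(f"line {n}: bad phase {phase!r} for {ftype}")
        if attrs == ".":
            continue
        for pair in filter(None, attrs.split(";")):
            tag, sep, value = pair.partition("=")
            if not sep:
                problems.append(f"line {n}: attribute {pair!r} is not tag=value")
            elif tag == "ID":
                ids.add(value)
            elif tag == "Parent":
                parent_refs.extend((n, p) for p in value.split(","))
    # Parents may be defined later in the file, so resolve references at the end.
    for n, parent in parent_refs:
        if parent not in ids:
            problems.append(f"line {n}: Parent {parent!r} has no matching ID")
    return problems
```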

In terms of converting GFF2 to GFF3, it is problematic to solve in a general sense: it is hard to make a tool that will take any GFF2 and reliably convert it to GFF3, because of the crazy variability in what people call GFF2 (the specification was very loose). However, for a given GFF2 file, converting it to GFF3 can be fairly easy if you have even a relatively small amount of programming ability. For some common formats, it's even fairly easy to find converters. For example, I know of a converter that works quite well for JGI GFF2.
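As an example of converting one specific GFF2 dialect, here is a sketch for GTF-style GFF2 in which every line carries quoted `gene_id` and `transcript_id` tags (an assumption; JGI's GFF2 and other dialects use different tags). It groups lines by transcript, synthesizes `gene` and `mRNA` parent features spanning their children, and links each child to its transcript with `Parent`. Feature types are passed through unchanged, so non-SO types would still need mapping, and IDs are assumed to contain no characters that need escaping.

```python
import re

# Assumes every value is double-quoted, as in GTF-style GFF2: gene_id "g1"; transcript_id "t1";
_ATTR = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def _row(*cols) -> str:
    return "\t".join(str(c) for c in cols)


def gtf_like_to_gff3(lines) -> list:
    genes = {}  # gene_id -> gene record (dicts preserve insertion order)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        seqid, source, ftype, start, end, score, strand, phase, col9 = \
            line.rstrip("\n").split("\t")[:9]
        attrs = dict(_ATTR.findall(col9))
        gid, tid = attrs["gene_id"], attrs["transcript_id"]
        start, end = int(start), int(end)
        gene = genes.setdefault(gid, {"seqid": seqid, "source": source, "strand": strand,
                                      "start": start, "end": end, "tx": {}})
        gene["start"], gene["end"] = min(gene["start"], start), max(gene["end"], end)
        tx = gene["tx"].setdefault(tid, {"start": start, "end": end, "children": []})
        tx["start"], tx["end"] = min(tx["start"], start), max(tx["end"], end)
        tx["children"].append((seqid, source, ftype, start, end, score, strand, phase))

    out = ["##gff-version 3"]
    for gid, g in genes.items():
        # Parent features are synthesized to span all of their children.
        out.append(_row(g["seqid"], g["source"], "gene", g["start"], g["end"],
                        ".", g["strand"], ".", f"ID={gid}"))
        for tid, t in g["tx"].items():
            out.append(_row(g["seqid"], g["source"], "mRNA", t["start"], t["end"],
                            ".", g["strand"], ".", f"ID={tid};Parent={gid}"))
            for child in t["children"]:
                out.append(_row(*child, f"Parent={tid}"))
    return out
```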

MAKER really is a good pipeline, but I'm not sure whether it can meet our needs: the MAKER paper describes it as ideal for smaller projects [1], while our genome is really large (around 7 Gb). Still, I'd like to look into it; I've downloaded the package.

[1] http://genome.cshlp.org/content/18/1/188.full#sec-13

(reply by Dejian, 7.4 years ago)

I think the authors of MAKER were making the point there that it will work nicely for small projects as well. For larger projects, MAKER is still good, you'll probably just want to have a cluster to do the analysis rather than running it on a laptop.

(reply by Scott Cain, 7.4 years ago)

Jorge Amigo (Santiago de Compostela, Spain), 7.4 years ago, wrote:

You can find the information you need (just the how-to, not a software tool to do it) in GMOD's wiki too, at the end of the GFF2 format description.

Yes, I saw it. But it says "Converting a file from GFF2 to GFF3 format is problematic" and "GMOD does not endorse (or disparage) any particular converter." I just want to get a GFF3 version, not necessarily one converted from GFF2. Perhaps the title is a bit misleading; I will change it. Thank you all the same.

(reply by Dejian, 7.4 years ago)