We are sequencing an animal genome, and the GFF file produced for it is in version 2 format. However, I have learned that GFF2 is now deprecated and that GFF3 is a better choice, so I want a GFF3 file for the genome. Since I am not in charge of GFF production, I want to ask some questions here before submitting my request to the research group.
My questions are: (1) How is a GFF3 file produced? Is it difficult to get GFF3? Does it take much effort? (2) What software supports GFF3? GMOD mentions that Apollo, Chado, CMap and GBrowse support GFF3, and software based on Chado, such as Artemis and ACT, supports it too. Are there any others?
How is a GFF3 file produced? Usually in one of two ways:
conversion from another format using an existing software library (e.g. BioPerl's bp_genbank2gff3.pl utility)
writing your own code to parse suitable input data and write out GFF3
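If you take the second route, the output side is small. The following is a minimal Python sketch (not taken from any particular library) of a hypothetical `gff3_line` helper that formats one feature as a tab-separated GFF3 line, writing `.` for empty columns and percent-encoding the characters the GFF3 specification reserves in column 9 (`;`, `=`, `&`, `,`, `%` and control characters). It assumes the seqid, source and type values are already plain text that needs no escaping.

```python
# Characters GFF3 reserves in column 9 (control characters are handled separately).
RESERVED = set(";=&,%")


def escape(value):
    """Percent-encode reserved and control characters in a column-9 value."""
    out = []
    for ch in value:
        if ch in RESERVED or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def gff3_line(seqid, source, feature_type, start, end,
              score=None, strand=None, phase=None, attributes=None):
    """Format one feature as a GFF3 line (1-based, inclusive coordinates)."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    attrs = []
    for key, value in (attributes or {}).items():
        # Multiple values for one tag are written comma-separated.
        values = value if isinstance(value, (list, tuple)) else [value]
        attrs.append(escape(key) + "=" + ",".join(escape(str(v)) for v in values))
    columns = [
        seqid, source, feature_type, str(start), str(end),
        "." if score is None else str(score),
        strand or ".",
        "." if phase is None else str(phase),  # phase 0 is a valid value
        ";".join(attrs) or ".",
    ]
    return "\t".join(columns)


# A GFF3 file starts with its version header line.
print("##gff-version 3")
# The Name value below is written as abc%3B1.
print(gff3_line("chr1", "example", "gene", 1000, 2000, strand="+",
                attributes={"ID": "gene1", "Name": "abc;1"}))
```

Most of the remaining work is on the input side: deciding which feature types, coordinates and parent/child links your data actually contains.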
Is it difficult to convert GFF2 to GFF3? GMOD describes it as problematic. It should not be too difficult if you have appropriate input data and can write scripts to parse and rewrite text files.
Does it take much effort? Some of the fields are relatively easy to generate from other files (chromosome names, start/end positions); others are a little more difficult. For example, GFF3 should use accepted Sequence Ontology terms in the type column. A good starting point for input data is something like the UCSC Genome Browser MySQL tables.
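As an illustration of generating these fields from tabular input, here is a Python sketch of a hypothetical `genepred_to_gff3` function that turns one row in the basic ten-column UCSC genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) into gene, mRNA, exon and CDS lines. It assumes the row has already been exported as tab-separated text; some UCSC tables carry extra columns (for example a leading `bin` column) that would need to be dropped first. The ID prefixes and the `UCSC` source value are arbitrary choices for the example.

```python
def genepred_to_gff3(row, source="UCSC"):
    """Convert one ten-column genePred row (tab-separated) to GFF3 lines."""
    f = row.rstrip("\n").split("\t")
    name, chrom, strand = f[0], f[1], f[2]
    tx_start, tx_end, cds_start, cds_end = map(int, f[3:7])
    # exonStarts/exonEnds are comma-terminated lists such as "100,300,".
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]

    def line(ftype, start0, end, attrs, phase="."):
        # genePred is 0-based half-open; GFF3 is 1-based closed: shift starts by 1.
        return "\t".join([chrom, source, ftype, str(start0 + 1), str(end),
                          ".", strand, str(phase), attrs])

    gene_id, tx_id = "gene:" + name, "mRNA:" + name
    lines = [line("gene", tx_start, tx_end, "ID=" + gene_id),
             line("mRNA", tx_start, tx_end, "ID=%s;Parent=%s" % (tx_id, gene_id))]
    for i, (s, e) in enumerate(zip(starts, ends), 1):
        lines.append(line("exon", s, e, "ID=exon:%s:%d;Parent=%s" % (name, i, tx_id)))

    # CDS pieces are the exons clipped to [cdsStart, cdsEnd); none for non-coding rows.
    cds = [(max(s, cds_start), min(e, cds_end)) for s, e in zip(starts, ends)
           if min(e, cds_end) > max(s, cds_start)]
    # Phase is counted from the 5' end of the CDS, i.e. right-to-left on '-'.
    phases, done = {}, 0
    for s, e in (cds if strand == "+" else reversed(cds)):
        phases[(s, e)] = (3 - done % 3) % 3
        done += e - s
    for s, e in cds:
        # All CDS pieces of one transcript share a single ID, which GFF3 allows.
        lines.append(line("CDS", s, e, "ID=cds:%s;Parent=%s" % (name, tx_id),
                          phases[(s, e)]))
    return lines


row = "tx1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"
print("\n".join(genepred_to_gff3(row)))
```

The two details that most often go wrong are the coordinate shift (genePred is 0-based and half-open, GFF3 is 1-based and inclusive) and the CDS phase, which is counted from the 5' end of the coding sequence and therefore from the highest coordinate on the minus strand.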
Typically, creating GFF3 is not that hard: several gene prediction programs create it automatically, and MAKER, an easy-to-set-up gene annotation pipeline (http://gmod.org/wiki/MAKER), produces GFF3 for all of its outputs. Since you're starting a new genome, that is definitely something I would suggest investigating.
Two more items that might be of use:
The GFF3 specification, with several examples of what proper GFF3 should look like:
In terms of converting GFF2 to GFF3, the problem is hard to solve in a general sense: it is hard to make a tool that will take any GFF2 and reliably convert it to GFF3, because of the crazy variability in what people call GFF2 (the specification was very loose). However, for a given GFF2 file, converting it to GFF3 can be fairly easy if you have even a modest amount of programming ability. For some common dialects, it's even fairly easy to find ready-made converters; for example, I know of one that works quite well for JGI GFF2.
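For a concrete sense of what "fairly easy for a given file" means, here is a Python sketch for one common dialect only: GTF-like GFF2, where each exon or CDS line carries `gene_id` and `transcript_id` attributes and there are no explicit gene or transcript lines. The hypothetical `gff2_to_gff3` function synthesizes gene and mRNA parents that span their children and attaches `Parent=` attributes. It assumes attribute values contain no semicolons, gene and transcript IDs are unique and need no escaping, every transcript is an mRNA, and the GFF2 frame column already has GFF3 phase semantics; other GFF2 dialects would need different parsing.

```python
from collections import defaultdict


def parse_gff2_attributes(text):
    """Parse GTF-style attributes such as: gene_id "g1"; transcript_id "t1";"""
    attrs = {}
    for part in text.split(";"):
        part = part.strip()
        if part:
            key, _, value = part.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def gff2_to_gff3(lines):
    """Convert GTF-like GFF2 lines into GFF3 with synthesized gene/mRNA parents."""
    spans = {}                       # ID -> [seqid, source, strand, start, end]
    tx_of_gene = defaultdict(list)   # gene_id -> transcript_ids, in input order
    children = defaultdict(list)     # transcript_id -> columns 1-8 of its features
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        seqid, source, ftype, start, end, score, strand, frame = cols[:8]
        attrs = parse_gff2_attributes(cols[8])
        gid, tid = attrs["gene_id"], attrs["transcript_id"]
        start, end = int(start), int(end)
        if tid not in tx_of_gene[gid]:
            tx_of_gene[gid].append(tid)
        # Grow the gene and transcript spans to cover every child feature.
        for key in (gid, tid):
            if key in spans:
                spans[key][3] = min(spans[key][3], start)
                spans[key][4] = max(spans[key][4], end)
            else:
                spans[key] = [seqid, source, strand, start, end]
        # The frame column is assumed to already carry GFF3 phase semantics.
        children[tid].append([seqid, source, ftype, str(start), str(end),
                              score, strand, frame])

    def parent_line(key, ftype, attrs):
        seqid, source, strand, start, end = spans[key]
        return "\t".join([seqid, source, ftype, str(start), str(end),
                          ".", strand, ".", attrs])

    out = ["##gff-version 3"]
    for gid, tids in tx_of_gene.items():
        out.append(parent_line(gid, "gene", "ID=" + gid))
        for tid in tids:
            out.append(parent_line(tid, "mRNA", "ID=%s;Parent=%s" % (tid, gid)))
            out.extend("\t".join(cols + ["Parent=" + tid]) for cols in children[tid])
    return out


gff2 = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t150\t200\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
]
print("\n".join(gff2_to_gff3(gff2)))
```

The point of the sketch is that the conversion logic depends entirely on which attributes your particular GFF2 uses to express grouping; once that is known, the rest is bookkeeping.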
The answer also depends on just how much and what type of information is contained in your files.
Perhaps edit the question to make it clearer; "How is GFF3 file produced?" is not the same as "how do I make GFF3 from the data that I have."