Question: Converter for GFF file downloaded from NCBI to GTF
3.3 years ago, mmacd (20), United States, wrote:

Hi all,

I have been looking at different GFF3-to-GTF converters, but cannot find one that works well for GFF3 files downloaded from NCBI RefSeq assemblies. I am trying to compare an existing RefSeq annotation with one I created using Maker, using the program Eval, which only accepts GTF files. Maker provides a script that converts its GFF3 output to GTF specifically so that Eval can read it. This worked great on my Maker output, but did not work for the RefSeq GFF file.

I have also tried the Cufflinks gffread tool, with no luck.

Any help is greatly appreciated!

eval gff annotation gtf • 2.8k views
modified 3.3 years ago • written 3.3 years ago by mmacd (20)

What's the genome and why do you say the software you found doesn't work well? 

written 3.3 years ago by Parham (1.4k)

The genome is the Chinese hamster. When I run Maker's converter script, the GTF output file only contains CDS features and the file is about 10-fold smaller.

written 3.3 years ago by mmacd (20)

That is what the GTF is supposed to contain, not all of the alignments and other features (the acronym "GTF" and the specification explain these things).

written 3.3 years ago by SES (8.2k)

Can you send me the link to the GTF documentation where it says it's supposed to contain only CDS features? I have read elsewhere that it describes gene structure, and I have seen documentation (http://mblab.wustl.edu/GTF22.html) showing GTFs containing exons, UTRs, etc.

modified 3.3 years ago • written 3.3 years ago by mmacd (20)

The answer to your question can be found in the document you linked to for the GTF format. The specification for the feature field says:

The following feature types are required: "CDS", "start_codon", "stop_codon". The features "5UTR", "3UTR", "inter", "inter_CNS", "intron_CNS" and "exon" are optional. All other features will be ignored.

GTF is basically GFF but, as the name suggests, is more specific about the features it describes, and it must use consistent nomenclature for the gene and transcript IDs (unlike GFF, which is flexible in this respect).

written 3.3 years ago by SES (8.2k)
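
To make the feature rules quoted from the GTF 2.2 specification concrete, here is a minimal Python sketch that tallies the feature column (column 3) of a GFF/GTF file into required, optional, and ignored types. The function name `classify_features` and the sample lines are hypothetical and only for illustration; the two sets of feature names are copied from the quoted passage. It may help to check what a converter actually wrote before concluding that the output is wrong.

```python
from collections import Counter

# Feature names taken from the GTF 2.2 passage quoted in this thread.
REQUIRED = {"CDS", "start_codon", "stop_codon"}
OPTIONAL = {"5UTR", "3UTR", "inter", "inter_CNS", "intron_CNS", "exon"}


def classify_features(lines):
    """Count feature types (column 3) by their status in GTF 2.2."""
    counts = {"required": Counter(), "optional": Counter(), "ignored": Counter()}
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column record
        feature = fields[2]
        if feature in REQUIRED:
            counts["required"][feature] += 1
        elif feature in OPTIONAL:
            counts["optional"][feature] += 1
        else:
            counts["ignored"][feature] += 1
    return counts


# Hypothetical sample records, not taken from any real annotation.
sample = [
    "##gff-version 3",
    "chr1\tRefSeq\tmRNA\t1\t100\t.\t+\t.\tID=rna0;Parent=gene0",
    "chr1\tRefSeq\texon\t1\t50\t.\t+\t.\tID=exon0;Parent=rna0",
    "chr1\tRefSeq\tCDS\t10\t50\t.\t+\t0\tID=cds0;Parent=rna0",
]
summary = classify_features(sample)
for status, counter in summary.items():
    print(status, dict(counter))
```

To run it on a real file, pass an open file handle instead of the sample list, for example `classify_features(open("annotation.gff"))`, where the file name is a placeholder.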

If my GFF file had "exon" features, shouldn't the GTF? I am looking for a tool that can do the conversion without losing exon information.

written 3.3 years ago by mmacd (20)
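
For readers with the same goal, below is a rough Python sketch of a conversion that keeps "exon" lines alongside "CDS" lines. It is not the Maker script or gffread, and it rests on assumptions that should be checked against the actual file: that each exon/CDS record has a `Parent` attribute pointing at a transcript, and that the transcript's own `Parent` is the gene. The names `parse_attrs` and `gff3_to_gtf` are made up for this example. It does not derive `start_codon` or `stop_codon` lines, which the GTF 2.2 specification lists as required, so the output may still need further processing.

```python
def parse_attrs(col9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for part in col9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(lines, keep=("exon", "CDS")):
    """Return GTF-style lines for the kept feature types.

    Assumes exon/CDS -> transcript -> gene links via ID/Parent attributes.
    """
    rows = []
    parent_of = {}  # maps a feature ID to its (first) parent ID
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = parse_attrs(fields[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"].split(",")[0]
        rows.append((fields, attrs))

    out = []
    for fields, attrs in rows:
        if fields[2] not in keep or "Parent" not in attrs:
            continue
        # A feature may list several parents; emit one line per transcript.
        for transcript_id in attrs["Parent"].split(","):
            # Fall back to the transcript ID if no gene-level parent is known.
            gene_id = parent_of.get(transcript_id, transcript_id)
            col9 = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            out.append("\t".join(fields[:8] + [col9]))
    return out


# Hypothetical input records, not taken from a real RefSeq file.
gff = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t1\t100\t.\t+\t.\tID=gene0;Name=G",
    "chr1\tRefSeq\tmRNA\t1\t100\t.\t+\t.\tID=rna0;Parent=gene0",
    "chr1\tRefSeq\texon\t1\t50\t.\t+\t.\tID=exon0;Parent=rna0",
    "chr1\tRefSeq\tCDS\t10\t50\t.\t+\t0\tID=cds0;Parent=rna0",
]
gtf = gff3_to_gtf(gff)
for record in gtf:
    print(record)
```

The two-pass design (collect ID/Parent links first, then emit lines) is used so that the result does not depend on genes and transcripts appearing before their exons in the file.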

If multiple tools do not work, that usually means there is something wrong with the input. It would help to link to the NCBI page so people could see the file. Also, try to link to the software you mention, because it is not always clear which script or tool is being discussed. EDIT: Disregard this, there does not appear to be an issue. OP was not clear on what a GTF is supposed to contain.

modified 3.3 years ago • written 3.3 years ago by SES (8.2k)