Is It Feasible To Produce Intron GFF According To UTR GFF And CDS GFF?
3
3
11.1 years ago
Dejian ★ 1.3k

We have two separate GFF files. One is a UTR GFF file describing the 5' and 3' UTR regions. The other is a CDS GFF file describing the coding sequences. Is there a tool to generate an intron GFF from these two files? Or, how is the intron GFF file usually produced?

gff intron • 8.0k views
9
11.1 years ago

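If your files are GFF3, use GenomeTools. The gt gff3 tool has an -addintrons option that adds intron features between existing exon features:

Usage: gt gff3 [option ...] [GFF3_file ...]
Parse, possibly transform, and output GFF3 files.

-addintrons    add intron features between existing exon features
               default: no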

2

Be careful with this approach though. If exon features are not explicitly defined, then gt will not create any intron features. I've made that mistake a few times.
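
One way to catch this before running gt is to tally the feature types in column 3 of the file. The following is only a minimal sketch: the function name count_feature_types and the sample records are made up for illustration, and it assumes plain tab-separated GFF3 lines.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count how often each feature type (GFF column 3) occurs."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical records: a transcript with CDS and UTR but no exon features.
sample = [
    "##gff-version 3",
    "chr1\tsrc\tmRNA\t100\t650\t.\t+\t.\tID=t1",
    "chr1\tsrc\tfive_prime_UTR\t100\t149\t.\t+\t.\tParent=t1",
    "chr1\tsrc\tCDS\t150\t200\t.\t+\t0\tParent=t1",
]

counts = count_feature_types(sample)
if counts["exon"] == 0:
    print("No exon features found; introns cannot be inferred from exons yet.")
```

For a real file you would pass an open file handle instead of the sample list.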

0

Pretty encouraging. I will try it. Thanks!

0
11.1 years ago

Or, how is the intron GFF file usually produced?

In my experience, different feature types aren't typically stored in separate files. In other words, you don't have a CDS file, a UTR file, an intron file, etc.; you simply have a single file with all the features in it. That doesn't mean your approach is incorrect; it just isn't typical and doesn't provide any immediate benefit (unless, of course, you are running scripts that have been built to expect it).

Is there a tool to generate the intron GFF according to these two files?

Perhaps, but it shouldn't be too difficult to do with minimal scripting experience. Once you have determined the exon coordinates using the CDS and UTR data, then simply create an intron feature to fill in the space between each adjacent pair of exon features.
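
The "fill in the space" step could look roughly like the sketch below. It assumes you already have the exon coordinates of a single transcript as 1-based, inclusive (start, end) pairs, as in GFF; the function name introns_from_exons and the example coordinates are invented for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between adjacent exons of one transcript."""
    ordered = sorted(exons)
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # An intron exists only if at least one base separates the two exons.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical exon coordinates for one transcript.
exons = [(100, 200), (301, 400), (501, 650)]
print(introns_from_exons(exons))  # [(201, 300), (401, 500)]
```

The same logic has to be applied per transcript (grouping by the Parent attribute), otherwise exons from different transcripts would be treated as neighbours.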

Haibao mentioned the very useful GenomeTools utility, but to use that you would still first have to determine the exon coordinates and provide them as input. If you can calculate the exon coordinates from a set of CDS and UTR coordinates, then surely you can calculate intron coordinates from a set of exon coordinates.
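
For the first half of that, a rough sketch of deriving exon coordinates is to merge CDS and UTR segments that overlap or sit directly next to each other. This assumes all segments belong to the same transcript and use 1-based, inclusive coordinates; the function name exons_from_cds_utr and the coordinates are made up for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF


def exons_from_cds_utr(cds: List[Interval], utr: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent CDS and UTR segments into exons."""
    exons: List[Interval] = []
    for start, end in sorted(cds + utr):
        if exons and start <= exons[-1][1] + 1:
            # Segment touches or overlaps the previous exon: extend that exon.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    return exons


# Hypothetical transcript: the first and last exons are part UTR, part CDS.
utr = [(100, 149), (601, 650)]
cds = [(150, 200), (301, 400), (501, 600)]
print(exons_from_cds_utr(cds, utr))  # [(100, 200), (301, 400), (501, 650)]
```

The resulting exon list can then be written out as exon features and handed to GenomeTools, or used directly to compute the gaps between adjacent exons.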

0

Thank you, Daniel. I thought this was a routine task and there were possibly some scripts dealing with this issue. It seems that I have to work it out myself. :)

0

You would think that such a basic task would be easy to do with a variety of existing tools. The problem is that people use the GFF3 (and GFF3-like) formats so differently that it is quite difficult to handle every possible case. For example, some people include exons and CDSs but leave out UTRs, some people include CDSs and UTRs but leave out exons, etc. In each case, it is possible to infer the missing features, but it gets complicated when you have to account for every possible case.

0
11.0 years ago
Abhi ★ 1.6k

I have a similar question. If I have a GFF file with CDS and UTR features, how can I find the exon start and end?

Thanks! -Abhi