Question: Is It Feasible To Produce Intron GFF According To UTR GFF And CDS GFF?
Dejian (1.3k), United States, wrote 8.0 years ago:

We have two separate GFF files. One is a UTR GFF file describing the 5' and 3' UTR regions. The other is a CDS GFF file describing the coding sequences. Is there a tool that can generate an intron GFF from these two files? If not, how is an intron GFF file usually produced?

gff intron • 6.5k views
Haibao Tang (3.0k), Mountain View, CA, wrote 8.0 years ago:

If the files are in GFF3 format, use GenomeTools.

Usage: gt gff3 [option ...] [GFF3_file ...]
Parse, possibly transform, and output GFF3 files.

-addintrons add intron features between existing exon features
            default: no

Be careful with this approach though. If exon features are not explicitly defined, then gt will not create any intron features. I've made that mistake a few times.
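
As a quick guard against that mistake, the feature types in a file can be counted before running gt. This is only a minimal sketch: the helper names `feature_type_counts` and `has_exons` are made up for illustration, and it assumes a tab-delimited, nine-column GFF3 file with the feature type in column 3.

```python
from collections import Counter


def feature_type_counts(lines):
    """Count how often each feature type (GFF3 column 3) occurs."""
    counts = Counter()
    for line in lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only well-formed nine-column feature lines are counted.
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


def has_exons(lines):
    """Return True if at least one explicit exon feature is present."""
    return feature_type_counts(lines)["exon"] > 0


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_exons.py annotation.gff3
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for ftype, n in sorted(feature_type_counts(handle).items()):
                print(f"{ftype}\t{n}")
```

If no `exon` rows show up in the output, the exon features need to be created before `-addintrons` can produce anything.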

Reply written 8.0 years ago by Daniel Standage (3.9k)

Pretty encouraging. I will try it. Thanks!

Reply written 8.0 years ago by Dejian (1.3k)
Daniel Standage (3.9k), Davis, California, USA, wrote 8.0 years ago:

Let me answer your two questions in reverse order.

Or, how is the intron GFF file usually produced?

In my experience, different feature types aren't typically stored in separate files. In other words, you don't have a CDS file, a UTR file, an intron file, and so on; you simply have a single file with all the features in it. That doesn't mean your approach is incorrect. It just isn't typical and doesn't provide any immediate benefit (unless, of course, you are running scripts that have been built to expect it).

Is there a tool to generate the intron GFF according to these two files?

Perhaps, but it shouldn't be too difficult to do with minimal scripting experience. Once you have determined the exon coordinates using the CDS and UTR data, then simply create an intron feature to fill in the space between each adjacent pair of exon features.

Haibao mentioned the very useful GenomeTools utility, but to use that you would still first have to determine the exon coordinates and provide them as input. If you can calculate the exon coordinates from a set of CDS and UTR coordinates, then surely you can calculate intron coordinates from a set of exon coordinates.
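
The two steps described above (CDS + UTR to exons, then exons to introns) can be sketched in a few lines of Python. This is a rough illustration rather than a finished tool. The function names and the `inferred` source label are made up here. It assumes each CDS and UTR row carries a single `Parent=` attribute naming its transcript, that UTR rows use one of the type names listed in `UTR_TYPES`, and that coordinates are 1-based and inclusive as in GFF3.

```python
from collections import defaultdict

# Assumed UTR type names; adjust to whatever your UTR file actually uses.
UTR_TYPES = {"five_prime_UTR", "three_prime_UTR", "UTR"}


def read_features(lines, wanted):
    """Yield (seqid, strand, parent, start, end) for rows whose type is in `wanted`."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in wanted:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Assumption: one Parent per row (no comma-separated parent lists).
        yield cols[0], cols[6], attrs.get("Parent", ""), int(cols[3]), int(cols[4])


def merge_to_exons(intervals):
    """Merge overlapping or directly adjacent intervals into exons."""
    exons = []
    for start, end in sorted(intervals):
        if exons and start <= exons[-1][1] + 1:
            exons[-1][1] = max(exons[-1][1], end)
        else:
            exons.append([start, end])
    return [tuple(e) for e in exons]


def introns_between(exons):
    """Fill the gap between each adjacent pair of sorted exons."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def intron_gff(cds_lines, utr_lines):
    """Return intron rows (as GFF3 strings) inferred from CDS and UTR rows."""
    by_transcript = defaultdict(list)
    for lines, wanted in ((cds_lines, {"CDS"}), (utr_lines, UTR_TYPES)):
        for seqid, strand, parent, start, end in read_features(lines, wanted):
            by_transcript[(seqid, strand, parent)].append((start, end))
    rows = []
    for (seqid, strand, parent), intervals in sorted(by_transcript.items()):
        for start, end in introns_between(merge_to_exons(intervals)):
            rows.append("\t".join([seqid, "inferred", "intron", str(start),
                                   str(end), ".", strand, ".", f"Parent={parent}"]))
    return rows
```

Merging adjacent intervals matters because a UTR and the CDS that touches it belong to the same exon; without that step, a zero-length "intron" would be reported at every UTR/CDS boundary.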


Thank you, Daniel. I thought this was a routine task and there were possibly some scripts dealing with this issue. It seems that I have to work it out myself. :)

Reply written 8.0 years ago by Dejian (1.3k)

You would think that such basic tasks would be easy to do with a variety of existing tools. The problem is that people use the GFF3 (and GFF3-like) formats so differently that it is quite difficult to handle every possible case. For example, some people include exons and CDSs but leave out UTRs, some people include CDSs and UTRs but leave out exons, and so on. In each case, it is possible to infer the missing features, but it gets complicated when you have to account for every possible case.
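
To illustrate one of those cases, here is a hedged sketch of inferring UTRs when only exons and CDSs are given. The function name `infer_utrs` is hypothetical, and the sketch assumes a single transcript with 1-based inclusive coordinates, at least one CDS segment, and a strand of "+" or "-". Any exon sequence outside the overall CDS span is treated as UTR.

```python
def infer_utrs(exons, cds, strand):
    """Split exon sequence lying outside the CDS span into 5' and 3' UTR segments."""
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Part of the exon lies to the left of the coding span.
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Part of the exon lies to the right of the coding span.
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the left-hand side of the span is the 3' end.
    five, three = (left, right) if strand == "+" else (right, left)
    return {"five_prime_UTR": five, "three_prime_UTR": three}
```

Even this small case needs a strand check, which gives a sense of how quickly a tool covering every combination of present and missing feature types would grow.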

Reply written 7.9 years ago by Daniel Standage (3.9k)
Abhi (1.5k), United States, wrote 7.9 years ago:

I have a similar question. If I have a GFF file with CDS and UTR features, how can I find the exon start and end coordinates?

Thanks! -Abhi
