How to represent trans-spliced genes in GTF?
6.8 years ago
Dan ▴ 530

For example, see this gene (nad1) in ENA: http://www.ebi.ac.uk/ena/data/view/ABI60879

If you look at the XML for that gene you see the following:

join(
             DQ984518.1: 324706 .. 325091 ,
  complement(DQ984518.1:  24417 ..  24498),
  complement(DQ984518.1:  22828 ..  23019),
             DQ984518.1:   3484 ..   3542 ,
  complement(DQ984518.1: 153702 .. 153960)
)

This shows 5 exons joined out of phase and out of order. Is there a valid GTF representation of this?
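To make the structure concrete, here is a minimal sketch that parses the location string above into per-exon segments, in transcript order. The regular expression and the `parse_join` helper are illustrative assumptions, not part of any ENA tooling, and they only handle the simple `join`/`complement` form shown here.

```python
import re

# The location string exactly as shown in the ENA record above.
LOCATION = """join(
             DQ984518.1: 324706 .. 325091 ,
  complement(DQ984518.1:  24417 ..  24498),
  complement(DQ984518.1:  22828 ..  23019),
             DQ984518.1:   3484 ..   3542 ,
  complement(DQ984518.1: 153702 .. 153960)
)"""

# One segment: optional "complement(", then accession ":" start ".." end.
SEGMENT = re.compile(
    r"(complement\(\s*)?([A-Za-z0-9_.]+)\s*:\s*(\d+)\s*\.\.\s*(\d+)"
)


def parse_join(location):
    """Return (seqid, start, end, strand) tuples in transcript order."""
    segments = []
    for comp, seqid, start, end in SEGMENT.findall(location):
        strand = "-" if comp else "+"
        segments.append((seqid, int(start), int(end), strand))
    return segments


segments = parse_join(LOCATION)
for seg in segments:
    print(seg)

# Total joined length in bases (coordinates are 1-based, inclusive).
total = sum(end - start + 1 for _, start, end, _ in segments)
print(total)
```

The segments come out with strands `+ - - + -`, and their coordinates are not monotonic in either direction, which is exactly what makes this hard to express as a conventional single-strand transcript.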

How should a 'non-canonically spliced' gene be dumped into GTF, i.e. what is the recommendation?

trans-splicing GTF • 1.6k views

Also, how can I verify that the resulting GTF is valid? By comparing the translation?
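A translation check along these lines could be sketched as follows: rebuild the spliced sequence from the segments (in transcript order, reverse-complementing minus-strand pieces), translate it, and compare against the protein in the original record. The toy `genome` and `segments` below are made up for illustration, and the standard genetic code is an assumption; an organellar gene may need a different table.

```python
# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
# Assumption: swap in the appropriate table for the organism/organelle.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse-complement an upper-case ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spliced_sequence(genome, segments):
    """Join segments in the given (transcript) order.

    Each segment is (start, end, strand) with 1-based inclusive coordinates.
    """
    parts = []
    for start, end, strand in segments:
        piece = genome[start - 1:end]
        parts.append(reverse_complement(piece) if strand == "-" else piece)
    return "".join(parts)


def translate(cds):
    """Translate complete codons; unknown codons become 'X'."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


# Toy data (hypothetical): mixed strands, out of genomic order.
genome = "ATGAAAGGCTTA"
segments = [(1, 3, "+"), (7, 9, "-"), (4, 6, "+"), (10, 12, "-")]

cds = spliced_sequence(genome, segments)
protein = translate(cds)
print(cds, protein)
```

If the protein rebuilt from the GTF segments matches the one in the source record, the coordinates, strands, and exon order have at least survived the conversion. It does not show that a given downstream tool will interpret the GTF the same way.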


Hello Dan!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/849/how-to-represent-trans-spliced-genes-in-gtf/855#855

This is typically not recommended as it runs the risk of annoying people in both communities.

6.8 years ago

GTF is a very simplistic and ill-defined format - it is a variant of GFF 2 that has only two required attributes:

gene_id "ABC"; transcript_id "EFG";

Beyond that, there is no requirement. So I don't think it could be turned into a "standard" GTF form, since no such standard exists.
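That said, one non-standard option is simply to write one `exon` line per segment, each with its own strand, all sharing the same `gene_id` and `transcript_id`. A minimal sketch follows; the IDs, the `source` value, and the `exon_number` attribute used to record transcript order are illustrative assumptions, and a tool that sorts exons by coordinate or expects a single strand per transcript may well misread or reject the result.

```python
# Segments in transcript order, taken from the join(...) in the question:
# (seqid, start, end, strand), 1-based inclusive coordinates.
SEGMENTS = [
    ("DQ984518.1", 324706, 325091, "+"),
    ("DQ984518.1", 24417, 24498, "-"),
    ("DQ984518.1", 22828, 23019, "-"),
    ("DQ984518.1", 3484, 3542, "+"),
    ("DQ984518.1", 153702, 153960, "-"),
]


def to_gtf_lines(segments, gene_id, transcript_id, source="ENA"):
    """Emit one tab-separated 9-column 'exon' line per segment.

    exon_number is an optional attribute used here (as an assumption)
    to preserve transcript order, since coordinates alone cannot.
    """
    lines = []
    for number, (seqid, start, end, strand) in enumerate(segments, start=1):
        attributes = (
            f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
            f'exon_number "{number}";'
        )
        fields = [seqid, source, "exon", str(start), str(end),
                  ".", strand, ".", attributes]
        lines.append("\t".join(fields))
    return lines


# Hypothetical IDs, chosen only for illustration.
lines = to_gtf_lines(SEGMENTS, gene_id="nad1", transcript_id="ABI60879")
print("\n".join(lines))
```

Whether this is usable depends entirely on the consumer, so it is worth testing against the specific tool before relying on it.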

Perhaps, as Devon Ryan points out, you could use GFF3 with multiple parents.


The issue is with tools that require GTF and do not accept GFF3.
