GFF3 Format: Two Introns With Two Different Parent mRNAs But Only One Defined
13.0 years ago
Arun 2.4k

Hi, I am a bit confused by a new GFF3 file I am working on. The older one did not have UTRs. I am not able to follow some entries in the GFF3 file. Bear with me if it's silly! :)

The entry for one particular gene looks like this (I deleted other entries that are unnecessary for my question). I used these abbreviations for easier reading: 3'UTR = threeprimeUTR, 5'UTR = fiveprimeUTR.

ch00 gene   8835428 883785  ID=gene:c00g009140.2  
ch00 mRNA   8834528 8837855 ID=mRNA:c00g009140.2.1;Parent=gene:c00g009140.2  
ch00 exon   8835428 8835483 ID=exon:c00g009140.2.1.1;Parent=mRNA:c00g009140.2.1  
ch00 CDS    8835428 8835483 ID=CDS:c00g009140.2.1.1;Parent=mRNA:c00g009140.2.1   
ch00 intron 8835484 8835596 ID=intron:c00g009130.2.1.3;**Parent=mRNA:c00g009130.2.1**  
ch00 intron 8835484 8835596 ID=intron:c00g009140.2.1.1;**Parent=mRNA:c00g009140.2.1**  
ch00 exon   8835597 8835646 ID=exon:c00g009130.2.1.4;**Parent=mRNA:c00g009130.2.1**  
ch00 3'UTR  8835597 8835646 ID=3'UTR:c00g009130.2.1.1;**Parent=mRNA:c00g009130.2.1**  
ch00 exon   8835597 8835650 ID=exon:c00g009140.2.1.2;Parent=mRNA:c00g009140.2.1

and so on... My question: according to the GFF3 format specification, any feature named in a Parent attribute must itself be declared, i.e. there should be another mRNA with the ID c00g009130.2.1. If so, are these entries errors? (This GFF3 file is a pre-release as well.) If not, could you please explain the logic behind them? In addition to the introns, there are also UTRs and exons with the other parent. Is this an overlapping gene? I don't really follow what an overlapping gene means either. It would be great if someone could explain.

Thank you,
Arun.

13.0 years ago
Pablo ★ 1.9k

In my experience, most GFF3 files do not follow the GFF3 specification. In fact, ENSEMBL is not releasing GFF3 files any more (they are using GTF2.2, which is a similar but stricter format).

In your case, you have a Gene (gene:c00g009140.2). The gene has two transcripts (mRNA:c00g009140.2.1 and mRNA:c00g009130.2.1), but you don't show the part where they define the second one (there should be another mRNA line).

Genes can have several transcripts. It is not unusual that only one of them is marked as "protein coding transcript" (so it has 3'UTR and 5'UTR defined), and the other transcripts are not.
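A quick way to test whether a Parent really lacks its defining line is to collect every ID and every Parent value and compare the two sets. The sketch below is only an illustration: it assumes standard tab-separated, nine-column GFF3 records, and the function name and the example records are made up (the listing in the question is abbreviated, so it is not used directly).

```python
def find_undefined_parents(gff3_lines):
    """Return Parent IDs that are referenced but never declared as an ID."""
    defined, referenced = set(), set()
    for line in gff3_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        # Column 9 holds semicolon-separated tag=value pairs.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            defined.add(attrs["ID"])
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                referenced.add(parent)
    return sorted(referenced - defined)


# Hypothetical records: the last exon points at an mRNA that is never defined.
example = [
    "ch00\t.\tgene\t100\t900\t.\t+\t.\tID=gene:g1",
    "ch00\t.\tmRNA\t100\t900\t.\t+\t.\tID=mRNA:g1.1;Parent=gene:g1",
    "ch00\t.\texon\t100\t200\t.\t+\t.\tID=exon:g1.1.1;Parent=mRNA:g1.1",
    "ch00\t.\texon\t300\t400\t.\t+\t.\tID=exon:g0.1.4;Parent=mRNA:g0.1",
]
print(find_undefined_parents(example))
```

An empty result means every Parent has a matching ID somewhere in the input, which is what you would expect if the second mRNA line is present elsewhere in the file.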


Hi Pablo, the fact that some people write GFF3-like files can hardly be held against the GFF3 specification, which is quite clearly defined: http://www.sequenceontology.org/gff3.shtml While the GTF2.2 specification is also clearly defined, it is considerably more constrained in what it can represent.


In addition, sometimes you can infer 3'UTR and 5'UTR from the difference between CDS and exon marks.
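As a rough sketch of that inference, assuming 1-based inclusive coordinates, a single transcript, and a hypothetical helper name: whatever exon sequence lies outside the span from the first CDS base to the last CDS base is UTR, and the strand decides which side is 5' and which is 3'.

```python
def infer_utrs(exons, cds, strand="+"):
    """Return (five_prime, three_prime) lists of (start, end) intervals.

    exons, cds: lists of 1-based inclusive (start, end) tuples that all
    belong to the same transcript.
    """
    if not cds:
        return [], []  # non-coding transcript: no UTRs to infer
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Exon sequence before the first coding base.
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exon sequence after the last coding base.
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the lower coordinates are the 3' side.
    return (left, right) if strand == "+" else (right, left)


# Hypothetical transcript with three exons.
exons = [(100, 200), (300, 400), (500, 600)]
cds = [(150, 200), (300, 400), (500, 550)]
print(infer_utrs(exons, cds, "+"))
```

This only works when both exon and CDS features are present for the transcript; if a file lists CDS only, there is nothing to subtract from.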


Hi Pablo, thanks for your reply. I just checked again and the other transcript is present as well. My question was actually about the apparent absence of the other transcript's definition. Thanks again!

13.0 years ago
Scott Cain ▴ 770

Hi Arun,

You might want to run your GFF3 through a GFF3 validator to see if it's OK:

http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

I see from your follow-up that the apparently missing transcripts are really present, so hopefully that part makes sense to you. One thing that I think is clearly wrong is the parentage of the intron: since an mRNA is a processed transcript, it can't have any introns associated with it. Put another way: having an intron as a child of an mRNA violates the Sequence Ontology (which GFF3 has to adhere to). My preference would be to leave introns out altogether, as their presence can easily be inferred from the existence of the UTR and CDS or exon regions. If they need to be in the GFF for some reason, they can be children of genes or of a primary transcript feature.

Scott
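To illustrate how introns can be inferred rather than stored, here is a minimal sketch that takes the exons of one transcript and reports the gaps between consecutive exons. The function name is hypothetical and coordinates are assumed to be 1-based and inclusive, as in GFF3; the example uses the two exons of mRNA:c00g009140.2.1 from the question.

```python
def infer_introns(exons):
    """Return the gaps between consecutive exons as (start, end) tuples."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only a real gap counts; adjacent or overlapping exons yield nothing.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Exons 8835428-8835483 and 8835597-8835650 from the question.
print(infer_introns([(8835428, 8835483), (8835597, 8835650)]))
# → [(8835484, 8835596)]
```

The result matches the intron line given for that transcript in the question, so dropping the explicit intron features would lose no information.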
