Question: Using Multiple Parent Values In GFF3 Format?
10.3 years ago by
In the GFF3 format, is it allowed to have multiple Parent values per "exon"? E.g. entries like

chr1 ... exon ... ID=myExon;Parent=mRNA1,mRNA2,mRNA3

Is this legitimate?


I'm assuming this exon is included in multiple splice variants of an alternatively spliced gene? In most examples I have seen, a separate 'exon' line is created in the GFF file for each transcript the exon is part of. I don't know whether that is the correct way, nor whether the usage you showed is legitimate.

In fact, I've been frustrated for the last two weeks by how differently various tools handle the attributes. I've been using the page at as my reference. Are there additional references out there?

10.3 years ago by
Yes, this is permitted. See the section of the GFF3 spec entitled "The canonical Gene" where this notation is used:

ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN

ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 . mRNA 1050 9000 . + . ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123 . mRNA 1300 9000 . + . ID=mRNA00003;Parent=gene00001;Name=EDEN.3

ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003
ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002
ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001,mRNA00003
ctg123 . exon 5000 5500 . + . ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123 . exon 7000 9000 . + . ID=exon00005;Parent=mRNA00001,mRNA00002,mRNA00003
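
To make the comma-separated Parent notation concrete, here is a minimal sketch in Python of how a reader of such a file might split the attribute column and file each exon under every transcript it names. The function names (`parse_attributes`, `exons_by_parent`) are hypothetical and not part of any GFF3 library, columns are split on whitespace for readability (real GFF3 is tab-delimited), and URL-escaped characters in attribute values are not decoded.

```python
from collections import defaultdict

# Exon lines copied from the example above. Assumption: columns are split on
# whitespace here, whereas real GFF3 files are tab-delimited.
GFF3_EXONS = """\
ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003
ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002
ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001,mRNA00003
ctg123 . exon 5000 5500 . + . ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123 . exon 7000 9000 . + . ID=exon00005;Parent=mRNA00001,mRNA00002,mRNA00003
"""


def parse_attributes(column9):
    """Turn 'ID=x;Parent=a,b' into {'ID': ['x'], 'Parent': ['a', 'b']}.

    Hypothetical helper: it does not decode URL-escaped characters.
    """
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # A comma separates multiple values of the same attribute.
        attributes[key] = value.split(",")
    return attributes


def exons_by_parent(gff3_text):
    """Map each parent ID to the (exon ID, start, end) tuples that name it."""
    grouped = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split(None, 8)  # at most 9 columns
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = parse_attributes(fields[8])
        exon_id = attrs.get("ID", ["?"])[0]
        # One exon line is recorded once per listed parent.
        for parent in attrs.get("Parent", []):
            grouped[parent].append((exon_id, int(fields[3]), int(fields[4])))
    return dict(grouped)


for transcript, exons in sorted(exons_by_parent(GFF3_EXONS).items()):
    print(transcript, [exon_id for exon_id, _, _ in exons])
```

On the example lines this lists exon00002, exon00003, exon00004 and exon00005 under mRNA00001; exon00002, exon00004 and exon00005 under mRNA00002; and exon00001, exon00003, exon00004 and exon00005 under mRNA00003. Each transcript gets its own exon set even though every exon appears on a single line.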

Yes, the example in "The canonical Gene" is a nice one, but I am a little confused about exon00003: why are its parents only mRNA00001 and mRNA00003? Why not mRNA00002 as well, since mRNA00002 starts at 1050 and exon00003 starts at 3000?


