Gencode V15 Exons < 3Bp
11.2 years ago
PoGibas 5.1k

After checking exon lengths in gencode.v15.annotation, I noticed that some exons are only 1 bp or 2 bp long.

curl -s "ftp://ftp.sanger.ac.uk/pub/gencode/release_15/gencode.v15.annotation.gtf.gz" | 
     gunzip -c | 
     awk '($3=="exon" && $5-$4+1 < 3) {print}'

That's strange: some exons (protein-coding or lncRNA) are only 1 bp or 2 bp long. Is it a bioinformatics artifact or (probably not) biology? Has anyone noticed something like this in other annotations?

The coordinates in a GTF are inclusive, so you should compute $5 - $4 + 1. So the lengths are actually 1. Still pretty weird that you get an exon length of 1, though.
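
To see how the inclusive-coordinate rule plays out across a whole file, something like the following could tabulate how many exons fall at each length below a cutoff. This is only a sketch: the function name `short_exon_length_counts` and its optional cutoff argument are made up for illustration, and it assumes a standard tab-separated GTF on stdin.

```shell
# Hypothetical helper: count exons per length for lengths below a cutoff.
# Reads a GTF stream on stdin; prints "length<TAB>count", sorted by length.
short_exon_length_counts() {
    awk -F '\t' -v max="${1:-3}" '
        $3 == "exon" {
            len = $5 - $4 + 1          # GTF start and end are both inclusive
            if (len < max) n[len]++    # tally only exons shorter than the cutoff
        }
        END { for (l in n) print l "\t" n[l] }
    ' | sort -n
}
```

It could be used as `gunzip -c gencode.v15.annotation.gtf.gz | short_exon_length_counts 3`, assuming the file has already been downloaded locally.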

Some non-coding RNAs shared in protein coding genes are marked with 0 or 1 length.

Thanks, fixed it.

10.4 years ago
PoGibas 5.1k

I contacted the Gencode staff about this issue (in February). They replied that they hoped the problem would be fixed by Gencode v16.

Apparently there was a bug in one of their scripts.
"... there should be no exons in Gencode <3bp. Alignments of <3bp can not be trusted, even when spanning known splice junctions, or confirming known UTRs/retained introns".

The current Gencode annotation (v18) still has this problem (I don't know why it hasn't been fixed yet).
I would suggest filtering those exons out.
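
One possible way to do that filtering, in the same awk style as the command in the question, is sketched below. The function name `filter_short_exons` and the default cutoff of 3 bp are assumptions for illustration, not anything provided by Gencode. Note that it only drops the exon lines themselves; transcript, CDS and other records are passed through unchanged, so whether that is acceptable depends on the downstream tool.

```shell
# Hypothetical helper: drop exon records shorter than a minimum length.
# Reads a GTF stream on stdin and writes the filtered GTF to stdout.
filter_short_exons() {
    awk -F '\t' -v min="${1:-3}" '
        /^#/ { print; next }                           # keep header/comment lines
        $3 == "exon" && ($5 - $4 + 1) < min { next }   # skip exons shorter than min bp
        { print }                                      # pass everything else through
    '
}
```

For example, `gunzip -c gencode.v18.annotation.gtf.gz | filter_short_exons 3 > filtered.gtf`, where both file names are placeholders.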

10.2 years ago
Emily 23k

Here's what Laurens says now:

I had a look at a couple of examples:

  • OTTHUMT00000321563 has a 2 bp first (coding) exon because it is 5' incomplete and those two bases align to a reference exon, though arguably they could also align to the exon before that and to other exons further upstream. I have now deleted that exon.
  • OTTHUMT00000470867 no longer has a 1 bp exon in our internal database; it's 227 long now. So that should be in a future Ensembl update.

I will go through the short-exon list from Gencode v18 and fix where necessary.
