Ensembl GTF/GFF File Misses Obvious rRNAs
10.9 years ago
Nick Crawford ▴ 210

I'm using an Ensembl GTF file (release 61) to remove rRNA from an RNA-seq dataset. Ensembl GTFs contain an rRNA annotation that makes this trivially easy to do.

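import os

fin = 'mygenome.0.61.gtf'
fout = os.path.splitext(fin)[0] + '_only_rRNA.gtf'

# Keep only features whose biotype is rRNA; in Ensembl 61 GTFs the biotype
# sits in the 2nd (source) column
with open(fin) as src, open(fout, 'w') as dst:
    for line in src:
        if line.startswith('#'):  # skip header/comment lines
            continue
        parts = line.rstrip('\n').split('\t')
        if len(parts) > 1 and parts[1] == 'rRNA':
            dst.write(line)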
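The script above writes matching lines, not transcripts, and a GTF has one line per feature (exon, etc.), so a line count overstates how many transcripts the filter covers. Below is a minimal sketch for counting distinct rRNA transcripts. It assumes the Ensembl 61 layout, with the biotype in column 2 and a `transcript_id "..."` attribute in column 9. The function name and the toy records are hypothetical, not real annotation.

```python
import re

# Extracts the transcript_id value from GTF column 9, e.g. transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_rrna_transcripts(lines):
    """Count distinct transcript_ids on lines whose 2nd column is 'rRNA'.

    Assumes the Ensembl 61-style layout, where the source column holds the biotype.
    """
    ids = set()
    for line in lines:
        if line.startswith('#'):  # skip header/comment lines
            continue
        parts = line.rstrip('\n').split('\t')
        if len(parts) < 9 or parts[1] != 'rRNA':
            continue
        match = TRANSCRIPT_ID.search(parts[8])
        if match:
            ids.add(match.group(1))
    return len(ids)


# Hypothetical toy records: two exon lines of one rRNA transcript
# plus one protein-coding line.
toy = [
    'chr1\trRNA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\trRNA\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tprotein_coding\texon\t500\t600\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
]
print(count_rrna_transcripts(toy))  # 1
```

On a real file the same function can be given the open file handle directly, since iterating over it yields lines.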

However, after removing 1,980 rRNA transcripts from my dataset, I still find obvious rRNAs in it.

e.g.:
ENSACAG00000014849    ribosomal protein L38 (rpl38)
ENSACAG00000005015    ribosomal protein S21 (RPS21)
ENSACAG00000011604    ribosomal protein S27 (rps27)
ENSACAG00000010479    ribosomal protein S12 (Rps12)
ENSACAG00000007960    ribosomal protein S24 (Rps24)
etc.

Has anyone else had this issue? Can you suggest any workarounds? Are there better Ensembl gene lists out there I could use to filter? GO terms, perhaps?

10.9 years ago
Neilfws 49k

I think you answered your own question, but just to clarify: rRNAs (e.g. 28S and 18S in eukaryotes, 23S and 16S in prokaryotes) are untranslated RNAs that play a structural role in the large and small ribosomal subunits. What you have there are mRNAs encoding ribosomal proteins.
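One way to check this against the annotation is to look up the biotype recorded for each gene. Ribosomal protein genes are expected to come back as protein_coding rather than rRNA. Below is a minimal sketch. It again assumes the Ensembl 61 layout with the biotype in column 2. The function name and the toy records (made-up IDs) are hypothetical, not real Ensembl annotation.

```python
import re
from collections import defaultdict

# Extracts the gene_id value from GTF column 9, e.g. gene_id "G1";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def biotypes_by_gene(lines):
    """Map each gene_id to the set of biotypes seen in column 2.

    Assumes the Ensembl 61-style layout, where the source column holds the biotype.
    """
    result = defaultdict(set)
    for line in lines:
        if line.startswith('#'):  # skip header/comment lines
            continue
        parts = line.rstrip('\n').split('\t')
        if len(parts) < 9:  # not a full 9-column GTF record
            continue
        match = GENE_ID.search(parts[8])
        if match:
            result[match.group(1)].add(parts[1])
    return dict(result)


# Hypothetical toy records: one ribosomal protein gene and one 18S rRNA gene.
toy = [
    'chr1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "G_rpl38"; transcript_id "T_rpl38";\n',
    'chr1\trRNA\texon\t500\t2300\t.\t+\t.\tgene_id "G_18S"; transcript_id "T_18S";\n',
]
biotypes = biotypes_by_gene(toy)
print(biotypes['G_rpl38'])  # {'protein_coding'}
print(biotypes['G_18S'])    # {'rRNA'}
```

Running it over the real GTF and looking up IDs such as ENSACAG00000014849 would show which biotype Ensembl assigned to each of the genes listed in the question.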

10.9 years ago
Nick Crawford ▴ 210

Hmm... I think I've figured out why the ribosomal proteins are showing up: they're not rRNAs (rRNAs are untranslated).
