rRNA in human
6.7 years ago

I am working on human RNA-seq data and am trying to build an rRNA database in order to remove rRNA contamination. I explored several sources: the SILVA rRNA database, the UCSC browser (to get entries with gene_type rRNA), and Ensembl BioMart. In SILVA, I found the following counts of rRNA sequences:

LSU128 dataset

SILVA: 3198
SILVA Ref: none
EMBL: 104
RDP: none

SSU128 dataset

SILVA: 2662 (human + other organisms)
SILVA Ref: 1999 (human + other)
SILVA Ref NR: 353 (human_ref) (NR presumably denotes the non-redundant dataset)
Greengenes: none
RDP: none

I downloaded all of these datasets and ended up with approx. 1500 sequences after removing duplicate sequences.
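The merge-and-deduplicate step described above could be sketched as follows. This is only an illustration, not the poster's actual procedure: the function names `parse_fasta` and `dedupe_records` are hypothetical, and the choice to compare sequences after upper-casing and converting U to T is an assumption (it lets RNA-alphabet and DNA-alphabet copies of the same sequence collapse into one record).

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_records(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record seen for each distinct sequence.

    Assumption: sequences are compared (and returned) upper-cased with
    U converted to T, so only exact duplicates are removed.
    """
    seen: Dict[str, str] = {}
    for header, seq in records:
        key = seq.upper().replace("U", "T")
        if key and key not in seen:
            seen[key] = header
    return [(header, seq) for seq, header in seen.items()]


def write_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format records as FASTA text with fixed-width sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

To use it on real files, one could chain the open file handles of all downloaded FASTA files (for example with `itertools.chain`) into `parse_fasta` and write the result of `write_fasta` to a single merged file. Note that this removes exact duplicates only; near-identical sequences would need clustering with a dedicated tool.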

On the other hand, from the UCSC browser I obtained a list of approx. 560 rRNA sequences.

Can anybody suggest which set I should use for the next step, i.e. building a SortMeRNA database to remove rRNA contamination from human RNA-seq reads?

I will appreciate all suggestions.

RNA-Seq • 5.1k views

Also see this thread http://seqanswers.com/forums/showthread.php?t=41868

You may be able to get rRNA from GENCODE too (biotype = rRNA): https://www.gencodegenes.org/gencode_biotypes.html
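As a rough illustration of the GENCODE suggestion, the sketch below pulls gene records with an rRNA biotype out of GTF lines. It assumes the standard nine-column GTF layout with a `gene_type` attribute (GENCODE style) or `gene_biotype` (Ensembl style); the function names are hypothetical. Related biotypes such as `Mt_rRNA` or `rRNA_pseudogene` can be passed in `biotypes` if they should be included as well.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn the ninth GTF column into a dict of quoted attributes."""
    return {key: value for key, value in ATTR_RE.findall(field)}


def rrna_genes(gtf_lines: Iterable[str],
               biotypes: Tuple[str, ...] = ("rRNA",)) -> List[Dict[str, object]]:
    """Collect gene-level GTF records whose biotype is in `biotypes`."""
    genes: List[Dict[str, object]] = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        # Assumption: GENCODE uses gene_type, Ensembl uses gene_biotype.
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if biotype in biotypes:
            genes.append({
                "chrom": cols[0],
                "start": int(cols[3]),
                "end": int(cols[4]),
                "strand": cols[6],
                "gene_id": attrs.get("gene_id", ""),
                "gene_name": attrs.get("gene_name", ""),
            })
    return genes
```

The resulting coordinates could then be written out as BED or used to extract sequences from the genome FASTA with whatever tool is already part of the pipeline.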


Hello santosh Anand,

I searched the Rfam database and found the following human entries:

5S: 615 (filtered for human using RF00001 expert_db:"Rfam" AND TAXONOMY:"9606" AND rna_type:"rRNA")
LSU and 5.8S: 707 (filtered for human using RF02543 expert_db:"Rfam" AND TAXONOMY:"9606" AND rna_type:"rRNA")
SSU: 558 (filtered for human using RF01960 expert_db:"Rfam" AND TAXONOMY:"9606" AND rna_type:"rRNA")
tRNA: 994 (filtered for human using RF00005 expert_db:"Rfam" AND TAXONOMY:"9606" AND rna_type:"tRNA")

I will merge all of these FASTA files and create a database to remove rRNA and tRNA contamination from the RNA-seq reads. I have one more question: in GtRNAdb (http://gtrnadb.ucsc.edu/), the human tRNA count is 610. Why do the tRNA counts differ? And which would be the better choice: 1. using the Rfam rRNA dataset, or 2. using the rRNA + tRNA entries present in the GTF file? I ask because many studies have reported using the Rfam database for this kind of analysis.

Thanks in advance


Is it possible to get a GTF file in which individual gene IDs are annotated for 28S, 5.8S, 5S (LSU) and 18S (SSU)? Read counts reported only at the LSU/SSU level do not tell me how many reads belong to 28S, 5.8S, 5S and 18S separately. I extracted the GTF file from here: http://genome.ucsc.edu/cgi-bin/hgTables
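If per-gene read counts are available, one possible way to get per-species totals is to group genes by name. The sketch below is only an assumption-laden illustration: it assumes HGNC-style gene symbols (e.g. `RNA5S1`, `RNA5-8SN1`, `RNA18SN5`, `RNA28SN5`, `MT-RNR1`), which may not match the names in a UCSC Table Browser export, and the input dictionary of gene name to read count is hypothetical (for instance, parsed from a counting tool's output).

```python
from collections import Counter
from typing import Dict

# Assumed HGNC-style symbol prefixes and the rRNA species they denote.
PREFIX_TO_CLASS = (
    ("RNA5-8S", "5.8S"),
    ("RNA28S", "28S"),
    ("RNA18S", "18S"),
    ("RNA45S", "45S precursor"),
    ("RNA5S", "5S"),  # also matches RNA5SP* pseudogenes
    ("MT-RNR1", "mt 12S"),
    ("MT-RNR2", "mt 16S"),
)


def classify(gene_name: str) -> str:
    """Map a gene symbol to an rRNA species label, or 'other'."""
    name = gene_name.upper()
    for prefix, label in PREFIX_TO_CLASS:
        if name.startswith(prefix):
            return label
    return "other"


def counts_per_class(gene_counts: Dict[str, int]) -> Dict[str, int]:
    """Sum per-gene read counts into per-species totals."""
    totals: Counter = Counter()
    for gene, n in gene_counts.items():
        totals[classify(gene)] += n
    return dict(totals)
```

Because 5S pseudogenes share the `RNA5S` prefix, they are counted as 5S here; filter them out beforehand if that is not wanted.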


We don't know what the aim of your analysis is, but chances are you don't have to remove the rRNA at all.

6.7 years ago
GenoMax 141k

See C: Removing rRNA and tRNA sequences using GTF files for the human rDNA repeat. You can use it with BBSplit from the BBMap suite (A: Tool to separate human and mouse RNA-seq reads) to bin/remove ribosomal RNA reads.
