Filtering rRNA from RNAseq data
3
0
3.4 years ago
rsafavi ▴ 60

Hi everyone,

I am trying to find out whether my RNA-seq FASTQ file has rRNA contamination. I suspect contamination because the GC content plot from FastQC has many peaks, and it basically fails that check. I ran SortMeRNA with its default eukaryotic rRNA database, and about 18 gig out of 25 gig of reads were classified as rRNA. I am not sure this is right, since my data is from mouse but the database covers all eukaryotes. I would very much appreciate it if someone could point me to a mouse rRNA database. Alternatively, would it be enough to take the rRNA entries from the GENCODE annotation file, extract the corresponding FASTA sequences, and use those as the database?

Thanks!

RNA-Seq rRNA fastqc sortmerna • 5.5k views
0

You can use BLAST to search your reads against mouse rRNA sequences. To do this, use BLAST's remote option together with the species option, and specify mouse as the species.

0

Thank you, I will try that

0

You can simply align the reads against the genome plus annotation (GENCODE), for example with STAR, which can count the number of reads per feature. Then check what percentage of the counts falls on annotated rRNA genes. You would need known "good" samples as a control to judge what is normal and what is bad.
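A minimal Python sketch of that percentage check, offered as an illustration only. It assumes a GENCODE-style GTF in which gene records carry a `gene_type` attribute, and STAR's `ReadsPerGene.out.tab` layout (gene ID followed by three count columns, with summary rows whose names start with `N_`). The function names and the choice of `rRNA` and `Mt_rRNA` as the biotypes to count are my own assumptions, not something from the thread.

```python
import re

# Assumption: these GENCODE biotypes are what we want to call "rRNA".
RRNA_TYPES = {"rRNA", "Mt_rRNA"}


def gene_types_from_gtf(gtf_lines):
    """Map gene_id -> gene_type using the 'gene' records of a GENCODE-style GTF."""
    types = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        gene_id = re.search(r'gene_id "([^"]+)"', fields[8])
        gene_type = re.search(r'gene_type "([^"]+)"', fields[8])
        if gene_id and gene_type:
            types[gene_id.group(1)] = gene_type.group(1)
    return types


def rrna_percentage(count_lines, gene_types, column=1):
    """Percentage of gene-assigned reads that fall on rRNA genes.

    column selects the STAR count column: 1 = unstranded, 2 and 3 = the two
    stranded counts. Summary rows (N_unmapped, N_multimapping, ...) are skipped.
    """
    total = 0
    rrna = 0
    for line in count_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("N_"):
            continue
        n = int(fields[column])
        total += n
        if gene_types.get(fields[0]) in RRNA_TYPES:
            rrna += n
    return 100.0 * rrna / total if total else 0.0


# Hypothetical usage (file names are placeholders):
# with open("gencode.annotation.gtf") as gtf, open("ReadsPerGene.out.tab") as tab:
#     print(rrna_percentage(tab, gene_types_from_gtf(gtf)))
```

One caveat: STAR's per-gene counts only include uniquely mapped reads, and rRNA reads often map to several loci, so this number may understate the real rRNA fraction. Comparing it against a known "good" sample, as suggested above, is probably more informative than the absolute value.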

2

rRNA contamination can cause a low mapping rate of reads to the genome.

4
3.4 years ago
rsafavi ▴ 60

RSeQC works at the level of the BAM file. It takes an rRNA BED file and an alignment file, and splits your reads into three groups: reads aligned to rRNA, reads that did not align to rRNA, and QC-failed or unmapped reads. Still, at the level of the FASTQ file, I think SortMeRNA is a good tool, but it requires an rRNA FASTA.
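If you don't already have an rRNA BED file, one possible way to make one is to pull the rRNA gene records out of the GENCODE GTF. The Python sketch below does that under a few assumptions of mine: the GTF has `gene_type` and `gene_name` attributes, `rRNA` and `Mt_rRNA` are the biotypes of interest, and the BED should be 12-column with a single block per gene (as far as I know RSeQC generally expects 12-column BED, but please check the documentation for your version). The function name is hypothetical.

```python
import re

# Assumption: GENCODE biotypes to export as rRNA regions.
RRNA_TYPES = {"rRNA", "Mt_rRNA"}


def rrna_bed12_lines(gtf_lines, rrna_types=RRNA_TYPES):
    """Yield one single-block BED12 line per rRNA gene record in a GTF."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "gene":
            continue
        gene_type = re.search(r'gene_type "([^"]+)"', f[8])
        if not gene_type or gene_type.group(1) not in rrna_types:
            continue
        name = re.search(r'gene_name "([^"]+)"', f[8])
        label = name.group(1) if name else "."
        start = int(f[3]) - 1  # GTF is 1-based inclusive, BED is 0-based half-open
        end = int(f[4])
        yield "\t".join([
            f[0], str(start), str(end), label, "0", f[6],
            str(start), str(end),      # thickStart, thickEnd
            "0",                       # itemRgb
            "1",                       # blockCount: the whole gene as one block
            f"{end - start},",         # blockSizes
            "0,",                      # blockStarts
        ])


# Hypothetical usage (file names are placeholders):
# with open("gencode.annotation.gtf") as gtf, open("rRNA.bed", "w") as out:
#     for bed_line in rrna_bed12_lines(gtf):
#         out.write(bed_line + "\n")
```

Keep in mind that this only covers rRNA genes that are annotated in the genome assembly, so it may not capture every rRNA-derived read.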

0

Hi rsafavi, how do you prepare the rRNA FASTA for metagenomics? Thanks.

3
3.4 years ago
GenoMax 109k

You can find the mouse rDNA repeat unit sequence here. You can then use bbsplit.sh from the BBMap suite to separate your reads into bins (A: Tool to separate human and mouse RNA seq reads ). If you don't have a genome sequence without the rDNA repeat, you could instead use bbmap.sh with the rDNA reference and capture the reads that don't match by using the outu= option.
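As a rough illustration of the bbmap.sh route, here is a small Python sketch that only builds the command line (it does not run anything). The file names and the function name are placeholders of mine. `ref=`, `in=`, `outu=`, `outm=` and `nodisk` are BBMap parameters as I understand them, but check them against the help text of your BBMap version. The sketch covers single-end input only.

```python
import shlex


def bbmap_rrna_filter_cmd(reads, rdna_ref, clean_out, rrna_out=None):
    """Build a bbmap.sh command that maps reads to an rDNA reference.

    Reads that do NOT map go to clean_out (outu=); optionally, reads that
    do map go to rrna_out (outm=). 'nodisk' keeps the index in memory
    instead of writing it to disk.
    """
    cmd = [
        "bbmap.sh",
        f"ref={rdna_ref}",
        f"in={reads}",
        f"outu={clean_out}",
        "nodisk",
    ]
    if rrna_out:
        cmd.append(f"outm={rrna_out}")
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute your own.
    cmd = bbmap_rrna_filter_cmd(
        "sample.fastq.gz", "mouse_rDNA.fa", "sample.clean.fastq.gz",
        rrna_out="sample.rRNA.fastq.gz",
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Writing the matched reads with outm= as well makes it easy to compare the size of the two outputs and see what fraction was rRNA.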

0
3.4 years ago
igor 12k

There are some related suggestions in this previous thread: RNA-seq rRNA contamination