Question: Filtering rRNA from RNAseq data
rsafavi50 wrote:

Hi everyone,

I am trying to find out whether my RNA-seq FASTQ file has any rRNA contamination. I suspect contamination because the GC content plot from FastQC has many peaks and the module fails. I ran SortMeRNA with its default eukaryotic rRNA database, and it classified about 18 gig out of 25 gig of reads as rRNA. I am not sure this is right, since my data is from mouse but the database covers all eukaryotes. I would very much appreciate it if someone could point me to a mouse-specific rRNA database. Alternatively, would it be enough to take the rRNA entries from the GENCODE annotation file, extract the corresponding sequences as FASTA, and use that as the database?

Thanks!

fastqc rna-seq sortmerna rrna • 2.4k views
modified 21 months ago • written 21 months ago by rsafavi50

Using BLAST, you can search your data against mouse rRNA sequences. For this you can use BLAST's remote option together with its species option, specifying mouse as the species.

written 21 months ago by Mehmet510

Thank you, I will try that

written 21 months ago by rsafavi50

You can align the reads against the genome plus annotation (GENCODE) as usual, for example with STAR, which can count the number of reads per feature. Then check what percentage of the counts falls on annotated rRNA genes. You would need known "good" samples as a control to judge what is normal and what is bad.
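As a rough illustration of the check described above, here is a minimal Python sketch. It assumes STAR was run with `--quantMode GeneCounts`, so that a `ReadsPerGene.out.tab` file exists, and that rRNA genes carry `gene_type "rRNA"` or `"Mt_rRNA"` in the GENCODE GTF. The function names `rrna_gene_ids` and `rrna_percentage` are made up for this example. Note that STAR's gene counts only include uniquely mapped reads, so reads from multi-copy rRNA loci may be undercounted and the result is better read as a lower bound.

```python
import re

# The first four rows of STAR's ReadsPerGene.out.tab are summary counters, not genes.
STAR_SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def rrna_gene_ids(gtf_lines, rrna_types=("rRNA", "Mt_rRNA")):
    """Collect gene_id values of GTF 'gene' records whose gene_type is an rRNA type."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        gene_id = re.search(r'gene_id "([^"]+)"', fields[8])
        gene_type = re.search(r'gene_type "([^"]+)"', fields[8])
        if gene_id and gene_type and gene_type.group(1) in rrna_types:
            ids.add(gene_id.group(1))
    return ids


def rrna_percentage(count_lines, rrna_ids, column=1):
    """Percentage of gene-assigned reads that fall on rRNA genes.

    column=1 is the unstranded count column; 2 and 3 are the stranded ones.
    """
    total = 0
    rrna = 0
    for line in count_lines:
        fields = line.split()
        if not fields or fields[0] in STAR_SUMMARY_ROWS:
            continue
        count = int(fields[column])
        total += count
        if fields[0] in rrna_ids:
            rrna += count
    return 100.0 * rrna / total if total else 0.0
```

In practice you would pass open file handles for the GTF and the count table, then compare the percentage against the same number computed for a sample you trust.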

written 21 months ago by tiago2112871.2k

rRNA contamination can cause a low mapping rate of reads to the genome.

written 21 months ago by Mehmet510
rsafavi50 wrote:

The RSeQC tool works on the BAM file. It takes an rRNA BED file and an alignment file and splits your reads into three groups: reads aligned to rRNA, reads that did not align to rRNA, and QC-failed or unmapped reads. Still, at the level of the FASTQ file, I think SortMeRNA is a good tool, but it requires an rRNA FASTA file.
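If no ready-made rRNA BED file is at hand, one can be derived from the annotation. The Python sketch below is only an illustration: it assumes a GENCODE-style GTF in which rRNA genes are tagged `gene_type "rRNA"` or `"Mt_rRNA"`, and it writes single-block 12-column BED records on the assumption that the RSeQC script in question (presumably `split_bam.py`) expects BED12. The function name `gtf_rrna_to_bed12` is made up for this example.

```python
import re


def gtf_rrna_to_bed12(gtf_lines, rrna_types=("rRNA", "Mt_rRNA")):
    """Convert GTF 'gene' records with an rRNA gene_type into single-block BED12 lines."""
    bed = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "gene":
            continue
        gene_type = re.search(r'gene_type "([^"]+)"', f[8])
        gene_id = re.search(r'gene_id "([^"]+)"', f[8])
        if not (gene_type and gene_id) or gene_type.group(1) not in rrna_types:
            continue
        start = int(f[3]) - 1  # GTF is 1-based inclusive, BED is 0-based half-open
        end = int(f[4])
        bed.append("\t".join([
            f[0], str(start), str(end), gene_id.group(1), "0", f[6],
            str(start), str(end), "0", "1", f"{end - start},", "0,",
        ]))
    return bed
```

The same intervals could also be used to pull rRNA sequences out of the genome FASTA for SortMeRNA, though the annotated genes may not cover the full rDNA repeat.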

written 21 months ago by rsafavi50

Hi rsafavi, how do you prepare the rRNA FASTA for metagenomics? Thanks.

written 9 weeks ago by Shicheng Guo8.1k
genomax80k wrote:

You can find the mouse rDNA repeat unit sequence here. You can then use bbsplit.sh from the BBMap suite to separate your reads into bins (see A: Tool to separate human and mouse RNA-seq reads). If you don't have a genome sequence without the rDNA repeat, you could instead run bbmap.sh with the rDNA sequence as the reference and capture the reads that don't match by using the outu= option.
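A minimal sketch of the second approach, written as a small Python helper that only builds the bbmap.sh command line. The file names are placeholders, the helper name `bbmap_rrna_filter_command` is made up for this example, and the `outm=` and `nodisk` parameters are additions not mentioned above (they keep the matching reads and avoid writing an index to disk, respectively).

```python
import shlex


def bbmap_rrna_filter_command(reads, rdna_ref, unmatched_out, matched_out=None):
    """Build a bbmap.sh argument list that maps reads to an rDNA reference.

    Reads that do NOT map to the reference go to outu= (the rRNA-depleted set);
    reads that do map optionally go to outm=.
    """
    argv = [
        "bbmap.sh",
        f"in={reads}",
        f"ref={rdna_ref}",
        f"outu={unmatched_out}",
        "nodisk",  # build the index in memory instead of writing it to disk
    ]
    if matched_out:
        argv.append(f"outm={matched_out}")
    return argv


# Placeholder file names, for illustration only.
cmd = bbmap_rrna_filter_command(
    "sample.fastq.gz", "mouse_rDNA.fasta", "non_rRNA.fastq.gz", "rRNA.fastq.gz"
)
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where bbmap.sh is on the PATH. Comparing the read counts of the two output files gives a quick estimate of the rRNA fraction.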

modified 21 months ago • written 21 months ago by genomax80k
igor9.8k wrote:

There are some related suggestions in this previous thread: RNA-seq rRNA contamination

written 21 months ago by igor9.8k