Question: Filtering rRNA from RNAseq data
Question by rsafavi, 17 months ago:

Hi everyone,

I am trying to find out whether my RNA-seq FASTQ file has rRNA contamination. I suspect contamination because the per-sequence GC content plot from FastQC has many peaks, and that module fails. I ran SortMeRNA with its default eukaryotic rRNA database and found that about 18 GB of my 25 GB of reads were classified as rRNA. I am not sure this is right, since my data is from mouse but the database covers all eukaryotes. I would very much appreciate it if someone could point me to a mouse rRNA database. Alternatively, would it be enough to take the rRNA entries from the GENCODE annotation file, extract the corresponding FASTA sequences, and use those as the database?

Thanks!
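
For the GENCODE route, a minimal sketch of the first step might look like the following. It assumes a GENCODE-style GTF whose gene records carry a gene_type attribute, and that rRNA genes are tagged rRNA, Mt_rRNA or rRNA_pseudogene; the function name and that type list are illustrative assumptions, so check them against your annotation release. Reference assemblies may contain only partial rDNA copies, so an annotation-derived set can miss rRNA reads.

```python
import re
from typing import Iterable, List

# Assumed GENCODE gene_type values for rRNA-like genes; adjust for your release.
RRNA_TYPES = {"rRNA", "Mt_rRNA", "rRNA_pseudogene"}
# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def rrna_genes_to_bed(gtf_lines: Iterable[str]) -> List[str]:
    """Return BED6 lines for 'gene' records whose gene_type is in RRNA_TYPES."""
    bed = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only whole-gene records, not exons or transcripts
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("gene_type") not in RRNA_TYPES:
            continue
        # GTF is 1-based inclusive; BED is 0-based half-open.
        start, end = int(fields[3]) - 1, int(fields[4])
        gene_id = attrs.get("gene_id", ".")
        bed.append(f"{fields[0]}\t{start}\t{end}\t{gene_id}\t0\t{fields[6]}")
    return bed
```

The resulting lines, written to a file, could then be passed to `bedtools getfasta -fi genome.fa -bed rrna.bed -s -fo rrna.fa` to get a FASTA for SortMeRNA, or used as a BED file for BAM-level tools (file names are placeholders).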

Tags: fastqc, rna-seq, sortmerna, rrna

Using BLAST, you can search your reads against mouse rRNA sequences. For this you can use BLAST's remote search option and specify mouse as the species to search against.
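
If you save the BLAST hits in tabular form (-outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), a rough rRNA fraction can be computed as sketched here. The function names and the e-value and identity cut-offs are hypothetical defaults, not recommendations, and querying millions of reads remotely is probably only practical on a random subsample.

```python
from typing import Iterable, Set


def rrna_hit_ids(blast_tab_lines: Iterable[str], max_evalue: float = 1e-5,
                 min_pident: float = 90.0) -> Set[str]:
    """Read IDs with at least one hit passing both thresholds (outfmt 6 input)."""
    hits: Set[str] = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid, pident, evalue = cols[0], float(cols[2]), float(cols[10])
        if evalue <= max_evalue and pident >= min_pident:
            hits.add(qseqid)  # a read with several hits is counted once
    return hits


def rrna_fraction(hit_ids: Set[str], total_reads: int) -> float:
    """Fraction of the queried reads that hit rRNA."""
    return len(hit_ids) / total_reads if total_reads else 0.0
```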

Reply by Mehmet, 17 months ago

Thank you, I will try that.

Reply by rsafavi, 17 months ago

Alternatively, you can align the reads to the genome together with an annotation (GENCODE) using, for example, STAR, which can also count the number of reads per feature. Then check what percentage of the counts falls on annotated rRNA genes. You would need "good" samples as a control for what is normal and what is bad.
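
A sketch of that percentage calculation, assuming STAR was run with --quantMode GeneCounts, which writes ReadsPerGene.out.tab: four N_ summary rows, then one row per gene with counts matching htseq-count -s no, -s yes and -s reverse. It also assumes you already have the set of rRNA gene IDs (for example from the GENCODE gene_type field); the function name is illustrative. STAR only counts uniquely mapped reads in this file and rRNA reads often multi-map, so this percentage may understate contamination.

```python
from typing import Iterable, Set


def rrna_count_fraction(reads_per_gene_lines: Iterable[str], rrna_gene_ids: Set[str],
                        column: int = 1) -> float:
    """Fraction of gene-assigned counts that fall on rRNA genes.

    column: 1 = unstranded, 2 = htseq-count -s yes, 3 = htseq-count -s reverse.
    """
    total = rrna = 0
    for line in reads_per_gene_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4 or cols[0].startswith("N_"):
            continue  # skip N_unmapped, N_multimapping, N_noFeature, N_ambiguous
        count = int(cols[column])
        total += count
        if cols[0] in rrna_gene_ids:
            rrna += count
    return rrna / total if total else 0.0
```

Picking the column that matches your library's strandedness matters; running the same function on your "good" control samples gives the baseline to compare against.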

Reply by tiago211287, 17 months ago

rRNA contamination can cause a low mapping rate of reads to the genome.

Reply by Mehmet, 17 months ago
Answer by rsafavi, 17 months ago:

The RSeQC tool (its split_bam.py script) works with the BAM file. It takes an rRNA BED file and an alignment file and splits your reads into rRNA-aligned reads, reads that did not align to rRNA, and QC-failed or unmapped reads. Still, at the FASTQ level, I think SortMeRNA is a good tool, but it requires an rRNA FASTA file.
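
To illustrate the three-way split, here is a simplified, hypothetical re-implementation of the classification rule. It is not RSeQC's code and does not read BAM files (with real data you would iterate over alignments with a BAM library such as pysam). Alignments are given as 0-based half-open (chrom, start, end) tuples, None marks an unmapped read, and the labels "in", "ex" and "junk" are only loosely modelled on split_bam.py's outputs.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Per chromosome: (sorted starts, matching ends) of merged, non-overlapping intervals.
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(bed_rows: Iterable[Tuple[str, int, int]]) -> Index:
    """Merge rRNA BED intervals (0-based, half-open) per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in bed_rows:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:  # touching or overlapping: extend
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: Index, chrom: str, start: int, end: int) -> bool:
    """True if [start, end) overlaps any merged interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, end)  # intervals 0..i-1 start before `end`
    # After merging, ends increase with starts, so only interval i-1 needs checking.
    return i > 0 and ends[i - 1] > start


def classify(alignment: Optional[Tuple[str, int, int]], qc_failed: bool,
             index: Index) -> str:
    """'junk' = unmapped or QC-failed, 'in' = overlaps rRNA, 'ex' = everything else."""
    if alignment is None or qc_failed:
        return "junk"
    chrom, start, end = alignment
    return "in" if overlaps(index, chrom, start, end) else "ex"
```

Merging the intervals first keeps each lookup to one binary search per read, which matters when classifying tens of millions of alignments.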

Answer by genomax (United States), 17 months ago:

You can find the mouse rDNA repeat unit sequence here. You can then use bbsplit.sh from the BBMap suite to separate your reads into bins (see A: Tool to separate human and mouse RNA-seq reads). If you don't have a genome sequence without the rDNA repeat, you could instead use bbmap.sh with the rDNA reference alone and capture the reads that don't match using the outu= option.
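
The binning idea can be illustrated with a toy k-mer containment filter. This is a deliberately swapped-in technique, not what bbsplit.sh or bbmap.sh do (they align reads); the function names, k-mer size and the 0.5 threshold are arbitrary assumptions. Reads sharing enough k-mers with either strand of the rDNA reference go to the rRNA bin, and the rest play the role of the reads that outu= would capture.

```python
from typing import Iterable, List, Set, Tuple

Read = Tuple[str, str]  # (read name, sequence)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_ref_kmers(ref_seqs: Iterable[str], k: int) -> Set[str]:
    """k-mers from both strands of the reference (e.g. an rDNA repeat unit)."""
    ref: Set[str] = set()
    for seq in ref_seqs:
        ref |= kmers(seq, k) | kmers(revcomp(seq), k)
    return ref


def split_reads(reads: Iterable[Read], ref_kmers: Set[str], k: int,
                min_frac: float = 0.5) -> Tuple[List[Read], List[Read]]:
    """Return (rRNA-like reads, remaining reads) by shared k-mer fraction."""
    matched: List[Read] = []
    unmatched: List[Read] = []
    for name, seq in reads:
        read_kmers = kmers(seq, k)
        frac = len(read_kmers & ref_kmers) / len(read_kmers) if read_kmers else 0.0
        (matched if frac >= min_frac else unmatched).append((name, seq))
    return matched, unmatched
```

This only shows the logic; for real data the BBMap tools are the better choice, since they tolerate mismatches properly and do not hold every k-mer in a Python set.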

Answer by igor (United States), 17 months ago:

There are some related suggestions in this previous thread: RNA-seq rRNA contamination
