Question: RNA-seq rRNA contamination
igor (9.9k, United States) wrote, 3.6 years ago:

I would like to screen for rRNA contamination in my RNA-seq data. I tried two different methods, which can be summed up as:

  • take human GENCODE GTF, filter for "rRNA", extract sequence for the matching coordinates
  • download Rfam FASTA file, filter for "ribosomal_rna" and "homo_sapiens" (example protocol)
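
The first method above could be sketched roughly as follows. This is only an illustration, not the exact procedure used in the question: it assumes the GTF carries a `gene_type` attribute (as GENCODE files do) and keeps only `gene` records. The function name `rrna_intervals_from_gtf` and the default biotype tuple are hypothetical; it may be worth checking whether related biotypes such as mitochondrial rRNA should be included too.

```python
import re

# Matches GTF attributes of the form: key "value";
_ATTR = re.compile(r'(\S+) "([^"]*)"')

def rrna_intervals_from_gtf(gtf_lines, biotypes=("rRNA",)):
    """Yield BED-style (chrom, start0, end, name, strand) tuples for
    gene records whose gene_type is in `biotypes` (hypothetical helper)."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep one interval per gene, not per transcript/exon
        attrs = dict(_ATTR.findall(fields[8]))
        if attrs.get("gene_type") in biotypes:
            name = attrs.get("gene_name", attrs.get("gene_id", "."))
            # GTF is 1-based inclusive; BED is 0-based half-open
            yield (fields[0], int(fields[3]) - 1, int(fields[4]), name, fields[6])
```

The resulting intervals can then be used to extract sequence or to count overlapping reads.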

I then align against each one separately. I don't expect them to yield very close results, but the difference can be 100-fold. Why such a big difference? Can I trust either?

One possibility is that the Rfam sequences overestimate the abundance and I am getting a lot of false positives. However, I frequently see an alignment rate of less than 5%, which is very reasonable.


How about screening against the human rDNA repeat sequence? A link for that is in this post.

Reply by genomax (80k), written 3.6 years ago, modified 3.6 years ago

But then I will have three different results.

I am actually trying to make this work with Picard CollectRnaSeqMetrics, which needs an intervals file (so it has to be based on reference genome coordinates).
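
For reference, Picard's `RIBOSOMAL_INTERVALS` input is an interval list: a SAM-style header followed by tab-separated, 1-based lines of sequence name, start, end, strand and interval name. A minimal sketch of writing one from BED-style tuples, assuming the header lines are copied from the BAM being analysed (the helper name `to_interval_list` is hypothetical):

```python
def to_interval_list(header_lines, bed_intervals):
    """Return interval_list text from SAM header lines and
    BED-style (chrom, start0, end, name, strand) tuples."""
    # Keep only header records (@HD, @SQ, ...), without trailing newlines
    out = [h.rstrip("\n") for h in header_lines if h.startswith("@")]
    for chrom, start0, end, name, strand in bed_intervals:
        # Convert the 0-based BED start to a 1-based inclusive start
        out.append("\t".join([chrom, str(start0 + 1), str(end), strand, name]))
    return "\n".join(out) + "\n"
```

The sequence names in the header need to match those in the BAM, which is why the header is taken from the BAM itself in this sketch.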

Reply by igor (9.9k), written 3.6 years ago

I was wondering if you figured out what the best approach is?

Reply by rsafavi (50), written 21 months ago
Friederike (5.4k, United States) wrote, 2.5 years ago:

Did you try SortMeRNA? The input is reads in a FASTQ file plus rRNA sequences. The tool extracts the reads that do not match the rRNA sequences, so by quantifying how many reads you are left with, you should be able to gauge the contamination.
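
As a rough sketch of that last step, one could count reads in the original FASTQ and in the filtered output and take the difference. This assumes plain, uncompressed FASTQ with four lines per record; both helper names are hypothetical and nothing here calls SortMeRNA itself.

```python
def count_fastq_reads(handle):
    """Count records in a four-line-per-read FASTQ stream."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; not plain FASTQ?")
    return n_lines // 4

def removed_fraction(total_reads, remaining_reads):
    """Fraction of reads removed by filtering, i.e. putative rRNA reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return (total_reads - remaining_reads) / total_reads
```

For example, 200 input reads with 190 left after filtering would correspond to a removed fraction of 0.05.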

human rRNA, mouse rRNA


Hi Friederike, how would one prepare an rRNA FASTA file for metagenomic RNA-seq? Thanks.

Reply by Shicheng Guo (8.1k), written 10 weeks ago, modified 10 weeks ago
nick.a.rouse (30) wrote, 2.5 years ago:

A very simple approach would be to download rRNA BED coordinates (from Ensembl), count the reads that fall into these regions of interest (using the bedtools suite), and divide that by the total number of reads in the BAM (from samtools idxstats). This gives a rough estimate of your background rRNA level. If you have many samples, you can set an acceptable threshold (e.g. <2%).
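
The division step can be sketched as below, assuming the rRNA read count has already been obtained separately (for instance with bedtools) and that the `samtools idxstats` output is available as text with its usual four tab-separated columns: reference name, length, mapped count, unmapped count. Function names are hypothetical. Note that rRNA reads often multi-map, so the numbers may reflect alignment records rather than unique reads depending on how the aligner reports them.

```python
def total_reads_from_idxstats(idxstats_text):
    """Sum mapped and unmapped counts over all lines of idxstats output."""
    total = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        _name, _length, mapped, unmapped = line.split("\t")
        total += int(mapped) + int(unmapped)
    return total

def rrna_percent(rrna_reads, total_reads):
    """Percentage of reads overlapping the rRNA regions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * rrna_reads / total_reads
```

A sample could then be flagged when `rrna_percent(...)` exceeds whatever threshold is chosen.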


Yes, that is essentially the first method I mentioned, but I am trying to reconcile it with the alternative.

Reply by igor (9.9k), written 2.5 years ago, modified 18 months ago
genomax (80k, United States) wrote, 18 months ago:

igor: Since you reactivated this thread, and as it has not been mentioned yet, I will make a note of bbsplit.sh from the BBMap suite. You can use it to bin the reads that map to the rDNA repeat away from the rest of your sequences.


Just found it again and made a minor edit, which reactivated it. Thanks for the additional suggestion, though!

Reply by igor (9.9k), written 18 months ago, modified 18 months ago