My goal was to quantify how many ribosomal reads (5.8S) there were in my library, and use these to normalize gene expression. It makes sense in my experiment. I took 2 approaches:
1 - read count
Affter mapping with
Tophat, no multi-mappers allowed, I used
featureCounts to count reads mapping to features, and tallied up the rRNA reads as those mapping to
Using this method, there are between 10^5 to 10^6 rRNA reads for each sample.
2 - mapping to rRNA
Following these instructions, I mapped reads directly to the rRNA sequences provided in the iGenomes bundle (AbundantSequences/hum5SrDNA) using
Using this method, there are between 10^3 to 10^4 mapped rRNA reads for each sample. This is quite a difference!
Why such a difference between methods? I actually expected to get more reads when mapping directly to sequence since the multi-mapping issue is avoided.
Is any of these methods adequate for the purpose?
Which alternative method would you suggest?
I might actually not use this data, for a number of reason, but I am curios about what happened here.