Question: Normalization of RNA-seq data
Javad wrote:


I have a few libraries that were not rRNA-depleted, and in each of them a high percentage of reads align to rRNA genes. I want to normalize the data so that I can perform multiple comparisons to find differentially expressed genes.

My question is: if I normalize the data with a method like "estimateSizeFactorsForMatrix" from DESeq, will this high percentage of rRNA reads distort the normalization?

Do I have to remove reads aligned to rRNA?

What is the best approach to tackle this problem?
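For context on the normalization question: DESeq's estimateSizeFactorsForMatrix uses the median-of-ratios method. Below is a minimal Python sketch of that idea (a reimplementation for illustration, not the package's own code), run on made-up toy counts. Each sample's size factor is the median, over genes, of its count divided by that gene's geometric mean across samples. Because it is a median over genes, a few extremely abundant rRNA rows barely move it, whereas raw library sizes are dominated by them.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (DESeq-style), one per sample.

    counts: list of rows, one per gene, each a list of per-sample counts.
    Genes with a zero in any sample are skipped, because their geometric
    mean is zero and the ratio to it is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median of the log ratios, transformed back to the linear scale.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical toy data: five mRNA genes (rows) in three samples (columns),
# with true mRNA sequencing depths in the ratio 1 : 2 : 0.5 ...
mrna = [
    [100, 200, 50],
    [200, 400, 100],
    [50, 100, 25],
    [400, 800, 200],
    [80, 160, 40],
]
# ... plus one rRNA gene whose share of each library varies wildly.
rrna = [[9000, 2000, 20000]]

library_sizes = [sum(col) for col in zip(*(mrna + rrna))]
print(library_sizes)              # [9830, 3660, 20415]
print(size_factors(mrna))         # approximately [1.0, 2.0, 0.5]
print(size_factors(mrna + rrna))  # still approximately [1.0, 2.0, 0.5]
```

Total-count scaling would treat the third sample as the deepest, although its mRNA depth is the lowest; the median-of-ratios factors are unchanged by the rRNA row. With a real gene-level matrix the median runs over thousands of genes, so a handful of rRNA genes should matter even less. The bigger cost of unremoved rRNA is likely the lost mRNA depth rather than the size factors themselves.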


Satyajeet Khare (Pune, India) wrote:

If your libraries are not rRNA-depleted, they will be of little use for differential expression analysis, irrespective of which normalization method you use. As we know, the bulk of the RNA in a cell (typically 80% or more) is rRNA. If you do not deplete it, or if depletion is inefficient, most of the sequencing reads will be spent on rRNA loci, and coverage of mRNA-coding genes will be minimal. I have seen samples with rRNA contamination where there were not even enough reads on the deleted locus to tell a knockout from the wild type, let alone to run a differential expression analysis.


Thanks for your answer. My libraries are sequenced deeply enough that I have a sufficient number of reads aligned to mRNAs. Admittedly, around 90 percent of my reads align to rRNA, but I still have about 2 to 3 million reads uniquely aligned to mRNAs, and I think that is sufficient for downstream analysis. In your opinion, which approach should I take for normalization? The library sizes in my data differ, and so does the percentage of reads aligned to rRNA (between 60 and 90 percent), so I cannot just remove the rRNA content and then normalize with the regular methods. What do you suggest? Thanks again.


Yeah, I was going to add "unless your RNA-seq is deeply sequenced" to my answer :)

I have not faced such a situation, but if you mask the rRNA regions and then run DESeq, it might work.
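One way to read "mask the rRNA regions" at the count-table level is to drop rRNA genes before the matrix goes to DESeq. A minimal Python sketch follows, assuming a gene-level count table and a per-gene biotype annotation; the Ensembl-style labels "rRNA" and "Mt_rRNA", and the gene IDs and counts, are hypothetical. It also reports each sample's rRNA fraction, which is a cheap sanity check that the fraction is not confounded with the experimental condition.

```python
# Assumed Ensembl-style biotype labels for rRNA genes (an assumption;
# check the labels your own annotation uses).
RRNA_BIOTYPES = {"rRNA", "Mt_rRNA"}


def mask_rrna(counts, biotypes):
    """Drop rRNA genes from a gene-level count table.

    counts: dict gene_id -> list of per-sample counts
    biotypes: dict gene_id -> biotype string (unannotated genes are kept)
    Returns (kept_counts, per_sample_rrna_fraction).
    """
    n_samples = len(next(iter(counts.values())))
    totals = [0] * n_samples
    rrna_totals = [0] * n_samples
    kept = {}
    for gene, row in counts.items():
        for j, c in enumerate(row):
            totals[j] += c
        if biotypes.get(gene) in RRNA_BIOTYPES:
            for j, c in enumerate(row):
                rrna_totals[j] += c
        else:
            kept[gene] = row
    fractions = [r / t if t else 0.0 for r, t in zip(rrna_totals, totals)]
    return kept, fractions


# Hypothetical example: two samples with 60% and 90% rRNA reads.
counts = {"geneA": [100, 200], "geneB": [100, 100], "rrnaX": [300, 2700]}
biotypes = {"geneA": "protein_coding", "geneB": "protein_coding", "rrnaX": "rRNA"}

kept, fractions = mask_rrna(counts, biotypes)
print(sorted(kept))  # ['geneA', 'geneB']
print(fractions)     # [0.6, 0.9]
```

The filtered table would then be normalized as usual; since median-of-ratios size factors are computed from the remaining genes, differing rRNA fractions between samples do not by themselves prevent this.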


