Question: Normalization of RNA-seq data
8 months ago by
Javad wrote:


I have a few libraries that were not rRNA-depleted, and in each of them a high percentage of reads align to rRNA genes. I want to normalize the data so that I can perform multiple comparisons to find differentially expressed genes.

My question: if I normalize the data with a method like "estimateSizeFactorsForMatrix" from DESeq, will the high percentage of rRNA reads distort the normalization?

Do I have to remove the reads aligned to rRNA?

What is the best approach to tackle this problem?


rna-seq • 456 views
8 months ago by
Satyajeet Khare (Pune, India) wrote:

If your libraries are not rRNA-depleted, they will be of little use for differential expression, irrespective of which normalisation method you use. As we know, a large chunk (>95%) of the RNA inside a cell is rRNA. If you do not deplete it, or if the depletion is inefficient, most of the sequencing reads will be wasted on rRNA loci, and coverage of protein-coding genes will be minimal as a result. I have seen samples with rRNA contamination in which there were too few reads to tell a knockout from wild type by looking at reads aligned to the deleted locus, let alone to run a differential expression analysis.


Thanks for your answer. My libraries are sequenced deeply enough that I have a sufficient number of reads aligned to mRNAs. Around 90 percent of my reads do align to rRNA, but I still have around 2 to 3 million reads that are uniquely aligned to mRNAs, and I think that is sufficient for downstream analysis. In your opinion, which approach should I take for data normalization? The library sizes in my data differ, and the percentage of reads aligned to rRNA also differs (between 60 and 90 percent), so I cannot just remove the rRNA content and then normalize with regular methods. What is your suggestion? Thanks again.

written 8 months ago by Javad

Yeah, I was going to add "unless your RNA-Seq is deep sequenced" to my answer :)

I have not faced such a situation, but if you mask the rRNA regions and run DESeq, it might work.


modified 8 months ago • written 8 months ago by Satyajeet Khare
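
To make the masking idea above concrete, here is a minimal sketch of median-of-ratios size factors, the idea behind DESeq's size factor estimation, computed with and without the rRNA rows. It is a plain-Python re-implementation for illustration, not DESeq itself. The gene names, the toy counts, and the helpers `size_factors`, `total_count_factors` and `drop_genes` are all made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative re-implementation).

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if min(gene_counts) <= 0:
            continue
        # log of the geometric mean of this gene across samples
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # per sample: median over genes of count / geometric mean
    return [math.exp(median(r)) for r in log_ratios]


def total_count_factors(counts):
    """Naive library-size scaling, shown only for contrast."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


def drop_genes(counts, masked):
    """Return a copy of the count table without the masked genes."""
    return {g: c for g, c in counts.items() if g not in masked}


# Hypothetical toy table: two samples, two rRNA rows, five mRNA rows.
# Sample 2 has twice the mRNA depth but fewer rRNA reads.
counts = {
    "rRNA_18S": [900000, 600000],
    "rRNA_28S": [1200000, 700000],
    "geneA": [100, 200],
    "geneB": [50, 100],
    "geneC": [300, 600],
    "geneD": [10, 20],
    "geneE": [80, 160],
}
rrna = {"rRNA_18S", "rRNA_28S"}  # assumed rRNA annotation

masked = drop_genes(counts, rrna)
sf_all = size_factors(counts)        # rRNA rows included
sf_masked = size_factors(masked)     # rRNA rows masked
tc_all = total_count_factors(counts) # dominated by rRNA reads

# Normalized counts: raw count divided by the sample's size factor
normalized = {g: [c / s for c, s in zip(cs, sf_masked)]
              for g, cs in masked.items()}

print("median-of-ratios, all genes:  ", sf_all)
print("median-of-ratios, rRNA masked:", sf_masked)
print("total-count scaling:          ", tc_all)
```

On this toy table the median-of-ratios factors come out the same with and without the rRNA rows, because the median is taken over genes and two extreme rows do not move it, whereas total-count scaling is driven almost entirely by the rRNA reads. Real data will not be this clean, so masking the rRNA rows before estimating size factors is still the more cautious route.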