Question

Normalization of RNA-seq data

0

6.6 years ago

Javad ▴ 150

Hello,

I have a few libraries that were not rRNA depleted, and in each of them a high percentage of reads align to rRNA genes. I want to normalize the data so that I can perform multiple comparisons to find differentially expressed genes.

My question is: if I normalize the data with a method like "estimateSizeFactorsForMatrix" from DESeq, will this high percentage of rRNA reads distort the normalization?

Do I have to remove the reads aligned to rRNA?

What is the best approach to tackle this problem?

Thanks,

RNA-Seq • 1.9k views

updated 6.6 years ago by Satyajeet Khare ★ 1.6k • written 6.6 years ago by Javad ▴ 150
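For context on the question above: DESeq's size factor estimation is based on the median-of-ratios idea, where each sample's factor is the median, over genes, of the ratio of its count to the per-gene geometric mean across samples. The following is a minimal Python sketch of that idea, not DESeq's own code; the function name and the toy matrix are made up for illustration, and the first row merely stands in for an rRNA gene.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Size factors per sample (column) from a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample, so the log is finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the log ratios to the reference, back on the count scale.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data (illustrative only): row 0 plays the role of an rRNA gene,
# rows 1-4 are mRNA genes with identical counts in both samples.
toy = np.array([
    [6000, 90000],   # rRNA-like gene, very different between samples
    [100, 100],
    [200, 200],
    [50, 50],
    [400, 400],
])

size_factors = median_of_ratios_size_factors(toy)
# Naive alternative: scale by total library size, which the rRNA row dominates.
total_count_factors = toy.sum(axis=0) / toy.sum(axis=0).mean()

print("median-of-ratios:", size_factors)
print("total-count:     ", total_count_factors)
```

In this toy case the median-of-ratios factors come out equal for both samples because the median ignores the single extreme rRNA row, while total-count scaling is pulled far apart by it. That robustness only holds as long as rRNA genes are a small minority of the rows in the matrix.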

score 0 · Answer 1 · 2017-09-13

0

6.6 years ago

Satyajeet Khare ★ 1.6k

If your libraries are not rRNA depleted, they will be of little use for differential expression, irrespective of which normalisation method you use. The large majority of the RNA in a cell (commonly cited as 80% or more) is rRNA. If you do not deplete it, or if depletion is inefficient, most of the sequencing reads will be wasted on rRNA loci, and coverage of protein-coding genes will be minimal. I have seen samples with rRNA contamination where the number of reads was not sufficient to distinguish a knockout from wild type by looking at reads aligned to the deleted locus, let alone to run a differential expression analysis.

0

Thanks for your answer. My libraries are sequenced deeply enough that I have a sufficient number of reads aligned to mRNAs. Around 90 percent of my reads align to rRNA, but I still have around 2 or 3 million reads that are uniquely aligned to mRNAs, and I think this is sufficient for downstream analysis. In your opinion, which approach should I take for data normalization? The library sizes in my data differ, and the percentage of reads aligned to rRNA also differs (between 60 and 90 percent), so I cannot just remove the rRNA content and then normalize with regular methods. What is your suggestion? Thanks again.

Reply • 6.6 years ago by Javad ▴ 150

1

Yeah, I was going to add "unless your RNA-Seq libraries are deeply sequenced" to my answer :)

I have not faced such a situation, but if you mask the rRNA regions and then run DESeq, it might work.

Best,

Reply • 6.6 years ago by Satyajeet Khare ★ 1.6k
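One way to read the "mask rRNA regions" suggestion is to drop the rRNA rows from the gene-level count matrix before handing it to the normalization step. Below is a rough Python sketch under that assumption. The helper `mask_rrna`, the gene IDs, and the counts are all hypothetical; in practice the rRNA ID list would have to come from your annotation.

```python
import numpy as np

def mask_rrna(counts, gene_ids, rrna_ids):
    """Drop rRNA rows; return filtered counts, kept gene IDs, and rRNA fraction per sample."""
    counts = np.asarray(counts)
    # Boolean mask marking the rows whose gene ID is in the rRNA list.
    is_rrna = np.array([g in rrna_ids for g in gene_ids])
    # Fraction of each sample's counted reads that fall on rRNA genes.
    rrna_fraction = counts[is_rrna].sum(axis=0) / counts.sum(axis=0)
    kept_ids = [g for g, r in zip(gene_ids, is_rrna) if not r]
    return counts[~is_rrna], kept_ids, rrna_fraction

# Hypothetical gene IDs and counts (genes x samples), for illustration only.
gene_ids = ["rRNA_18S", "rRNA_28S", "geneA", "geneB", "geneC"]
rrna_ids = {"rRNA_18S", "rRNA_28S"}
counts = np.array([
    [3000, 40000],
    [3000, 50000],
    [1500, 4000],
    [2000, 5000],
    [500, 1000],
])

mrna_counts, mrna_ids, rrna_fraction = mask_rrna(counts, gene_ids, rrna_ids)

print("rRNA fraction per sample:", rrna_fraction)
print("library size before:", counts.sum(axis=0))
print("library size after: ", mrna_counts.sum(axis=0))
```

The filtered matrix (`mrna_counts` here) is what would then go into size factor estimation, so that differences in rRNA content between samples no longer enter the normalization at all.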