RNASeq normalization using the expression values of rRNA
7.0 years ago
samuelmiver

I have a set of four samples from an RNA-seq experiment. Normally I would perform quantile normalization, but in this case the experimental condition shifts the expression distribution too far from the control, and I do not want to lose that information.
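To make the concern concrete, here is a minimal base-R sketch of quantile normalization on hypothetical toy counts (not the lab's pipeline). It forces every sample onto the same distribution, so a genuine global shift in the treated sample disappears entirely. The function name and the data are made up for illustration, and ties are handled naively.

```r
# Minimal quantile normalization: every column is mapped onto the same
# reference distribution (the mean of the sorted columns).
quantile_normalize <- function(x) {
  sorted_means <- rowMeans(apply(x, 2, sort))          # reference distribution
  ranks <- apply(x, 2, rank, ties.method = "first")    # naive tie handling
  out <- apply(ranks, 2, function(r) sorted_means[r])  # replace values by reference quantiles
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical toy data: "treated" is a 4-fold globally shifted copy of "control"
control <- c(10, 20, 40, 80, 160)
treated <- control * 4
counts <- cbind(control = control, treated = treated)

qn <- quantile_normalize(counts)
print(qn)  # both columns become 25, 50, 100, 200, 400: the 4-fold shift is gone
```

This is exactly the situation described above: if the shift is biological, quantile normalization removes it by construction.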

I was wondering whether it would be possible to normalize using rRNA expression values, and if so, how this should be done.

S.

RNA-Seq • 3.7k views

Are you sure you didn't sequence primarily mRNA? You normally have to explicitly ask for total RNA-seq. Were there spike-ins?


There are plenty of alternatives to quantile normalization. Are any of those applicable?


You haven't specified which workflow you're using. DESeq2? Kallisto/Sleuth? Tuxedo?


It is a hard-coded pipeline used in the lab where I am a new student. The data were analysed by a previous technician who is no longer here. His colleagues need to reproduce the results, and I do not know how to handle this mess... sorry.


Well, what programs does the pipeline invoke? TopHat? Cufflinks? htseq-count?

7.0 years ago

As leipinji said, you shouldn't use rRNA as a normalizer, because library construction usually includes a step that either selects mRNA or depletes rRNA. The efficiency of this step can vary, so rRNA levels can differ widely between libraries (especially if you used ribodepletion).
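A toy illustration of that point, with made-up counts: two libraries with identical mRNA content, where ribodepletion left 50 rRNA reads in one and 450 in the other. Scaling each library by its rRNA count then turns a difference in depletion efficiency into a spurious fold change. The numbers are hypothetical and only meant to show the mechanism.

```r
# Hypothetical counts: identical mRNA in both libraries, different rRNA leftovers
counts <- cbind(
  lib1 = c(geneA = 500, geneB = 300, geneC = 200, rRNA = 50),
  lib2 = c(geneA = 500, geneB = 300, geneC = 200, rRNA = 450)
)

# Size factors from rRNA alone: each library's rRNA count relative to the average
sf_rrna <- counts["rRNA", ] / mean(counts["rRNA", ])   # 0.2 and 1.8
norm <- sweep(counts, 2, sf_rrna, "/")                 # divide each column by its factor

fold_change <- norm["geneA", "lib2"] / norm["geneA", "lib1"]
print(sf_rrna)
print(fold_change)  # about 0.111: a spurious 9-fold "down-regulation" of an unchanged gene
```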

However, with ribodepleted datasets you could try normalizing on snoRNAs. With DESeq2 it's pretty simple:

1. Get read counts per gene/exon (with htseq-count or featureCounts, for instance).
2. Perform the differential analysis with DESeq2. Follow the procedure described in the documentation, except that you estimate the size factors (i.e. how your libraries are normalized) only from the read counts of snoRNA genes:
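library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = CountTable, colData = Design, design = ~ condition)
dds <- estimateSizeFactors(dds, controlGenes = isSnoRNA)  # isSnoRNA: logical vector flagging snoRNA rows of CountTable (hypothetical name)
sizeFactors(dds)  # inspect them; DESeq(dds) will then reuse these pre-existing size factors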
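For intuition, here is roughly what restricting the size factors to snoRNAs does, written out in base R. This is a sketch of the median-of-ratios idea that DESeq/DESeq2 use, computed over a control set only; it is not DESeq2's own code (DESeq2 may differ in details), and the function name, `is_snorna`, and the counts are hypothetical.

```r
# Median-of-ratios size factors, computed only over control genes (e.g. snoRNAs)
size_factors_from_controls <- function(counts, control) {
  ctrl <- counts[control, , drop = FALSE]
  # keep control genes with a nonzero count in every sample (log(0) is -Inf)
  ctrl <- ctrl[apply(ctrl > 0, 1, all), , drop = FALSE]
  log_geo_means <- rowMeans(log(ctrl))           # per-gene pseudo-reference
  apply(log(ctrl), 2, function(lc) exp(median(lc - log_geo_means)))
}

# Hypothetical toy counts: sample B is sequenced twice as deep as A
counts <- rbind(
  sno1  = c(A = 100, B = 200),
  sno2  = c(A = 50,  B = 100),
  sno3  = c(A = 20,  B = 40),
  geneX = c(A = 300, B = 1800)
)
is_snorna <- grepl("^sno", rownames(counts))

sf <- size_factors_from_controls(counts, is_snorna)
normalized <- sweep(counts, 2, sf, "/")
print(sf)          # B's factor is twice A's
print(normalized)  # geneX shows a 3-fold increase after normalization
```

Because geneX is excluded from the factor estimation, its own change cannot pull the normalization, which is the point of using a control set.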


Hope that helps!

Carlo


Do you have any experience with this in polyA enriched datasets? I'm curious how well it might work there given that polyA tagging of snoRNAs tends to signal "degrade me".


You are right, I wouldn't recommend this method for poly-A enriched datasets. I know guys who use it, but only with ribodepleted datasets. I updated my answer accordingly to avoid confusion.

7.0 years ago
Pinji Lei

I don't think you can use rRNA for normalization. First, RNA-seq library construction usually purifies mRNA from total RNA and then sequences those mRNA fragments, and that purification step removes most of the rRNA. Second, if you used total RNA for library construction, mRNA makes up only about 2% of it, so most of your reads would carry little useful information. I think using RPKM values is better.
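For reference, RPKM divides each count by gene length in kilobases and by library size in millions of reads. A minimal sketch, assuming the column sums of the count matrix stand in for total mapped reads; the function name and data are hypothetical.

```r
# RPKM = reads / (gene length in kb * total mapped reads in millions)
rpkm <- function(counts, lengths_bp) {
  # assumption: colSums(counts) approximates total mapped reads per sample
  per_million <- colSums(counts) / 1e6
  per_kb <- counts / (lengths_bp / 1e3)   # divides each row by its gene length in kb
  sweep(per_kb, 2, per_million, "/")      # divides each column by its depth in millions
}

# Hypothetical data: s2 is simply sequenced twice as deep as s1
counts <- cbind(s1 = c(g1 = 4000, g2 = 996000),
                s2 = c(g1 = 8000, g2 = 1992000))
lengths_bp <- c(g1 = 2000, g2 = 1000)

print(rpkm(counts, lengths_bp))  # g1 = 2000 and g2 = 996000 in both samples
```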


Please don't use RPKMs for anything other than things like plotting.


But many people use FPKM to quantify gene expression in RNA-seq. Could you please explain why we shouldn't use FPKM values for normalization?


Just because many people do it doesn't mean their results are worthwhile. The short list of problems with FPKM is (1) non-robust normalization, even under typical conditions, and (2) complete loss of precision information. Some tools (e.g. Kallisto) give confidence intervals on FPKM estimates, which are then useful, but this is more the exception than the rule (particularly since tools intended to deal with this have only existed for a few weeks).
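One way to see point (1): with per-million (total-count) scaling, as used in RPKM/FPKM, a single strongly induced gene inflates one library's total and makes every unchanged gene look down-regulated, whereas a median-of-ratios factor is barely affected. The toy counts below are hypothetical, and gene-length scaling is omitted because it cancels when comparing the same gene across samples.

```r
# Hypothetical counts: g1..g4 unchanged, g5 strongly induced in sample B
counts <- cbind(
  A = c(g1 = 100, g2 = 200, g3 = 300, g4 = 400, g5 = 1000),
  B = c(g1 = 100, g2 = 200, g3 = 300, g4 = 400, g5 = 9000)
)

# Total-count (per-million) scaling, the library-size part of RPKM/FPKM
tc <- sweep(counts, 2, colSums(counts) / 1e6, "/")

# Median-of-ratios scaling (DESeq-style), robust to a few extreme genes
log_gm <- rowMeans(log(counts))
sf <- apply(log(counts), 2, function(lc) exp(median(lc - log_gm)))
mr <- sweep(counts, 2, sf, "/")

print(tc["g1", "B"] / tc["g1", "A"])  # 0.2: the unchanged gene looks 5-fold down
print(mr["g1", "B"] / mr["g1", "A"])  # 1: correctly unchanged
```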