Question: RNASeq normalization using the expression values of rRNA
0
5.0 years ago by
samuelmiver430
Centre for Genomic Regulation (Barcelona, Spain)
samuelmiver430 wrote:

I have a set of four samples from an RNA-seq experiment. Normally, I would perform a quantile normalization, but in this case the condition being tested shifts the expression distribution too far from the control, and I do not want to lose that information.

I was wondering whether it would be possible to normalize by the rRNA values and, if so, how this should be done.

Thank you in advance for your consideration,

S.

 

rna-seq • 2.7k views
ADD COMMENTlink modified 5.0 years ago by Carlo Yague5.0k • written 5.0 years ago by samuelmiver430

Are you sure you didn't sequence primarily mRNA? You normally have to explicitly ask for total RNA-seq. Were there spike-ins?
 

ADD REPLYlink written 5.0 years ago by Devon Ryan96k

There are plenty of alternatives to quantile normalization.  Are any of those applicable?

ADD REPLYlink written 5.0 years ago by Sean Davis26k

You haven't specified what workflow you're attempting to use. DESeq2? Kallisto/Sleuth? Tuxedo?

ADD REPLYlink written 5.0 years ago by andrew.j.skelton736.0k

It is a hard-coded pipeline used in the lab where I am a new student. The data were analysed by a previous technician who is no longer here. His colleagues need to reproduce the results and I do not know how to manage this mess... sorry.

ADD REPLYlink written 5.0 years ago by samuelmiver430

Well, what sorts of programs are invoked in the pipeline? TopHat? Cufflinks? htseq-count?

ADD REPLYlink written 5.0 years ago by andrew.j.skelton736.0k
5
5.0 years ago by
Carlo Yague5.0k
Canada
Carlo Yague5.0k wrote:

As leipinji said, you shouldn't use rRNAs as normalizers, because library construction usually includes a step that either selects mRNA or depletes rRNA. The efficiency of this step can vary, so rRNA levels can be very different between libraries (especially if you used ribodepletion).

However, in the case of ribodepleted datasets, you could try to normalize on snoRNAs. With DESeq2, it's pretty simple:

1) Get read counts per gene/exon (with htseq-count or featureCounts, for instance).

2) Perform the differential analysis using DESeq2. Follow the procedure described in the documentation, except that you estimate the size factors (= how you will normalize your libraries) only on read counts from snoRNA genes:

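library(DESeq2)
# isSnoRNA (assumed): logical vector marking the snoRNA rows of CountTable, built from your annotation
dds <- DESeqDataSetFromMatrix(countData = CountTable, colData = Design, design = ~ condition)
# estimate size factors from the snoRNA rows only; they are stored on the full dataset
dds <- estimateSizeFactors(dds, controlGenes = isSnoRNA)
sizeFactors(dds)
head(counts(dds, normalized = TRUE))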
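
To make the idea concrete, here is a minimal sketch of median-of-ratios size factors estimated from a chosen subset of genes. It is written in plain Python rather than R and is not DESeq2's actual code; the function names, the toy count table, and the choice of which rows play the role of snoRNAs are all assumptions made up for illustration.

```python
import math
from statistics import median


def size_factors(counts, control_rows=None):
    """Median-of-ratios size factors, one per sample.

    counts: list of gene rows, each a list of raw counts per sample.
    control_rows: optional row indices to restrict the estimate to
    (e.g. snoRNA genes); None means all rows are used.
    """
    rows = counts if control_rows is None else [counts[i] for i in control_rows]
    # Keep genes with non-zero counts in every sample so the log mean is defined.
    rows = [r for r in rows if all(c > 0 for c in r)]
    if not rows:
        raise ValueError("no gene has non-zero counts in every sample")
    n_samples = len(rows[0])
    # Log of the geometric mean of each gene across samples.
    log_geo_means = [sum(math.log(c) for c in r) / n_samples for r in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(r[j]) - g for r, g in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy table: rows are genes, columns are (control, treated).
counts = [
    [100, 200],    # rows 0-2: stand-ins for snoRNA genes, 2-fold apart
    [50, 100],     # (sequencing depth difference only)
    [10, 20],
    [1000, 8000],  # rows 3-6: genes shifted by the treatment, 8-fold apart
    [300, 2400],
    [30, 240],
    [70, 560],
]
control_rows = [0, 1, 2]

sf_controls = size_factors(counts, control_rows)  # estimated on control rows only
sf_all = size_factors(counts)                     # estimated on every row
norm_controls = normalize(counts, sf_controls)
norm_all = normalize(counts, sf_all)
```

With these toy numbers the control rows give a 2-fold size factor ratio between the two samples, so the 8-fold shift in the other genes survives as a 4-fold difference after normalization. Estimating on all rows gives an 8-fold ratio, which cancels that shift entirely. That is the situation described in the question, where most genes move in one direction.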

Hope that helps !

Carlo

ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by Carlo Yague5.0k

Do you have any experience with this in polyA enriched datasets? I'm curious how well it might work there given that polyA tagging of snoRNAs tends to signal "degrade me".
 

ADD REPLYlink written 5.0 years ago by Devon Ryan96k

You are right, I wouldn't recommend this method for poly-A enriched datasets. I know guys who use it, but only with ribodepleted datasets. I updated my answer accordingly to avoid confusion.

ADD REPLYlink written 5.0 years ago by Carlo Yague5.0k
0
5.0 years ago by
Pinji Lei10
USA/Boston/Massachusetts General Hospital
Pinji Lei10 wrote:

I think you cannot use rRNA for normalization. Firstly, for RNA-seq library construction we first purify mRNA from total RNA and then sequence those mRNA fragments, and the purification step removes most of the rRNA. Secondly, if you used total RNA for library construction, mRNA makes up only 2% of it, so you may get a lot of useless information. I think using RPKM values is better.

ADD COMMENTlink written 5.0 years ago by Pinji Lei10
1

Please don't use RPKMs for anything other than things like plotting.
 

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by Devon Ryan96k

But many people use FPKM to quantify gene expression levels in RNA-seq. Could you please explain why we shouldn't use FPKM values for normalization?

ADD REPLYlink written 5.0 years ago by Pinji Lei10
2

Just because many people do it doesn't mean that their results are worthwhile. The short list of problems with FPKM is (1) non-robust normalization even within typical conditions and (2) complete loss of precision information. Some tools (e.g. Kallisto) will give confidence intervals on FPKM estimates, which are then useful, but this is more the exception than the rule (particularly since tools intended to deal with this have only existed for a few weeks).
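
One common reading of point (1) is that scaling by the total read count lets a single highly expressed gene distort every other gene. A minimal sketch with made-up counts follows; `per_million` and both samples are hypothetical. The gene-length term of FPKM is left out because it cancels when the same gene is compared across samples.

```python
def per_million(sample_counts):
    """Scale one sample's counts to counts per million mapped reads."""
    total = sum(sample_counts)
    return [c * 1e6 / total for c in sample_counts]


# Hypothetical samples: only the last gene differs between them.
sample_a = [100, 100, 100, 100, 600]
sample_b = [100, 100, 100, 100, 4600]

cpm_a = per_million(sample_a)
cpm_b = per_million(sample_b)

# Apparent fold change of the first gene, whose raw count is identical in both.
apparent_fold_change = cpm_a[0] / cpm_b[0]
```

Here the first four genes have identical raw counts in both samples, yet after total-count scaling they appear 5-fold lower in the second sample, purely because the last gene takes up more of the library.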

ADD REPLYlink written 5.0 years ago by Devon Ryan96k
Powered by Biostar version 2.3.0