Question: RNASeq normalization using the expression values of rRNA
0
3.6 years ago by
samuelmiver400
Centre for Genomic Regulation (Barcelona, Spain)
samuelmiver400 wrote:

I have a set of four samples from an RNA-Seq experiment. Normally I would perform quantile normalization, but in this case the experimental condition shifts the expression distribution too far from the control, and I do not want to lose that information.

I was wondering if it would be possible to normalize by the values of rRNA and how this should be done.

Thank you in advance for your consideration,

S.

 

rna-seq • 1.9k views
modified 3.6 years ago by Carlo Yague4.4k • written 3.6 years ago by samuelmiver400

Are you sure you didn't sequence primarily mRNA? You normally have to explicitly ask for total RNA-seq. Were there spike-ins?
 

written 3.6 years ago by Devon Ryan89k

There are plenty of alternatives to quantile normalization.  Are any of those applicable?

written 3.6 years ago by Sean Davis25k

You haven't specified which workflow you're attempting to use. DESeq2? Kallisto / Sleuth? Tuxedo?

written 3.6 years ago by andrew.j.skelton735.6k

It is a hard-coded pipeline used in the lab where I am a new student. The data were analysed by a previous technician who is no longer here. His colleagues need to reproduce the results and I do not know how to manage this mess... sorry.

written 3.6 years ago by samuelmiver400

Well, what sorts of programs are invoked in the pipeline? TopHat? Cufflinks? htseq-count?

written 3.6 years ago by andrew.j.skelton735.6k
3
3.6 years ago by
Carlo Yague4.4k
Belgium
Carlo Yague4.4k wrote:

As leipinji said, you shouldn't use rRNA as a normalizer, because library construction usually includes a step that either selects mRNA or depletes rRNA. The efficiency of this step can vary, so rRNA levels can differ greatly between libraries (especially if you used ribodepletion).

However, in the case of ribodepleted datasets, you could try to normalize on snoRNAs. With DESeq2, it's pretty simple:

1) Get read counts per gene/exon (with htseq-count or featureCounts, for instance).

2) Perform the differential analysis using DESeq2. Follow the procedure described in the documentation, except that you estimate the size factors (= how you normalize your libraries) only on read counts from snoRNA genes:

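library(DESeq2)
# Assumption: 'feature' is a per-gene vector of biotypes, in the same row order as CountTable
dds <- DESeqDataSetFromMatrix(countData = CountTable, colData = Design, design = ~ condition)
# Estimate the size factors from the snoRNA rows only
dds <- estimateSizeFactors(dds, controlGenes = which(feature == "snoRNA"))
sizeFactors(dds)
head(counts(dds, normalized = TRUE))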

Hope that helps !

Carlo

modified 3.6 years ago • written 3.6 years ago by Carlo Yague4.4k
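
For readers who want to see what restricting size-factor estimation to snoRNA genes amounts to, here is a minimal base-R sketch of the median-of-ratios idea applied to a subset of rows. It is not DESeq2's implementation; the count matrix, the `feature` vector and the helper name `size_factors_from` are all made up for illustration.

```r
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
counts <- matrix(
  c(100, 200,  50, 100,
     10,  20,   5,  10,
     40,  80,  20,  40,
    500, 300, 900, 100,
     30,  60,  15,  30),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("sno1", "sno2", "sno3", "mRNA1", "sno4"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)

# Hypothetical per-gene annotation, in the same row order as 'counts'.
feature <- c("snoRNA", "snoRNA", "snoRNA", "protein_coding", "snoRNA")

# Median-of-ratios size factors computed on a subset of rows only.
size_factors_from <- function(counts, rows) {
  sub <- counts[rows, , drop = FALSE]
  # Drop genes with a zero count in any sample (their log is not finite).
  sub <- sub[rowSums(sub == 0) == 0, , drop = FALSE]
  # Log of the per-gene geometric mean across samples.
  log_geo_mean <- rowMeans(log(sub))
  # Per sample: median ratio of its counts to the per-gene geometric mean.
  apply(sub, 2, function(col) exp(median(log(col) - log_geo_mean)))
}

sf <- size_factors_from(counts, feature == "snoRNA")

# Divide every column (sample) by its size factor, for all genes.
normalized <- sweep(counts, 2, sf, "/")
```

In this toy data the snoRNA rows differ between samples only by a constant factor, so after normalization they are flat across samples, while `mRNA1` keeps its differences. Whether real snoRNA counts behave this way is exactly the assumption the method rests on.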

Do you have any experience with this in polyA enriched datasets? I'm curious how well it might work there given that polyA tagging of snoRNAs tends to signal "degrade me".
 

written 3.6 years ago by Devon Ryan89k

You are right, I wouldn't recommend this method for poly-A enriched datasets. I know guys who use it, but only with ribodepleted datasets. I updated my answer accordingly to avoid confusion.

written 3.6 years ago by Carlo Yague4.4k
0
3.6 years ago by
leipinji10
leipinji10 wrote:

I don't think you can use rRNA for normalization. First, RNA-Seq library construction usually starts by purifying mRNA from total RNA and then sequencing those mRNA fragments, and the purification step removes most of the rRNA. Second, if you used total RNA for library construction, mRNA makes up only 2% of it, so you may get a lot of useless information. I think using RPKM values is better.

written 3.6 years ago by leipinji10
1

Please don't use RPKMs for anything other than things like plotting.
 

modified 3.6 years ago • written 3.6 years ago by Devon Ryan89k

But many people use FPKM to quantify gene expression levels in RNA-Seq. Could you please explain why we shouldn't use FPKM values for normalization?

written 3.6 years ago by leipinji10
2

Just because many people do it doesn't mean that their results are worthwhile. The short list of problems with FPKM is (1) non-robust normalization even within typical conditions and (2) complete loss of precision information. Some tools (e.g. Kallisto) will give confidence intervals on FPKM estimates, which are then useful, but this is more the exception than the rule (particularly since tools intended to deal with this have only existed for a few weeks).

written 3.6 years ago by Devon Ryan89k
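
To illustrate the "loss of precision information" point in the comment above, here is a small base-R sketch using the standard FPKM formula (reads per kilobase of gene length per million mapped reads). The library size, gene lengths and read counts are made-up numbers, and the Poisson noise estimate is only a rough assumption about counting noise.

```r
# FPKM = reads / (gene length in kb * total mapped reads in millions)
fpkm <- function(reads, length_bp, total_reads) {
  reads / ((length_bp / 1e3) * (total_reads / 1e6))
}

total_reads <- 20e6  # made-up library size

# A short gene with few reads and a long gene with many reads (made-up numbers).
short_gene <- fpkm(reads = 10,   length_bp = 500,   total_reads = total_reads)
long_gene  <- fpkm(reads = 1000, length_bp = 50000, total_reads = total_reads)

# Rough relative counting noise under a Poisson assumption: 1 / sqrt(reads).
rel_noise <- 1 / sqrt(c(short = 10, long = 1000))
```

Both genes come out at the same FPKM, yet the estimate for the short gene rests on 10 reads and the one for the long gene on 1000. Once only the FPKM value is kept, that difference in reliability can no longer be recovered, which is why count-based tools work from the raw counts.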