Question: Normalization of RNA-seq data between different types of libraries
5.0 years ago by
ktian.whu30 wrote:

Dear members,

I have several RNA-seq data sets from published articles, and we think integrating them may be interesting. However, they come from different types of libraries (single-end/paired-end, polyA+/rRNA depletion). How can I normalize them so that their expression values can be compared directly?

I read an article that used different types of data and simply normalized them by RPKM. Is this a good method?

rna-seq fpkm rpkm • 4.4k views
modified 4.9 years ago by Floris Brenk890 • written 5.0 years ago by ktian.whu30

I really wonder why so many RPKM/FPKM questions have come up lately.

written 5.0 years ago by Michael Dondrup46k
5.0 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

FPKM/RPKM is not a normalization method. It is a unit, meant for representing some sort of molar concentration of transcripts.

In my experience, if you have very different libraries, FPKM will confound biological comparisons rather than make the data more comparable. Calculating FPKM (imho) does not remove technical bias, but adds an unknown proportionality factor to each library, which depends on library composition. Therefore, if you have very different protocols, length-scaling the abundances will not help. Problems with using FPKM have been stated in several posts, e.g.:

  • FPKM not suitable for DE?
  • Does FPKM scale incorrectly in case of unequal mapping rates?
  • How to calculate FPKM values of interested gene
  • DEseq differential expression and RPKM

Some more problems:

  • FPKM/RPKM has never been peer-reviewed; it was introduced as an ad-hoc measure in a supplementary
  • One of the authors of this paper states that it should not be used because of faulty arithmetic
  • All reviews so far have shown it to be an inferior scale for DE analysis of genes
  • Length normalization is mostly dispensable imo in DE analysis because a gene's length is the same in every sample

Using FPKM-scaled values for your use case might be the worst possible application.
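The library-composition effect described above can be illustrated with a toy calculation. This is only a sketch: the gene names, lengths, and counts are made up, and the `fpkm` helper is a hypothetical function implementing the usual definition (fragments per kilobase of transcript per million mapped fragments), not code from any particular package.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total_fragments = sum(counts.values())
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_fragments)
        for gene, count in counts.items()
    }


# Hypothetical transcript lengths in base pairs (illustrative values only).
lengths_bp = {"geneA": 1000, "geneB": 2000, "non_polyA_rna": 2000}

# Hypothetical fragment counts for the same sample prepared with two protocols.
# geneA and geneB get identical counts; only the non-polyadenylated RNA differs.
polya_counts = {"geneA": 100, "geneB": 400, "non_polyA_rna": 500}
ribo_depleted_counts = {"geneA": 100, "geneB": 400, "non_polyA_rna": 4500}

fpkm_polya = fpkm(polya_counts, lengths_bp)
fpkm_ribo = fpkm(ribo_depleted_counts, lengths_bp)

for gene in ("geneA", "geneB"):
    ratio = fpkm_polya[gene] / fpkm_ribo[gene]
    print(f"{gene}: polyA+ {fpkm_polya[gene]:.0f}, "
          f"rRNA-depleted {fpkm_ribo[gene]:.0f}, ratio {ratio:.1f}")
```

In this toy case both genes receive the same number of fragments in the two libraries, yet their FPKM values differ by a factor of 5, which is just the ratio of the library totals (5000 vs. 1000). Within each library the geneB/geneA ratio is unchanged, so the distortion is a per-library proportionality factor driven by composition.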

modified 5.0 years ago • written 5.0 years ago by Michael Dondrup46k

Would it be right to quantile normalize a data frame of gene-level FPKM values obtained from Cufflinks when the samples in the data frame come from different types of libraries?

written 4.8 years ago by Manvendra Singh2.1k

From my data, it doesn't seem that quantile normalization cures it. Of course this is a very limited view. I can only say that observations on our data agree with the published concerns, and we seem to observe exactly the library-dependent bias that was mentioned. In principle, all the points I made above apply even more to quantile-normalized FPKM (a combination of two methods that has not been peer-reviewed, etc.).
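Why quantile normalization may not cure a library-dependent bias can be sketched with a toy example. The code below is a minimal pure-Python illustration with made-up values, not the procedure used on the data mentioned above; `quantile_normalize` is a hypothetical helper, and it breaks ties by position instead of averaging them, which is a simplification.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length lists of values (one list per sample)."""
    n_genes = len(samples[0])
    # For each sample, gene indices ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=s.__getitem__) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[order[k]] for s, order in zip(samples, orders)) / len(samples)
        for k in range(n_genes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical expression values for four genes (g1..g4) in two libraries.
# Assumption: the second protocol under-represents g2 only (20 -> 5).
lib_a = [10.0, 20.0, 30.0, 40.0]
lib_b = [10.0, 5.0, 30.0, 40.0]

norm_a, norm_b = quantile_normalize([lib_a, lib_b])
print(norm_a)
print(norm_b)
```

After normalization the two samples have identical value distributions, but g2 is still lower in the second library, and g1, which was identical before, now differs as well. Forcing the distributions to match does not remove a bias that acts on specific genes.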

modified 4.8 years ago • written 4.8 years ago by Michael Dondrup46k

Thanks for sharing this. It helps.

written 4.7 years ago by Manvendra Singh2.1k
5.0 years ago by
Ming Tang2.5k
Houston/MD Anderson Cancer Center
Ming Tang2.5k wrote:

If you want to find differentially expressed genes, you need to include the library type as a factor in the model when you use DESeq or limma.

You may also use sva to remove batch effects.
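To show what including library type as a factor does, here is a toy least-squares fit for a single gene. It is only a sketch under stated assumptions: the log-expression values are invented and noise-free, `solve_linear_system` and `ols` are hypothetical helpers written for this example, and none of this is the DESeq, limma, or sva API. In those R packages the corresponding idea is a design along the lines of `~ library_type + condition`.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical samples: (condition, library_type), 0/1 coded.
# condition: 0 = control, 1 = treated; library_type: 0 = polyA+, 1 = rRNA-depleted.
samples = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1)]
# Invented log2 expression: baseline 5, condition effect +1, library effect +2.
y = [5.0 + 1.0 * cond + 2.0 * lib for cond, lib in samples]

treated = [v for (cond, _), v in zip(samples, y) if cond == 1]
control = [v for (cond, _), v in zip(samples, y) if cond == 0]
naive_effect = sum(treated) / len(treated) - sum(control) / len(control)

design = [[1.0, float(cond), float(lib)] for cond, lib in samples]
intercept, condition_effect, library_effect = ols(design, y)

print(f"naive treated - control: {naive_effect:.2f}")
print(f"condition effect with library type in the model: {condition_effect:.2f}")
```

Because most treated samples here are rRNA-depleted and most controls are polyA+, the naive difference (2.0) mixes the library effect into the condition effect, while the model with library type as a covariate recovers the simulated condition effect (1.0). This only works when library type and condition are not completely confounded; if every treated sample came from one library type and every control from the other, the two effects could not be separated.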

written 5.0 years ago by Ming Tang2.5k
4.9 years ago by
Floris Brenk890
Floris Brenk890 wrote:

When combining these datasets, please take extreme care when interpreting the results, and design your analysis so that bias effects are excluded.

For more information about the biases in library prep read this:

Genome Biol. 2014 Jun 30;15(6):R86. [Epub ahead of print]
IVT-seq reveals extreme bias in RNA-sequencing.

Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, Pizarro A, Kim J, Irizarry R, Thomas RS, Grant GR, Hogenesch JB.  

modified 4.9 years ago • written 4.9 years ago by Floris Brenk890