Question: Can we do between-sample normalization like DESeq/TMM on RPKM?
3.0 years ago by
jerryzhaosjtu0 wrote:

Hi fellow biostars,

I have RPKM values for 20 samples (10 disease and 10 control). I'm trying to build a classification model, so I need to do between-sample normalization. I know that I could run DESeq/TMM on raw read counts, but is it okay to run DESeq/TMM on RPKM? Thanks a lot!

Update: I found that DESeq doesn't support continuous data like RPKM. So I guess my question is: are there any between-sample normalization methods that I could use on RPKM data? Thanks!

rna-seq R genome • 1.4k views

The problem is not the normalization: you can normalize RPKM with the same method DESeq uses. The problem is that the DESeq model assumes its input is read counts, so it's pretty useless without them. Can't you get the raw read counts?
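To make the "same method DESeq uses" concrete, here is a minimal sketch of median-of-ratios size factors applied to an RPKM matrix. It is not DESeq's own code: the function names and the toy matrix are made up for illustration, and it assumes genes are rows, samples are columns, and only genes with a positive value in every sample enter the reference.

```python
import math
from statistics import median


def median_of_ratios_size_factors(matrix):
    """Return one size factor per sample (column).

    matrix: list of gene rows, each a list of per-sample values
    (hypothetical layout; the values may be counts or RPKM).
    """
    n_samples = len(matrix[0])
    # Keep only genes that are positive in every sample, so logs are defined.
    usable = [row for row in matrix if all(v > 0 for v in row)]
    if not usable:
        raise ValueError("no gene is positive in every sample")
    # Log of each usable gene's geometric mean across samples (pseudo-reference).
    log_ref = [sum(math.log(v) for v in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's value to the reference, gene by gene.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        # The size factor is the median ratio, mapped back from log space.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(matrix, factors):
    """Divide every value by the size factor of its sample."""
    return [[v / f for v, f in zip(row, factors)] for row in matrix]


# Toy data: sample 2 is uniformly twice sample 1, plus one gene with a zero.
rpkm = [
    [10.0, 20.0],
    [5.0, 10.0],
    [8.0, 16.0],
    [0.0, 3.0],
]
factors = median_of_ratios_size_factors(rpkm)
normalized = normalize(rpkm, factors)
```

After dividing by the size factors, the first three genes have equal values in both samples, which is what between-sample scaling is meant to achieve. This only rescales the data; it does not make RPKM valid input for DESeq's count-based tests.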

written 3.0 years ago by Asaf
3.0 years ago by
agoel30 wrote:

RNA-seq expression values can be quantified in two ways: count-based, or RPKM/FPKM-based.

In the latter case, the values are length- and depth-scaled estimates and are therefore decimals. Because they are not integer counts, the count-based statistical procedures do not apply to them. Count-based data, by contrast, are whole numbers (counts of reads mapped to each gene), and rigorous statistical procedures (DESeq, edgeR, limma/voom, etc.) are available for them.
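As an illustration of why RPKM values come out as decimals rather than counts, here is a small sketch of the standard RPKM formula (reads per kilobase of transcript per million mapped reads). The function name and the toy numbers are made up for the example.

```python
def rpkm_from_counts(counts, gene_lengths_bp):
    """Compute RPKM for one sample.

    counts: read counts per gene (integers).
    gene_lengths_bp: gene lengths in base pairs, in the same order.
    """
    total_reads = sum(counts)
    # RPKM = count * 1e9 / (total mapped reads * gene length in bp)
    return [
        c * 1e9 / (total_reads * length)
        for c, length in zip(counts, gene_lengths_bp)
    ]


# Toy sample with three genes (hypothetical values).
counts = [100, 300, 600]
lengths = [1000, 2000, 3000]
values = rpkm_from_counts(counts, lengths)
```

Because each count is divided by both the library size and the gene length, the original integers cannot be recovered from RPKM alone, which is why count-based tools ask for the raw counts.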

In summary, to perform a proper cross-sample normalisation that will remove batch effects, if present, count-based data are needed.
