Question

Can we do between-sample normalization like DESeq/TMM on RPKM?


Asked 7.2 years ago by jerryzhaosjtu

Hi fellow biostars,

I have RPKM values for 20 samples (10 disease and 10 control). I'm trying to build a classification model, so I need to do between-sample normalization. I know that I could apply DESeq/TMM to raw read counts, but is it okay to apply DESeq/TMM to RPKM? Thanks a lot!

Update: I found that DESeq doesn't support continuous data like RPKM. So I guess my question is: are there any between-sample normalization methods that I could use on RPKM data? Thanks!

Tags: RNA-Seq, genome, R

Written 7.2 years ago by jerryzhaosjtu; last updated 7.2 years ago by agoel.


The problem is not the normalization: you can normalize RPKM with the same method DESeq uses. But the DESeq model assumes the input is the number of reads, and it's pretty useless without that. Can't you get the raw read counts?

Comment by Asaf (10k), 7.2 years ago
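The comment's point that DESeq's normalization step is just arithmetic, and so can be run on any positive matrix including RPKM, can be illustrated with a minimal NumPy sketch of the median-of-ratios size-factor idea. The function names are hypothetical (this is not DESeq's API), and the sketch assumes a genes x samples matrix. It only computes per-sample scaling factors; it does not reproduce DESeq's count-based statistical model.

```python
import numpy as np

def median_of_ratios_size_factors(mat):
    """Per-sample size factors via the median-of-ratios idea (hypothetical helper).

    mat: genes x samples array of non-negative values (counts or RPKM).
    """
    mat = np.asarray(mat, dtype=float)
    # Only genes positive in every sample have a finite log geometric mean.
    positive = np.all(mat > 0, axis=1)
    if not positive.any():
        raise ValueError("no gene is positive in all samples")
    log_vals = np.log(mat[positive])
    # Per-gene log geometric mean across samples acts as a pseudo-reference sample.
    log_ref = log_vals.mean(axis=1, keepdims=True)
    # Each sample's factor is the median (over genes) of its ratio to the reference.
    return np.exp(np.median(log_vals - log_ref, axis=0))

def normalize_by_size_factors(mat):
    """Divide each sample (column) by its size factor (hypothetical helper)."""
    return np.asarray(mat, dtype=float) / median_of_ratios_size_factors(mat)
```

On RPKM input this only rescales columns: the output is still non-integer and carries no read-depth information, so it cannot be fed back into DESeq's count model, which is the limitation the comment raises.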

Answer 1 (score 1) · 2017-02-21

RNA-seq expression values can be quantified in two ways: count-based, or RPKM/FPKM-based.

In the latter case, the values are estimates and therefore non-integer. Because they are no longer integer counts, the rigorous count-based statistical procedures cannot be applied to them. Count-based data, by contrast, are whole numbers (the number of reads assigned to each gene), and rigorous statistical procedures such as DESeq, edgeR and limma/voom are available for them.
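To see why RPKM values lose what count-based methods need, here is a minimal NumPy sketch of the usual RPKM formula (reads per kilobase of gene length per million mapped reads). `rpkm` is a hypothetical helper, and the sketch assumes the column sum of the count matrix stands in for the total number of mapped reads; real pipelines often use all mapped reads in the library instead.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """Hypothetical helper: RPKM from a genes x samples count matrix.

    gene_lengths_bp: one length (in base pairs) per gene.
    Assumption: library size = column sum of `counts`.
    """
    counts = np.asarray(counts, dtype=float)
    # Gene lengths in kilobases, as a column so they broadcast across samples.
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1e3
    # Per-sample library size in millions of reads.
    millions_mapped = counts.sum(axis=0) / 1e6
    return counts / lengths_kb / millions_mapped

# Two samples with identical composition, the second sequenced twice as deeply.
values = rpkm([[500, 1000], [1500, 3000]], [1000, 2000])
```

Both columns of `values` come out identical even though the second sample had twice as many reads. The sequencing depth, which count-based models use to judge how noisy each measurement is, has been divided out and cannot be recovered from the RPKM values alone.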

In summary, count-based data are needed to perform proper cross-sample normalisation and to account for batch effects, if present.