5.9 years ago by
If the intention is to check the validity or robustness of your results with respect to different normalization methods, you could do the following:
- calculate Spearman's rank correlation between the fold changes reported by DESeq and the RPKM-based fold changes
- take top-N lists (top 100, top 1000, etc.) ranked by each unit (DESeq fold change, RPKM fold change) and use a hypergeometric test to check whether they overlap significantly
- get the library-size-normalized expression values per library and compare them with the RPKM values per library. Cluster the correlation matrix: do samples cluster 'correctly' based on replicates?
- make an MDS plot of the library-size-normalized abundances and of the RPKM values: do the replicates group 'correctly'?
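The first two checks can be sketched in a few lines of plain Python. This is only a rough illustration, not a reference implementation: the gene names, fold-change values and function names (`average_ranks`, `spearman`, `top_k_overlap_pvalue`) are made up for the example, genes are ranked by absolute log fold change (an assumption, you may prefer to rank by p-value), and the hypergeometric p-value is the probability of seeing at least the observed overlap when two top-k lists are drawn independently from the same N genes.

```python
from math import comb


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def top_k_overlap_pvalue(scores_a, scores_b, k):
    """Overlap of the two top-k lists and P(overlap >= observed)
    under a hypergeometric null. Assumes k <= number of shared genes."""
    genes = sorted(set(scores_a) & set(scores_b))
    n_genes = len(genes)
    top_a = set(sorted(genes, key=lambda g: abs(scores_a[g]), reverse=True)[:k])
    top_b = set(sorted(genes, key=lambda g: abs(scores_b[g]), reverse=True)[:k])
    overlap = len(top_a & top_b)
    # Upper tail: sum the hypergeometric pmf from the observed overlap to k.
    tail = sum(comb(k, i) * comb(n_genes - k, k - i) for i in range(overlap, k + 1))
    return overlap, tail / comb(n_genes, k)


# Toy log2 fold changes (hypothetical values, six genes only).
deseq_lfc = {"g1": 3.1, "g2": -2.4, "g3": 0.2, "g4": 1.8, "g5": -0.1, "g6": 0.4}
rpkm_lfc = {"g1": 2.7, "g2": -2.9, "g3": 0.1, "g4": 1.2, "g5": -0.3, "g6": 0.5}

genes = sorted(deseq_lfc)
rho = spearman([deseq_lfc[g] for g in genes], [rpkm_lfc[g] for g in genes])
overlap, p_value = top_k_overlap_pvalue(deseq_lfc, rpkm_lfc, k=2)
print(rho, overlap, p_value)
```

With real data you would use all tested genes and k = 100, 1000, etc. A high rho together with a small overlap p-value suggests the ranking is robust to the choice of normalization.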
Many concerns have been raised about transcript-length normalization (FPKM/RPKM) as a unit for reporting gene abundance, some of them here on BioStar. Consequently, I would not report expression levels in FPKM. For partial reports, where only a subset of genes is listed, I would report abundances and library-size-normalized fold changes instead. Readers can length-normalize the abundances afterwards if they really want to, whereas transforming FPKM back to abundance afterwards is difficult.
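To illustrate why length normalization is easy to apply after the fact, here is a minimal sketch that goes from counts to a library-size-normalized abundance (counts per million) and then to RPKM. The gene names, lengths and counts are invented, and "library size" is assumed here to be the total number of mapped reads, which is not the same as the size factors DESeq estimates.

```python
def cpm(counts, library_size):
    """Counts per million: abundance normalized by library size only."""
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def rpkm_from_cpm(cpm_values, gene_length_bp):
    """RPKM = CPM divided by the gene length in kilobases."""
    return {gene: v / (gene_length_bp[gene] / 1e3) for gene, v in cpm_values.items()}


# Hypothetical library: two genes with equal counts but different lengths.
counts = {"geneA": 500, "geneB": 500}
lengths_bp = {"geneA": 1000, "geneB": 4000}
library_size = 2_000_000  # assumed total mapped reads

abundance = cpm(counts, library_size)
rpkm = rpkm_from_cpm(abundance, lengths_bp)
print(abundance)
print(rpkm)
```

Going in this direction only needs a table of gene lengths. Going back from RPKM to counts additionally requires the exact lengths and library size the original authors used, which are often not reported.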
For reports containing all genes, reporting the raw counts is probably best; the user can then freely choose the methods for computing abundance, fold change, and statistical tests.