Question: Can We Use RNA-seq FPKM Values Derived From 18 Cell Lines To Run GSEA?
5.7 years ago
I have 2 questions:

1) I have FPKM values for each cell line. How do I further normalize the results so that I can compare across cell lines? Should I compute a z-score or robust z-score for each cell line?

2) Has anyone run the GSEA software from the Broad Institute on RNA-seq data? If so, are the FPKM values the input to GSEA?

5.7 years ago
Damian Kao
To answer your first question: FPKM is already normalized in the sense that the counts have been divided by the total library size (and also by transcript length). Whether that is a good way to normalize your reads is debatable. If you are comparing the same gene across cell lines, you don't really need to divide by transcript length, since it is a technical bias that should be consistent across all your samples.
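
To make the point about transcript length concrete, here is a minimal sketch in Python with NumPy. The count matrix and transcript lengths are made-up illustration values, and `cpm` and `fpkm` are hypothetical helper names, not functions from any RNA-seq package. It shows that the length term cancels when the same gene is compared between two samples.

```python
import numpy as np

# Hypothetical raw read counts: rows are genes, columns are cell lines.
counts = np.array([
    [100.0, 300.0],
    [400.0, 400.0],
    [500.0, 300.0],
])
# Hypothetical transcript lengths in base pairs, one per gene.
lengths_bp = np.array([1000.0, 2000.0, 500.0])


def cpm(counts):
    """Counts per million: divide each column by its library size."""
    library_size = counts.sum(axis=0)
    return counts / library_size * 1e6


def fpkm(counts, lengths_bp):
    """FPKM: counts per million, further divided by length in kilobases."""
    return cpm(counts) / (lengths_bp[:, None] / 1e3)


cpm_values = cpm(counts)
fpkm_values = fpkm(counts, lengths_bp)

# Per-gene ratio between the two cell lines, with and without length scaling.
ratio_cpm = cpm_values[:, 1] / cpm_values[:, 0]
ratio_fpkm = fpkm_values[:, 1] / fpkm_values[:, 0]
print(np.allclose(ratio_cpm, ratio_fpkm))
```

The two ratios agree because each gene is divided by the same length in every sample, so the length only matters when comparing different genes within one sample.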

Other popular options would be to normalize your read counts with DESeq's method or edgeR's TMM method. Both work on raw read counts rather than FPKM. Here is a good paper that describes several normalization methods:
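
For the DESeq-style normalization mentioned above, the following is a rough sketch of the median-of-ratios idea in Python with NumPy. It is not the DESeq code itself: the function name `median_of_ratios_size_factors` and the toy count matrix are assumptions for illustration, and in practice you would call the R package on your raw counts.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Estimate per-sample size factors from a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with a nonzero count in every sample, so logs are finite.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Log of the per-gene geometric mean across samples (a pseudo-reference).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor: median over genes of the ratio to the reference, per sample.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


# Toy raw counts: the second sample is sequenced twice as deeply as the first.
counts = np.array([
    [10, 20],
    [100, 200],
    [1000, 2000],
    [0, 5],
])
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors
print(size_factors)
```

After dividing by the size factors, the genes that differ only by sequencing depth end up with equal values in both samples.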

Converting your normalized expression values to z-scores can be useful if you want to generate a nice heatmap or perform cluster analysis. However, z-scores discard information about the magnitude of gene expression: a gene going from 100 reads to 500 reads will have the same z-score as a gene going from 1000 reads to 5000 reads. Another option is to use the variance-stabilizing transformation from the DESeq package.
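
The loss of magnitude is easy to check with a small sketch in Python with NumPy. The expression values below are invented to mirror the 100 to 500 and 1000 to 5000 example, and `row_zscore` is a hypothetical helper name.

```python
import numpy as np


def row_zscore(expr):
    """Z-score each gene (row) across samples (columns)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    # Sample standard deviation; a gene with zero variance would divide by zero.
    std = expr.std(axis=1, ddof=1, keepdims=True)
    return (expr - mean) / std


# Two hypothetical genes with the same fold change at different magnitudes.
expr = np.array([
    [100, 500, 100, 500],
    [1000, 5000, 1000, 5000],
])
z = row_zscore(expr)
print(np.allclose(z[0], z[1]))
```

Both rows come out identical after scaling, so a heatmap of z-scores cannot tell the lowly expressed gene from the highly expressed one.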

