CPM or TPM in GSEA analysis
3.0 years ago
francesca3

Dear all, I have a question. Is it better to perform this analysis using CPM or TPM values? Previously I performed this analysis using TPM. This time I have per-replicate values only for CPM; for TPM I have only the mean for each condition. Which is the best approach?

Thanks, Francesca

RNA-Seq GSEA
Which tool are you using? When I use fgsea, I pass in a differential-expression summary statistic rather than values from individual samples. For a given gene, the statistic I use is (-log10 of the unadjusted p-value) * (sign of the differential expression coefficient).
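The summary statistic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not fgsea's own code: the `(gene_id, p_value, log_fc)` tuple layout, the `1e-300` floor on p-values, and the helper names are hypothetical. The result is a preranked list of per-gene scores, the kind of named gene-level statistic that fgsea takes as its `stats` argument.

```python
import math


def signed_log10_p(p_value: float, log_fc: float) -> float:
    """Rank statistic: -log10(unadjusted p) times the sign of the DE coefficient."""
    # Assumed floor so that a p-value reported as 0 does not cause log10(0).
    p = max(p_value, 1e-300)
    sign = (log_fc > 0) - (log_fc < 0)  # +1, -1, or 0
    return -math.log10(p) * sign


def build_ranking(de_results):
    """de_results: iterable of (gene_id, p_value, log_fc) tuples (hypothetical layout)."""
    scores = {gene: signed_log10_p(p, lfc) for gene, p, lfc in de_results}
    # Sort from most strongly up-regulated to most strongly down-regulated.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    table = [("GENE_A", 1e-5, 2.1), ("GENE_B", 0.03, -0.8), ("GENE_C", 0.5, 0.1)]
    for gene, score in build_ranking(table):
        print(f"{gene}\t{score:.3f}")
```

Because the sign comes from the fold change and the magnitude from the p-value, strongly significant up-regulated genes sit at the top of the list and strongly significant down-regulated genes at the bottom, with uninformative genes in the middle.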

As others have mentioned, TPM is generally better; however, the choice may be tool-specific.

For example, you cannot use TPM-normalised values for differential gene expression analysis with edgeR, which expects raw counts.

9 weeks ago
dare_devil

As input to the Broad Institute's GSEA program, one should use expression data that have been properly normalised so that cross-sample differences can be faithfully gauged.

That means using any of these:

1. normalised RNA-seq counts via DESeq2's median-of-ratios ('geometric') normalisation, edgeR's TMM method, etc.
2. normalised + transformed RNA-seq expression levels, such as variance-stabilised (vst) or regularised log (rlog) expression levels from DESeq2, or log2 CPMs from edgeR
3. normalised microarray data via RMA, GC-RMA, MAS5, neqc, etc.

One should not use raw counts or any of these types of expression levels: FPKM, RPKM, TPM, etc.
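To make the first list item concrete, here is a minimal Python sketch of the median-of-ratios idea behind DESeq2's size factors: each sample is compared with a per-gene geometric-mean pseudo-reference built across all samples, and its size factor is the median of those ratios. This is an illustration under simplifying assumptions (a plain gene-by-sample list of raw counts, no handling of samples with too few non-zero genes), not DESeq2's implementation; in practice you would let DESeq2 compute this. The function names are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts:
        # A zero in any sample makes the geometric mean 0, so skip such genes.
        if any(c == 0 for c in gene_counts):
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = exp(median log-ratio of the sample against the pseudo-reference).
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts, size_factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(gene_counts, size_factors)] for gene_counts in counts]


counts = [[10, 20], [100, 200], [5, 10]]  # sample 2 sequenced twice as deeply
sf = median_of_ratios_size_factors(counts)
print(sf)
print(normalise(counts, sf))
```

The key property is that the scaling is estimated between samples: a sample sequenced twice as deeply gets a size factor twice as large, and after division the two samples become directly comparable.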

Why would FPKM/RPKM/TPM not be acceptable? I would find those superior to using CPM.

Try to understand the difference between across-sample (between-sample) comparisons and within-sample comparisons.

This is incorrect: GSEA uses gene expression ranks, so you should use normalized values (e.g. TPM).

Simply calling a comment incorrect does not make your statement valid. Give enough evidence to support your comment. How does TPM deal with cross-sample differences?

3.0 years ago

For most intents and purposes, TPM is superior to CPM (at least for short-read RNA-seq, where the length of a given transcript affects the number of reads produced from it). However, CPM should give results comparable to TPM. In your case, I would try both and compare the results.
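One way to "try both and compare" is to check how similar the two gene rankings are. For example, you can compute a Spearman rank correlation between the per-gene scores obtained from the CPM-based and TPM-based inputs. The sketch below is a plain-Python illustration with hypothetical helper names. In practice, `scipy.stats.spearmanr` or R's `cor(x, y, method = "spearman")` does the same job. Comparing the final enrichment results (e.g. which gene sets pass your threshold) is a separate and complementary check.

```python
def average_ranks(values):
    """Rank values (1 = smallest); tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical per-gene scores from a CPM-based and a TPM-based run.
scores_cpm = [5.0, 1.2, -0.4, -3.1, 0.3]
scores_tpm = [4.6, 0.9, -0.6, -2.8, 0.5]
print(spearman(scores_cpm, scores_tpm))
```

A value close to 1 means the two inputs order the genes almost identically, so a rank-based method should give similar results from either.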

3.0 years ago

Use TPM. CPM values are not normalized for gene length, meaning longer genes will appear more highly expressed when using CPM (and vice versa).
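The difference between the two units can be shown with a toy example. The sketch below assumes a single fixed length per gene. Real quantifiers such as Salmon or RSEM use fragment-length-corrected effective lengths, so treat it as an illustration only. CPM scales counts by library size alone. TPM first divides each count by gene length in kilobases and then rescales so that every sample sums to one million. That final rescaling is a within-sample operation, which is the point raised earlier in the thread.

```python
def cpm(counts):
    """Counts per million: scale by library size only (no length correction)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by length in kb, then rescale to sum to 1e6."""
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    scale = sum(rpk)
    return [r / scale * 1e6 for r in rpk]


counts = [1000, 1000]   # both genes receive the same number of reads
lengths = [1000, 4000]  # but the second gene is four times longer
print(cpm(counts))           # [500000.0, 500000.0]
print(tpm(counts, lengths))  # [800000.0, 200000.0]
```

With equal read counts, CPM reports the two genes as equally expressed. TPM attributes part of the longer gene's reads to its length and reports four times as many transcripts for the shorter gene.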