CPM or TPM in GSEA analysis
3
0
3.0 years ago
francesca3 ▴ 80

Dear all, I have a question. Is it better to perform GSEA using CPM or TPM values? In the past I have performed this analysis using TPM. This time I have per-replicate values only for CPM, while for TPM I have only the mean for each condition. Which is the best approach?

Thanks, Francesca

RNA-Seq GSEA • 4.5k views
0

Which tool are you using? When I use FGSEA, I pass in a differential-expression summary statistic, rather than values from individual samples. For a given gene, the statistic I use is (-log10 of unadjusted p-value) * (sign of differential expression coefficient).
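The ranking statistic described above can be sketched as follows. This is only an illustration: the gene names, p-values and fold changes are made up, and `signed_log_p` is a hypothetical helper, not part of FGSEA or any other package.

```python
import math

def signed_log_p(p_value, coefficient):
    """Return -log10(p_value) multiplied by the sign of the coefficient.

    Assumes 0 < p_value <= 1; a p-value of exactly 0 would need to be
    clipped to a small positive number first.
    """
    sign = (coefficient > 0) - (coefficient < 0)
    return -math.log10(p_value) * sign

# Hypothetical results: gene -> (unadjusted p-value, differential expression coefficient)
de_results = {
    "GENE_A": (0.001, 2.0),
    "GENE_B": (0.5, -0.3),
    "GENE_C": (0.0001, -1.5),
}

# One statistic per gene, sorted from most up-regulated to most down-regulated
ranking = sorted(
    ((gene, signed_log_p(p, coef)) for gene, (p, coef) in de_results.items()),
    key=lambda item: item[1],
    reverse=True,
)

for gene, stat in ranking:
    print(f"{gene}\t{stat:.3f}")
```

A ranked list of this kind is what a pre-ranked enrichment tool takes as input, so no per-sample expression values are needed at that stage.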

0

As others have mentioned, TPM is generally better; however, the choice may be tool-specific.

For example, you cannot use TPM values for differential gene expression analysis with edgeR, which expects raw counts.

7
9 weeks ago
dare_devil ★ 1.7k

As input to the Broad Institute's GSEA program, one should use any type of expression data that is properly normalised such that cross-sample differences can be faithfully gauged.

That means using any of these:

  1. normalised RNA-seq counts via DESeq2's 'geometric' normalisation, edgeR's TMM method, etc.
  2. normalised + transformed RNA-seq expression levels, such as variance-stabilised (vst) or regularised log (rlog) expression levels from DESeq2, or log2 CPMs from edgeR
  3. normalised microarray data via RMA, GC-RMA, MAS5, neqc, etc.

And one should not use raw counts or any of these types of expression levels: FPKM, RPKM, TPM, etc.
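To make the idea of cross-sample normalisation in item 1 concrete, here is a simplified sketch in the spirit of a median-of-ratios ('geometric') size factor. It is not DESeq2's actual code, the counts are invented, and `size_factors` is a hypothetical helper name.

```python
import math
from statistics import median

def size_factors(counts):
    """Estimate one scaling factor per sample from raw counts.

    counts: dict mapping gene -> list of raw counts, one per sample.
    For each gene, each sample's count is compared with the gene's
    geometric mean across samples; the per-sample median of those
    ratios is the size factor. Genes with a zero count are skipped.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if min(gene_counts) <= 0:
            continue  # log of zero is undefined, so the gene is uninformative here
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical raw counts: sample 2 was sequenced twice as deeply as sample 1
raw = {
    "GENE_A": [100, 200],
    "GENE_B": [50, 100],
    "GENE_C": [10, 20],
    "GENE_D": [0, 5],
}

factors = size_factors(raw)
normalised = {
    gene: [c / f for c, f in zip(cs, factors)] for gene, cs in raw.items()
}
print(factors)
print(normalised["GENE_A"])
```

After dividing by the size factors, GENE_A has the same value in both samples, which is what one would want if the only difference between them is sequencing depth.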

0

Why would FPKM/RPKM/TPM not be acceptable? I would find those superior to using CPM.

0

Try to understand the difference between comparing across (between) samples and comparing within a sample.

0

This is incorrect: GSEA uses gene expression ranks, so you should use normalized counts (e.g. TPM).

0

Simply calling a comment incorrect does not make your statement valid. Please give enough evidence to support it. How does TPM deal with cross-sample differences?

2
3.0 years ago

For most intents and purposes, TPM is superior to CPM (at least for short-read RNA-seq, where the length of a given transcript affects the number of reads produced from it). However, CPM should give results comparable to TPM. In your case, I would try both and compare the results.

1
3.0 years ago

Use TPM. CPM values are not normalized for gene length, meaning longer genes will appear more highly expressed when using CPM (and vice versa).
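The gene-length effect can be seen in a small worked example. The two genes, their counts and their lengths are invented for illustration, and `cpm` and `tpm` here are minimal hand-written helpers for a single sample, not functions from edgeR or any other package.

```python
def cpm(counts):
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def tpm(counts, lengths_kb):
    """Transcripts per million: divide by length first, then scale to one million."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: r / total * 1e6 for gene, r in rates.items()}

# Hypothetical single sample: both genes yield 100 reads per kilobase,
# but LONG is ten times longer than SHORT
counts = {"SHORT": 100, "LONG": 1000}
lengths_kb = {"SHORT": 1.0, "LONG": 10.0}

sample_cpm = cpm(counts)
sample_tpm = tpm(counts, lengths_kb)

print(sample_cpm)  # LONG is ten times higher than SHORT
print(sample_tpm)  # both genes get the same value
```

Note that this is a within-sample comparison between two different genes. When the same gene is compared between samples, its length is the same on both sides, which is the distinction raised elsewhere in this thread.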
