TPM or FPKM as input values for PCA and WGCNA?
8.3 years ago
Marion Neely ▴ 20

Would you suggest using TPM or FPKM values for PCA and WGCNA?

Thanks,
Marion

TPM RNA-Seq PCA WGCNA FPKM • 7.0k views

I don't know about WGCNA, but PCA assumes normality, so you'll have to (at least) log-transform the values, whether you choose TPM or FPKM.


"PCA assumes normality"

Do you have a reference for this? I don't think PCA needs any distributional assumption. If you have variables measured on different scales, like metres and kilograms, then it's advisable to scale and centre to remove the dependency on the units of measure, but this is not the case for gene expression.


OK, you are right, this is not really an assumption. It is more a piece of advice for getting meaningful results: gene expression has a heavily skewed distribution and PCA is quite sensitive to outliers, which is why I usually log-transform expression data. For reference: http://www.bioconductor.org/help/workflows/rnaseqGene/#the-rlog-transformation
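To make the skew point concrete, here is a minimal sketch using only the Python standard library. The gene names and TPM values are made up for illustration, and `variance_share` is a hypothetical helper, not part of any package. It compares how much of the total variance a single highly expressed gene carries before and after a log2(x + 1) transform. Since PCA looks for directions of maximal variance, a gene that dominates the variance on the raw scale will tend to dominate the first component too.

```python
import math
from statistics import pvariance

# Hypothetical TPM values: one list of four samples per gene.
tpm = {
    "geneA": [5000.0, 9000.0, 4000.0, 12000.0],  # very highly expressed
    "geneB": [10.0, 40.0, 12.0, 45.0],
    "geneC": [2.0, 1.0, 8.0, 9.0],
}

def variance_share(table):
    """Return the fraction of the total variance contributed by each gene."""
    variances = {gene: pvariance(values) for gene, values in table.items()}
    total = sum(variances.values())
    return {gene: v / total for gene, v in variances.items()}

# log2(x + 1) transform; the pseudocount of 1 avoids taking log(0).
log_tpm = {
    gene: [math.log2(v + 1.0) for v in values]
    for gene, values in tpm.items()
}

raw_share = variance_share(tpm)      # geneA carries almost all the variance
log_share = variance_share(log_tpm)  # variance is spread across genes

for gene in tpm:
    print(f"{gene}: raw={raw_share[gene]:.4f} log={log_share[gene]:.4f}")
```

On the raw scale geneA accounts for nearly all of the variance in this toy table; after the transform the three genes contribute on a comparable scale. The rlog and variance-stabilizing transformations in the linked workflow are more refined ways of getting a similar effect.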

8.3 years ago
Rob 6.5k

Hi Marion,

For this purpose, I'd imagine you would not likely see much difference. However, there is literally no reason to prefer FPKM over TPM. If you're looking to perform some analysis where relative abundance is an appropriate measure, you should always favor TPM.

8.2 years ago
Marion Neely ▴ 20

Thank you everyone for your help! I tried it both ways. The PCA from the FPKM values made the most sense and the plot was similar to previous work. When I used the TPM values the strong separation by PC1 that we had seen with FPKM and in previous analysis moved to PC2.


That's interesting (i.e. the shift). However, the reason to prefer TPM over FPKM is that FPKM has a (somewhat arbitrary) dependence on the mean expressed transcript length of a sample, while TPM does not. It's probably worth checking that the separation you see in the first principal component is not an artifact of this technical detail. You can calculate the different scaling factors between your samples using a method such as the one presented here.
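For anyone wondering what that dependence looks like, here is a small sketch based on the standard definitions of FPKM and TPM. It is not the method linked above; the counts, lengths, and function names are invented for illustration, and it uses plain transcript lengths where a real quantifier would use effective lengths. From the definitions, the FPKM values of a sample sum to 1e9 divided by its abundance-weighted mean transcript length, whereas TPM values always sum to 1e6.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths)]

def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]

def mean_expressed_length(counts, lengths):
    """Mean transcript length, weighted by transcript abundance (count / length)."""
    rates = [c / l for c, l in zip(counts, lengths)]
    return sum(counts) / sum(rates)

# Hypothetical data: three transcripts, two samples.
lengths = [1000, 2000, 500]
sample1 = [100, 200, 50]   # equal abundance for all three transcripts
sample2 = [100, 800, 50]   # the long transcript is four times more abundant

for name, counts in (("sample1", sample1), ("sample2", sample2)):
    print(
        name,
        "mean length:", round(mean_expressed_length(counts, lengths), 1),
        "sum TPM:", round(sum(tpm(counts, lengths))),
        "sum FPKM:", round(sum(fpkm(counts, lengths))),
    )
```

Because the two samples express transcripts of different lengths, their FPKM totals differ, so every FPKM value in a sample carries a sample-specific scaling factor (1e3 divided by the mean expressed length, relative to TPM). That factor is a candidate explanation for a component that shifts between the two PCAs.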


What is the mean expressed transcript length? What is the meaning of "dependence" in this context? Why are FPKMs more "dependent" than TPMs which also take length into account?


This blog post explains the issue nicely, with examples.
