Question: TPM or FPKM as input values for PCA and WGCNA?
Marion Neely wrote, 2.8 years ago:

Would you suggest using TPM or FPKM values for PCA and WGCNA? 

Thanks,

Marion

 

Tags: wgcna, rna-seq, pca, fpkm, tpm

I don't know about WGCNA, but PCA assumes normality, so you'll have to (at least) log-transform the values, whether you choose TPM or FPKM.

Reply written 2.8 years ago by Carlo Yague

"PCA assumes normality" Do you have a reference for this? I don't think PCA needs any assumption. If you have variables measured on different scales, like metres and kilograms, than it's advisable to scale and centre to remove dependency on the units of measure but this is not the case for gene expression.

Reply written 2.8 years ago by dariober

OK, you are right, this is not really an assumption. It is more a piece of advice for getting meaningful results: gene expression has a heavily skewed distribution and PCA is quite sensitive to outliers, which is why I usually log-transform expression data. For reference: http://www.bioconductor.org/help/workflows/rnaseqGene/#the-rlog-transformation

Reply written 2.8 years ago by Carlo Yague
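
The point about skew in the exchange above can be illustrated with a small sketch. PCA is driven by variance, so on untransformed values a single highly expressed gene can account for almost all of it. The gene names and numbers below are made up for illustration, and the pseudocount of 1 is an assumption rather than anything recommended in the thread; the rlog transformation in the linked workflow is a more careful alternative to a plain log.

```python
import math
from statistics import pvariance

# Hypothetical TPM-like values: one list of four samples per gene.
expr = {
    "geneA": [5000.0, 9000.0, 4000.0, 12000.0],  # highly expressed
    "geneB": [10.0, 40.0, 12.0, 45.0],
    "geneC": [1.0, 8.0, 2.0, 9.0],
}


def log2p1(values):
    """Return log2(x + 1) for each value; the pseudocount keeps zeros finite."""
    return [math.log2(v + 1.0) for v in values]


def variance_share(table):
    """Return the fraction of the total variance contributed by each gene."""
    var = {gene: pvariance(vals) for gene, vals in table.items()}
    total = sum(var.values())
    return {gene: var[gene] / total for gene in table}


raw_share = variance_share(expr)
log_share = variance_share({gene: log2p1(vals) for gene, vals in expr.items()})

for gene in expr:
    print(f"{gene}: raw {raw_share[gene]:.4f}, log2 {log_share[gene]:.4f}")
```

On the raw scale nearly all of the variance comes from geneA, so the first principal component would mostly track that one gene. After the log transform the three genes contribute on a comparable scale.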
Rob (United States) wrote, 2.8 years ago:

Hi Marion,

For this purpose, I'd imagine you are unlikely to see much difference. However, there is literally no reason to prefer FPKM over TPM. If you're looking to perform some analysis where relative abundance is an appropriate measure, you should always favor TPM.

Marion Neely wrote, 2.8 years ago:

Thank you everyone for your help! I tried it both ways. The PCA from the FPKM values made the most sense and the plot was similar to previous work. When I used the TPM values, the strong separation along PC1 that we had seen with FPKM and in previous analyses moved to PC2.


That's interesting (i.e. the shift). However, the reason to prefer TPM over FPKM is that FPKM has a (somewhat arbitrary) dependence on the mean expressed transcript length of a sample, while TPM does not. It's probably worth checking that the separation you see in that principal component is not an artifact of this technical detail. You can calculate the different scaling factors between your samples using a method such as the one presented here.

Reply written 2.8 years ago by Rob

What is the mean expressed transcript length? What is the meaning of "dependence" in this context? Why are FPKMs more "dependent" than TPMs which also take length into account?

Reply written 11 months ago by holgerbrandl

This blog post explains the issue nicely, with examples.

Reply written 11 months ago by Rob
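
The dependence discussed in this exchange can be sketched with the standard definitions of the two units. FPKM divides each count by the transcript length in kilobases and by the total fragments in millions. TPM divides each length-normalized count by the sum of all length-normalized counts in the sample. It follows that the FPKM values of a sample sum to 10^9 divided by the abundance-weighted mean transcript length, while TPM always sums to 10^6. The counts and lengths below are hypothetical, and `mean_expressed_length` is a helper name introduced here for illustration only.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalized counts rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def mean_expressed_length(counts, lengths):
    """Mean transcript length weighted by transcript abundance (count / length)."""
    rates = [c / l for c, l in zip(counts, lengths)]
    return sum(counts) / sum(rates)


# Hypothetical data: three transcripts, two samples.
lengths = [1000, 2000, 4000]
sample_1 = [1000, 2000, 400]    # the long transcript is rare
sample_2 = [1000, 2000, 4000]   # the long transcript is as abundant as the others

for name, counts in (("sample_1", sample_1), ("sample_2", sample_2)):
    print(
        name,
        "sum FPKM:", round(sum(fpkm(counts, lengths))),
        "sum TPM:", round(sum(tpm(counts, lengths))),
        "mean expressed length:", round(mean_expressed_length(counts, lengths)),
    )
```

Because the FPKM total differs between the two samples, the same FPKM value corresponds to a different relative abundance in each of them. That per-sample scaling factor is the kind of technical effect that could in principle show up as a principal component.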