Hello All,
I am confused about RNA-seq normalization methods and when it is appropriate to use each of them. I would appreciate it if you could share your thoughts with me.
My understanding of TMM and TPM is that TMM is appropriate for between-sample/condition comparisons because it accounts for RNA composition in addition to library size (e.g., it is used by edgeR for DE analysis), while a method like TPM is better suited to within-sample comparisons. But many online tools (like the GTEx data portal) use TPM to show a gene's expression level across different tissue types. Isn't this wrong?
The reason I got into this is that I was performing a DE analysis with edgeR the other day to compare tumor samples against some normal tissues. Although edgeR reports downregulation for some genes, plotting those genes' expression as TPM values shows what looks like upregulation. When I extracted and plotted the pseudo-counts from edgeR (TMM-normalized counts), I could clearly see the downregulation, but TPM doesn't agree! So now I am confused. Is it wrong to use TPM values for plotting gene expression?
I appreciate your time and thoughts on this. Thanks,
Thank you very much for your comment. It was very clear. Can I ask your opinion about methods that use TPM values in their calculations? I have seen some papers that use TPM or even FPKM values for survival or correlation analysis. Would you consider the results of these analyses wrong?
Anything is better than FPKM. That TPMs are not robustly normalized should be well known by those using them. It's rarely going to be the case that this will cause such a large effect as what you ran into, so most of the results will be correct, but I'll reiterate that one should be aware of the limitations of using TPMs before using them. They have their utility and I have certainly made good use of them (e.g., filtering for highly expressed transcripts, which works vastly better with TPMs than normalized counts), but they're not a panacea.
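To make the limitation concrete, here is a toy sketch in Python of how a composition shift can flip the apparent direction of change in TPM. Everything in it is an assumption for illustration: the counts and gene lengths are made up, and the median-of-ratios rescaling is a simplified stand-in for composition-aware normalization, not edgeR's actual TMM algorithm.

```python
from statistics import median

def tpm(counts, lengths_kb):
    """Length-normalize counts, then scale so the sample sums to one million."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Hypothetical toy data: five genes, each 1 kb long.
# Gene 0 is truly lower in tumor (100 -> 80), genes 1-3 are unchanged,
# and gene 4 dominates the normal sample but is nearly silent in tumor.
lengths_kb = [1.0] * 5
normal = [100, 100, 100, 100, 600]
tumor = [80, 100, 100, 100, 20]

tpm_normal = tpm(normal, lengths_kb)
tpm_tumor = tpm(tumor, lengths_kb)

# Simplified composition correction (NOT edgeR's TMM): assume most genes
# are unchanged, so the median tumor/normal ratio estimates the distortion.
ratios = [t / n for t, n in zip(tpm_tumor, tpm_normal)]
scale = median(ratios)
tumor_rescaled = [t / scale for t in tpm_tumor]

print("gene 0 TPM, normal:        ", round(tpm_normal[0]))
print("gene 0 TPM, tumor:         ", round(tpm_tumor[0]))
print("gene 0 tumor, rescaled:    ", round(tumor_rescaled[0]))
```

Because TPM forces every sample to sum to one million, the loss of the dominant gene inflates the TPM of every other gene in the tumor sample, so gene 0 appears higher in tumor even though its count dropped. After rescaling by the median ratio, gene 0 falls below its normal value again, which matches the kind of disagreement described in the question.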