Question

TMM or TPM normalized counts for visualization?

6

5.9 years ago

sam.vanc70 ▴ 50

Hello All,

I am confused about RNA-seq normalization methods and when it is appropriate to use each of them. I would appreciate it if you could share your thoughts with me.

My understanding is that TMM is appropriate for between-sample or between-condition comparisons because it accounts for RNA composition in addition to library size (e.g. it is used by edgeR for DE analysis), while a method like TPM is better for within-sample comparisons. But many online tools, such as the GTEx data portal, use TPM to show a gene's expression levels across different tissue types. Isn't this wrong?

The reason I got into this is that I was performing a DE analysis with edgeR the other day to compare samples from a tumor to some normal tissues. Although edgeR reports downregulation for some genes, a plot of those genes' expression using TPM values shows what looks like upregulation. When I extracted and plotted the pseudo-counts from edgeR (TMM-normalized counts), I could clearly see the downregulation, but TPM doesn't agree! So I am confused now. Is it wrong to use TPM values for plotting gene expression?

I appreciate your time and thoughts on this. Thanks,

TPM TMM edgeR normalization • 17k views

ADD COMMENT • link updated 23 months ago by Mahmoud • 0 • written 5.9 years ago by sam.vanc70 ▴ 50

0

4.5 years ago

simplitia ▴ 130

To add to this, why not both? You can also get TMM-normalized TPM values; the benefit of this is to stabilize RNA composition effects.

ADD COMMENT • link 4.5 years ago by simplitia ▴ 130

1

The TMM factor is in fact used to correct the naive per-million normalisation for library composition, see e.g. the source code of cpm in edgeR. Still, to estimate it you need raw counts so if you only have TPM, you will not get a meaningful TMM estimate.
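
As a rough illustration of the point above, here is a minimal Python sketch (not edgeR's actual code) of a per-million normalisation that divides by an effective library size, i.e. library size times a per-sample normalisation factor. The function name, the toy counts, and the factor values are all made up for illustration; in practice the factors would have to be estimated from raw counts, for example by edgeR.

```python
import numpy as np

def cpm(counts, norm_factors=None):
    """Counts per million using effective library sizes.

    counts: genes x samples array of raw counts.
    norm_factors: per-sample scaling factors (e.g. TMM factors that were
    estimated elsewhere from raw counts); defaults to 1 for every sample.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    if norm_factors is None:
        norm_factors = np.ones(counts.shape[1])
    # The composition correction enters here: library size times factor.
    effective_lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    return counts / effective_lib_size * 1e6

# Toy data: 3 genes x 2 samples (made-up numbers).
raw = np.array([[100, 200],
                [300, 600],
                [600, 200]])
# Hypothetical factors, NOT estimated from `raw`.
factors = np.array([1.25, 0.8])

naive = cpm(raw)              # plain per-million scaling
adjusted = cpm(raw, factors)  # per-million scaling with the factors applied
```

With factors of 1 the columns of `naive` sum to one million; with the hypothetical factors applied they no longer do, which is the intended effect of the correction.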

ADD REPLY • link 4.5 years ago by ATpoint 81k

score 9 · Accepted Answer · 2018-05-31

9

5.9 years ago

Devon Ryan 104k

Online tools use TPMs for illustration because they can calculate them once and be done. This is convenient when people want to add samples over time and change groups and samples being visually compared. The results of that will not be as robust to outliers as normalized counts (produced with TMM or another method), but they're usually good enough for visualizations.

What you observed is due to the non-robustness of TPM that I mentioned earlier. TPM is one of those things that has its use, but if you're in a scenario where you can use properly normalized counts then that's usually preferable. As an aside, you could convert your normalized counts (or pseudo-counts) to TPMs and then you'd see the down-regulation.
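
A toy sketch in Python of how this can happen. All gene names, lengths, and counts are invented, and the example assumes that G3 and G4 are truly unchanged between conditions. G1 dominates the normal sample and drops in the tumor, so every other gene's share of the total (and hence its TPM) goes up, even though G2 is lower relative to the stable genes.

```python
import numpy as np

def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and gene lengths in bp."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1000.0
    rate = counts / lengths_kb        # reads per kilobase
    return rate / rate.sum() * 1e6    # rescale so each sample sums to 1e6

# Hypothetical genes G1..G4; G1 dominates the normal sample.
lengths = [2000, 1000, 1000, 1000]
normal = [1400, 100, 100, 100]
tumor = [200, 80, 100, 100]

tpm_normal = tpm(normal, lengths)
tpm_tumor = tpm(tumor, lengths)

# G2 looks roughly 2.1-fold UP by TPM ...
g2_tpm_fold = tpm_tumor[1] / tpm_normal[1]

# ... but relative to G3 (assumed stable) it goes from 1.0 to 0.8, i.e. DOWN.
g2_vs_stable_normal = normal[1] / normal[2]
g2_vs_stable_tumor = tumor[1] / tumor[2]
```

Methods like TMM try to estimate the between-sample scaling from the bulk of genes rather than from the total, which is why they are less sensitive to a few dominant genes changing.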

ADD COMMENT • link 5.9 years ago by Devon Ryan 104k

0

Thank you very much for your comment. It was very clear. Can I ask your opinion about methods that use TPM values for their calculation? I have seen some papers that take TPM or even FPKM values to perform survival or correlation analysis. Would you consider the result of these analyses wrong?

ADD REPLY • link 5.9 years ago by sam.vanc70 ▴ 50

1

Anything is better than FPKM. That TPMs are not robustly normalized should be well known by those using them. It's rarely going to be the case that this will cause such a large effect as what you ran into, so most of the results will be correct, but I'll reiterate that one should be aware of the limitations of using TPMs before using them. They have their utility and I have certainly made good use of them (e.g., filtering for highly expressed transcripts, which works vastly better with TPMs than normalized counts), but they're not a panacea.

ADD REPLY • link 5.9 years ago by Devon Ryan 104k

0

Hello,

I was wondering whether it is wrong to use TPM for differential gene expression analysis of RNA-seq data. The samples that I want to compare are from the same cell line, but one group is untreated and the other is treated with a drug. I have read that TMM should be used if you are comparing different samples, different tissues, or different cell lines. However, in my case, even though I am comparing different samples, the cell line is the same; the only difference is that one group is treated with a drug while the other is not.

The reason I am asking is that our bioinformatics group (which doesn't really have a biology background) insists that we have to use TMM because we are comparing different samples. However, I think TPM is fine here because it is technically the same cell line, just treated differently. I know that TMM is used when comparing samples from different origins or different cell lines.

Lastly, looking at the data normalized by both methods, the TPM data make more sense and correlate with the biological validations that I have done in the lab and with the literature. Any input is greatly appreciated.

Thanks,

ADD REPLY • link 23 months ago by Mahmoud • 0