Question

Should I use TPM or TMM to plot gene expression boxplots in RNAseq?

17 months ago

1215045934 ▴ 80

Hi all!

I used $TRINITY_HOME/util/align_and_estimate_abundance.pl from Trinity to quantify transcripts for my RNAseq data. Then I got the following outputs:

I would like to plot boxplots for several genes. Which one should I use, TMM or TPM?

I understand that TPM is not normalized across samples, but I have seen lots of people use it for boxplots in their papers. The author of Trinity also said "and the normalized expression values (FPKM or TPM) are used almost everywhere else, such as plotting in heatmaps." (from the link above). I am a bit confused.

Thanks in advance!

RNASeq TMM DEG TPM DESeq2 • 1.6k views

updated 17 months ago by i.sudbery 20k • written 17 months ago by 1215045934 ▴ 80

What is the purpose of your box plots? Will the x-axis be different genes, or different conditions?

17 months ago by i.sudbery 20k

To show how a particular gene I am interested in is expressed in different conditions. The x-axis will be the different conditions. I was wondering whether I should use TPM or TMM for the y-axis.

17 months ago by 1215045934 ▴ 80

I would probably use rlog- or vst-transformed counts (from the DESeq2 package; the functions that calculate these values also perform a normalisation). TMM works best when you have reason to believe that the average difference between conditions is zero, which is not the case here. You would probably get away with using TPM, but it wouldn't be strictly correct.
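To illustrate why TPM is not strictly correct for comparing one gene across conditions, here is a minimal NumPy sketch. The function name tpm, the gene lengths and the toy counts are all made up for illustration, and this is not Trinity's code. TPM rescales every sample to sum to one million, so it is a within-sample proportion: if one gene goes up strongly, the TPM of every other gene in that sample goes down, even when those genes have not changed.

```python
import numpy as np

def tpm(counts, lengths_kb):
    """Transcripts per million for a genes x samples count matrix.

    counts:     read counts, one row per gene, one column per sample
    lengths_kb: effective length of each gene in kilobases
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_kb, dtype=float)
    rate = counts / lengths_kb[:, None]      # reads per kilobase
    return rate / rate.sum(axis=0) * 1e6     # rescale each sample to sum to 1e6

# Hypothetical toy data: 3 genes (rows) x 2 samples (columns), each gene 1 kb long.
# Genes 1 and 2 are assumed unchanged; gene 3 is ten times higher in sample 2.
counts = np.array([[100, 100],
                   [100, 100],
                   [100, 1000]])
lengths_kb = np.array([1.0, 1.0, 1.0])

values = tpm(counts, lengths_kb)
print(values.round(1))
```

In this toy case the first gene has the same count in both samples, but its TPM is four times lower in the second sample, purely because the third gene takes up a larger share of the total.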

17 months ago by i.sudbery 20k

Thanks a lot! TMM is also normalized across samples. What is the difference between TMM and rlog/vst-transformed counts?

17 months ago by 1215045934 ▴ 80

In my understanding, TMM (trimmed mean of M-values) uses the assumption that the mean log fold change between conditions is zero. That seems an odd choice when you have multiple conditions and aren't interested in doing differential expression. Rlog and vst aren't normalisations, but rather transformations aimed at stabilising the variance, so that more highly expressed genes/samples do not have a higher variance, which is useful when making qualitative or informal comparisons. But as part of the calculation, a cross-sample normalisation is applied in the form of DESeq2's default normalisation.
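DESeq2's default normalisation is the median-of-ratios method. For anyone curious what that does, here is a rough NumPy sketch of the idea. It is not DESeq2's actual code; the function name size_factors and the toy counts are made up for illustration. Each sample gets a size factor equal to the median, over genes, of the ratio between that sample's count and the gene's geometric mean across all samples.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample contribute,
    # because the geometric mean is zero otherwise.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples (a pseudo-reference sample).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median log ratio to the reference, back on the linear scale.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy data: 5 genes (rows) x 2 samples (columns).
# Sample 2 is sequenced twice as deeply; the last gene also truly changes.
counts = np.array([[10, 20],
                   [100, 200],
                   [50, 100],
                   [0, 5],
                   [30, 600]])

sf = size_factors(counts)
normalised = counts / sf   # divide each sample (column) by its size factor
print(sf)
print(normalised.round(2))
```

Because the median is used, the one gene that really changes does not pull the size factors, and the genes that differ only by sequencing depth end up with equal normalised values. rlog and vst then apply their variance-stabilising transformation on top of counts normalised this way.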

17 months ago by i.sudbery 20k