Question: Plot normalized counts DESeq or TPM?
22 months ago by
Michael Gallagher0 wrote:

I have reads from 16 conditions, 3 replicates each. I used RSEM to align and quantify, so I have TPMs, but I've imported the counts into DESeq2 with tximport so I can normalize the counts and extract DE genes in specific contrasts from the dataset. I have also used DESeq2 to produce batch-corrected variance-stabilizing transformations (vst) of the dataset, which I used to make some nice hierarchical-clustering heatmaps and PCA plots and to run k-means clustering. Now, if I want to produce plots that examine the expression of individual genes or clusters, using gene lists derived from the DESeq2 results, should I plot the DESeq2 normalized counts or the TPMs? Is it "okay" to define clusters with the vst data but then show the TPMs? I'm not sure what standard practice is. Thanks!

rna-seq R • 2.7k views

In an ideal case (any case?), your differential genes from DESeq2 should show the same pattern when you draw a box/violin plot with TPMs. That is, overexpressed genes should show higher TPMs in the corresponding condition, and vice versa. But I would compare the clustering heatmaps produced from the VST values and from the TPMs. If the clusters are the same, I wouldn't worry too much.

written 22 months ago by venu

Ok. Even if the results are similar, does defining clusters with VST information but then presenting TPM information per gene raise any eyebrows? In a publication, would people expect normalized counts instead if everything upstream was done with DESeq too?

Edit: I just made some TPM plots and compared them to plots of normalized counts, and they are almost the same. So it looks like it is just personal preference at this point? The only thing I can think of that would argue for one over the other is that the normalized counts are normalized across all my samples, whereas TPMs are only normalized within each sample, correct?
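To make the within-sample versus across-sample distinction concrete, here is a minimal sketch in plain Python. It assumes the standard TPM definition and the median-of-ratios size-factor idea that DESeq2 is based on; the function names, toy counts, and gene lengths are all hypothetical, and this is not DESeq2's actual implementation.

```python
from math import exp, log
from statistics import median


def tpm(counts, lengths_kb):
    """TPM for ONE sample: divide by gene length, then rescale to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def size_factors(count_matrix):
    """Median-of-ratios size factors; count_matrix[gene][sample].

    Uses information from ALL samples: each gene's geometric mean across
    samples serves as the reference for that gene.
    """
    n_samples = len(count_matrix[0])
    # Genes with a zero in any sample have a geometric mean of zero; skip them.
    rows = [row for row in count_matrix if all(c > 0 for c in row)]
    log_geo_means = [sum(log(c) for c in row) / n_samples for row in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [log(row[j]) - g for row, g in zip(rows, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors


# Hypothetical toy data: 4 genes x 2 samples; sample 2 is sequenced twice as deep.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 10],
]
lengths_kb = [2.0, 1.0, 0.5, 1.0]  # hypothetical gene lengths in kilobases

sf = size_factors(counts)
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
tpm_by_sample = [tpm([row[j] for row in counts], lengths_kb) for j in range(2)]
```

Each column of `tpm_by_sample` is computed from that sample alone, while `sf` depends on every sample in the matrix, which is the difference the question is pointing at.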

modified 22 months ago • written 22 months ago by Michael Gallagher

I would guess that 99% of reviewers would not even notice, as long as the results make biological sense. Still, I would keep things simple and show the values that the results are based on, so vst or rlog, depending on which you used. On the y-axis, just label them as normalized counts. That is at least what I mostly do.

modified 22 months ago • written 22 months ago by ATpoint
22 months ago by
European Union
kristoffer.vittingseerup3.4k wrote:

Because of the batch effect, you should either:

  1. Use the batch-corrected vst data, or
  2. Use the log2FC produced by DESeq2 (which will also be batch corrected, provided batch is included in the design formula).
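
As a rough illustration of what "batch corrected" means for a matrix of log-scale values such as vst output, here is a minimal sketch that removes per-batch mean shifts for a single gene. This is a deliberately simplified stand-in, not the method used in the thread or by any particular package: it assumes conditions are balanced across batches, and the function name and toy values are hypothetical.

```python
from statistics import mean


def center_batches(values, batches):
    """Remove per-batch mean shifts for ONE gene, keeping the overall mean.

    Assumption: conditions are balanced across batches. If they are not,
    this naive centering would also remove part of the condition effect.
    """
    overall = mean(values)
    batch_means = {
        b: mean([v for v, bb in zip(values, batches) if bb == b])
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Hypothetical log-scale values for one gene.
# Sample order: control/batch A, treated/batch A, control/batch B, treated/batch B.
values = [5.0, 7.0, 6.0, 8.0]
batches = ["A", "A", "B", "B"]

corrected = center_batches(values, batches)
```

In this toy case batch B sits one unit above batch A; after centering, the two controls agree with each other and so do the two treated samples, while the treated-versus-control difference is left intact. Plotting uncorrected TPMs would still show the batch shift.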