Question

Plot DESeq2 normalized counts or TPMs?

0

6.7 years ago

Michael Gallagher ▴ 20

I have reads from 16 conditions, with 3 replicates each. I used RSEM for alignment and quantification, so I have TPMs, but I've also imported the counts into DESeq2 with tximport so I can normalize them and extract DE genes for specific contrasts. I have also used DESeq2 to produce a batch-corrected variance-stabilizing transformation (vst) of the dataset, which I used for hierarchical-clustering heatmaps, PCA plots, and k-means clustering. Now, if I want to plot the expression of individual genes or clusters, using gene lists derived from the DESeq2 results, should I plot the DESeq2 normalized counts or the TPMs? Is it "okay" to define clusters with the vst data but then show the TPMs? I'm not sure what standard practice is. Thanks!

rna-seq R • 7.5k views

ADD COMMENT • link updated 6.7 years ago by Kristoffer Vitting-Seerup ★ 4.2k • written 6.7 years ago by Michael Gallagher ▴ 20

0

In an ideal case (any case?), your differential genes from DESeq2 should show the same pattern when you draw box/violin plots of their TPMs. That is, overexpressed genes should show higher TPMs and vice versa. Still, I would check once whether the clustering heatmaps produced from the VST counts and from the TPMs agree. If the clusters are the same, I wouldn't worry too much.
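One way to put a number on "the clusters are the same" is a Rand index: the fraction of gene pairs that the two clusterings treat the same way (together in both, or apart in both). The following is only a minimal sketch in plain Python rather than R; the function name `rand_index` and the toy cluster labels are made up for illustration and do not come from any package used in this thread.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair "agrees" if both clusterings put the two items together,
    or both put them apart. Cluster names themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster assignments for six genes, e.g. k-means on
# VST values versus k-means on log-transformed TPMs.
vst_clusters = [1, 1, 2, 2, 3, 3]
tpm_clusters = ["a", "a", "b", "b", "c", "b"]

print(rand_index(vst_clusters, tpm_clusters))
```

A value of 1 means the two clusterings are identical up to relabeling. The plain Rand index is not corrected for chance, so an adjusted variant is usually preferable when there are many clusters.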

ADD REPLY • link 6.7 years ago by venu 7.1k

0

Ok. Even if the results are similar, does defining clusters with VST information but then presenting TPM information per gene raise any eyebrows? In a publication, would people expect normalized counts instead if everything upstream was done with DESeq too?

Edit: I just made some TPM plots and compared them to plots of normalized counts, and they are almost the same. So it looks like it's just personal preference at this point? The only thing I can think of that would argue for one over the other is that the normalized counts are normalized across all my samples, whereas TPMs are only normalized within each sample, correct?
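To make the within-sample versus across-sample distinction concrete, here is a rough sketch in plain Python (not R, and not DESeq2's actual code). It computes TPMs, which rescale each sample on its own, and median-of-ratios size factors, which compare every sample to a per-gene reference built from all samples. The counts and gene lengths are invented, and the sketch ignores the average-transcript-length adjustment that applies when counts come in through tximport.

```python
import math
from statistics import median

# Toy data, invented for illustration: one list of counts per gene,
# one entry per sample. Sample 2 is sequenced exactly twice as deep.
counts = {
    "geneA": [100, 200],
    "geneB": [50, 100],
    "geneC": [30, 60],
}
lengths_kb = {"geneA": 2.0, "geneB": 1.0, "geneC": 0.5}  # hypothetical lengths
n_samples = 2

def tpm(counts, lengths_kb, n_samples):
    """Within-sample scaling: divide by length, then make each sample sum to 1e6."""
    rate = {g: [c / lengths_kb[g] for c in row] for g, row in counts.items()}
    totals = [sum(rate[g][j] for g in rate) for j in range(n_samples)]
    return {g: [rate[g][j] / totals[j] * 1e6 for j in range(n_samples)] for g in rate}

def size_factors(counts, n_samples):
    """Across-sample scaling: median ratio of each sample to the per-gene geometric mean."""
    log_geo_means = {
        g: sum(math.log(c) for c in row) / n_samples
        for g, row in counts.items()
        if all(c > 0 for c in row)  # genes with any zero count are skipped
    }
    return [
        math.exp(median(math.log(counts[g][j]) - m for g, m in log_geo_means.items()))
        for j in range(n_samples)
    ]

sf = size_factors(counts, n_samples)
normalized = {g: [c / sf[j] for j, c in enumerate(row)] for g, row in counts.items()}
tpms = tpm(counts, lengths_kb, n_samples)

print(sf)
print(normalized["geneA"], tpms["geneA"])
```

In this toy case both approaches remove the depth difference, which fits the observation that the plots look almost the same. They can diverge when a few highly expressed genes change between conditions, because those genes shift the per-sample total that TPM divides by but barely move the median ratio.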

ADD REPLY • link 6.7 years ago by Michael Gallagher ▴ 20

1

I would guess that 99% of reviewers would not even notice, as long as the results make biological sense. Still, I would keep things simple and show the values that the results are based on, so vst or rlog, depending on what you used. On the y-axis, just label them as normalized counts. That is at least what I mostly do.

ADD REPLY • link 6.7 years ago by ATpoint 88k

score 0 · Answer 1 · 2018-11-13

0

6.7 years ago

Kristoffer Vitting-Seerup ★ 4.2k

Due to the batch effect you should either:

Use the batch-corrected vst data, or
Use the log2FC produced by DESeq2 (which will also be batch-corrected, provided batch is included in the design formula).
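As a rough illustration of what "batch-corrected vst data" means for a single gene, the sketch below subtracts each batch's mean from its samples and adds back the overall mean. This is a deliberately simplified stand-in written in plain Python, not what any R package does internally. It ignores the experimental conditions, so it is only reasonable when conditions are balanced across batches. In R, a common route is `limma::removeBatchEffect` applied to the vst matrix, which can take the design into account. The function name and the numbers here are hypothetical.

```python
from collections import defaultdict

def remove_batch_means(values, batches):
    """Center one gene's log-scale values per batch, then restore the overall mean.

    Simplification: condition is ignored, so a real condition effect that is
    confounded with batch would be removed as well.
    """
    overall = sum(values) / len(values)
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]

# Hypothetical vst-scale values for one gene: six samples in two batches.
gene_vst = [8.0, 8.2, 8.1, 9.0, 9.2, 9.1]
batch = ["b1", "b1", "b1", "b2", "b2", "b2"]

corrected = remove_batch_means(gene_vst, batch)
print(corrected)
```

Values corrected this way are meant for plots, heatmaps, and clustering. For testing, the usual advice is to leave the counts alone and put batch in the design formula instead.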

ADD COMMENT • link 6.7 years ago by Kristoffer Vitting-Seerup ★ 4.2k