2.1 years ago
Hyper_Odin
Hello all,
I have 2 scripts for visualizing the TPM values.
- In one script the variance of each gene is computed across the samples.
- In another, the counts are log2 transformed, followed by a heatmap.
Which is the correct way of visualizing TPM values?
1.
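library(pheatmap)
# Per-gene variance across samples (rows = genes, columns = samples)
V <- apply(countdata, 1, var)
# Names of the 1000 most variable genes
selectedGenes <- names(V[order(V, decreasing = TRUE)][1:1000])
# newdata is assumed to share its row names with countdata
pheatmap(newdata[selectedGenes, ], scale = "row",
         show_rownames = TRUE, clustering_distance_rows = "correlation",
         cluster_rows = TRUE, cluster_cols = TRUE, fontsize_row = 5,
         fontsize = 5)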
2.
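library(gplots)
# log2-transform with a pseudocount of 1 to avoid log2(0)
logtransformed <- log2(countdata + 1)
my_palette <- colorRampPalette(c("green", "black", "red"))(n = 1000)
# Row-wise z-scores: center and scale each gene across samples
z.mat <- t(scale(t(logtransformed), scale = TRUE, center = TRUE))
heatmap.2(z.mat, dendrogram = "both", scale = "none", trace = "none", col = my_palette, cexCol = 0.5)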
Well, "correct" depends on what you want to show, but computing the variance of non-log-transformed values is pointless, because on that scale variance simply rises with magnitude, so the "most variable" genes will mostly be the most highly expressed ones. The second script makes sense if you want to show relative differences.
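To illustrate the point about variance and magnitude, here is a minimal sketch on simulated data (the matrix `tpm` and all parameters are made up for illustration, not taken from the scripts above). Every simulated gene gets the same relative noise, so no gene is "really" more variable than another. The expectation is that raw-scale variance still tracks mean expression closely, while variance after log2 transformation largely does not.

```r
# Simulated expression matrix: hypothetical data for illustration only
set.seed(1)
n_genes <- 200
n_samples <- 6

# Gene-specific mean expression spanning several orders of magnitude
base_mean <- 10^runif(n_genes, min = 1, max = 4)

# Identical multiplicative (relative) noise model for every gene
noise <- matrix(rlnorm(n_genes * n_samples, meanlog = 0, sdlog = 0.3),
                nrow = n_genes)
tpm <- base_mean * noise  # row i is scaled by base_mean[i]
rownames(tpm) <- paste0("gene", seq_len(n_genes))

# Per-gene variance on the raw scale and on the log2 scale
raw_var <- apply(tpm, 1, var)
log_var <- apply(log2(tpm + 1), 1, var)

# Rank correlation between mean expression and each variance estimate
cor_raw <- cor(rowMeans(tpm), raw_var, method = "spearman")
cor_log <- cor(rowMeans(tpm), log_var, method = "spearman")
print(c(raw = cor_raw, log2 = cor_log))

# Select the most variable genes on the log2 scale instead
top_genes <- names(sort(log_var, decreasing = TRUE))[1:50]
```

If you do want a "top N most variable genes" heatmap, selecting on `log_var` (or on variance-stabilized values) rather than on `raw_var` avoids picking genes merely because they are highly expressed.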
Thank you. In fact, I was thinking the same!
Just be aware that TPM is not comparable across samples. You're better off quantile normalizing raw counts (or using the built-in normalization methods of DESeq2/edgeR) and then log transforming these normalized values. See: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7373998/
Kevin Blighe frequently links to a blog post that has a clearer explanation. I'll see if I can find it.
EDIT: I was mistaken on the content of Kevin's comments. Here is one of his comments on the topic: Gene expression units explained: RPM, RPKM, FPKM and TPM
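For the "quantile normalize raw counts, then log transform" suggestion above, a rough base-R sketch follows. The function `quantile_normalize` and the toy `counts` matrix are hypothetical and written only for illustration. It breaks ties by order of appearance, which is a simplification, and for real data the normalization built into DESeq2 or edgeR is probably the safer choice.

```r
# Minimal quantile normalization sketch (hypothetical helper, not a package function)
quantile_normalize <- function(counts) {
  # Rank values within each sample; ties are broken by order of appearance
  ranks <- apply(counts, 2, rank, ties.method = "first")
  # Sort each sample, then average across samples at each rank position
  sorted <- apply(counts, 2, sort)
  target <- unname(rowMeans(sorted))
  # Replace every value by the averaged value at its rank
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(counts)
  normalized
}

# Toy raw count matrix: rows = genes, columns = samples (made-up numbers)
counts <- matrix(c(5, 2, 3, 4,
                   4, 1, 4, 2,
                   3, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

qn <- quantile_normalize(counts)

# Log transform the normalized values, as suggested in the comment
log_qn <- log2(qn + 1)
print(round(log_qn, 2))
```

After this step every sample has the same distribution of values, so `log_qn` can be passed to the z-score and heatmap code from the second script in place of `logtransformed`.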