Working with Mean TPM Values for Heatmaps

3.9 years ago

nk130 • 0

I am new to examining RNA-seq data sets and generating heatmaps. Currently, I am working with the Mean TPM dataset from all cell types published to https://dice-database.org.

I've looked at methods for handling zero values, and I currently do the following:

- Find the smallest nonzero mean TPM in the entire data set.
- Pick a small positive number that is smaller than that smallest nonzero mean TPM.
- Replace zero values with this very small number.
- Take log(mean TPM) and cluster on the transformed data set.
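The steps above can be sketched roughly as follows. This is only a minimal illustration, not the exact code used: the function name `log_with_zero_floor`, the `fraction` parameter (how far below the smallest nonzero value the replacement sits), the choice of base-2 log, and the toy matrix are all assumptions made for the example.

```python
import numpy as np

def log_with_zero_floor(tpm, fraction=0.5):
    """Replace zeros with a value below the smallest nonzero TPM, then take log2.

    `fraction` is a hypothetical knob: the replacement value is
    fraction * (smallest nonzero TPM), so it must be between 0 and 1.
    """
    tpm = np.asarray(tpm, dtype=float)
    smallest_nonzero = tpm[tpm > 0].min()      # step 1: smallest nonzero TPM
    floor = smallest_nonzero * fraction        # step 2: a smaller positive number
    filled = np.where(tpm == 0, floor, tpm)    # step 3: replace zeros
    return np.log2(filled)                     # step 4: log transform (base 2 assumed)

# Toy matrix with made-up values: rows are genes, columns are cell types.
toy = np.array([[0.0, 2.0, 100.0],
                [0.5, 0.0, 10.0]])
logged = log_with_zero_floor(toy)
print(logged)
```

With this approach every zero maps to the same, most negative value in the transformed matrix, so how far it sits below the real data depends entirely on the replacement chosen.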

What I am wondering, though, is whether I should instead set an expression threshold based on a few housekeeping genes for my cell types of interest. I ask because for some subsets of my genes of interest there is only background or no expression across all of my subtypes. While it is useful to know where those genes are, they take up space in these heatmaps! The other consideration is how to decide whether some of these smaller values represent real expression, to get a better understanding of dynamic range.
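For illustration, dropping genes that never rise above some cutoff in any cell type could look like the sketch below. The function name `drop_background_genes`, the gene names, the toy values, and the `min_tpm=1.0` cutoff are all placeholders; how to derive a sensible cutoff (for example from housekeeping genes) is exactly the open question.

```python
import numpy as np

def drop_background_genes(tpm, gene_names, min_tpm=1.0):
    """Keep genes whose TPM reaches `min_tpm` in at least one cell type.

    `min_tpm` is a hypothetical cutoff, not a recommended value.
    """
    tpm = np.asarray(tpm, dtype=float)
    keep = (tpm >= min_tpm).any(axis=1)   # True if any column passes the cutoff
    kept_names = [name for name, k in zip(gene_names, keep) if k]
    return tpm[keep], kept_names

# Toy matrix with made-up values: rows are genes, columns are cell types.
toy_tpm = np.array([[0.0, 0.2, 0.1],
                    [5.0, 0.0, 40.0],
                    [0.0, 0.0, 0.0]])
toy_genes = ["geneA", "geneB", "geneC"]

filtered, kept = drop_background_genes(toy_tpm, toy_genes)
print(kept)
```

Filtering before the log transform would also reduce how many zero-only rows end up in the heatmap.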

RNA-Seq TPM Heatmap DICE • 1.7k views

"I've looked at methods for handling zero values"

Why not keep zeros as zeros? You can make heatmaps with zeros.

3.9 years ago by igor 13k

See my edit: I take log(TPM) to make the heatmaps.

3.9 years ago by nk130 • 0

If the problem is log(0), then add a small number. Some people do log(x+0.001), some do as high as log(x+1) to avoid having negative values. For the purpose of a heatmap, you probably won't even notice the difference.
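As a rough sketch of the pseudocount idea, the snippet below applies log(x + c) with the two values of c mentioned above. The helper name `log_pseudocount`, the base-2 log, and the toy TPM values are assumptions made for the example.

```python
import numpy as np

def log_pseudocount(tpm, pseudocount):
    """Return log2(tpm + pseudocount); zeros stay finite because of the offset."""
    tpm = np.asarray(tpm, dtype=float)
    return np.log2(tpm + pseudocount)

# Made-up TPM values spanning zero to highly expressed.
tpm_values = np.array([0.0, 0.5, 10.0, 1000.0])

for pseudocount in (0.001, 1.0):
    print(pseudocount, np.round(log_pseudocount(tpm_values, pseudocount), 2))
```

With a pseudocount of 1, a TPM of zero maps to exactly zero and nothing is negative. With 0.001, zeros become strongly negative while large values are almost unchanged.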

3.9 years ago by igor 13k