Scaling for pheatmap
22 months ago
bnayer26 • 0

I'm new to R and am making a heatmap for some RNA sequencing data using pheatmap. My input data is the log2 CPM of genes across 5 samples (samples in columns, genes in rows). I want to understand, first, whether I should scale my data with the scale() function and, second, whether I should set scale = "row" in the pheatmap function. Here is my code:

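library(pheatmap)
library(RColorBrewer)
# Read the log2 CPM table; the first column becomes the row names.
heatmap_trial_2 <- read.csv("Final genes_log2CPM.csv")
heatmap_trial_2 <- data.frame(heatmap_trial_2[, -1], row.names = heatmap_trial_2[, 1])
# Row-wise z-score: scale() works on columns, so transpose, scale, and transpose back.
sc_1 <- t(scale(t(heatmap_trial_2), center = TRUE, scale = TRUE))
# The data are already scaled above, hence scale = "none"; the palette is passed by name.
pheatmap(sc_1, kmeans_k = NA, breaks = NA, scale = "none",
         cluster_rows = FALSE, cluster_cols = FALSE,
         show_rownames = TRUE, show_colnames = TRUE,
         color = colorRampPalette(brewer.pal(9, "BuPu"))(100))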

I noticed that if I set scale = "row" in the pheatmap call, the heatmap looks exactly the same regardless of whether I set scale = TRUE or scale = FALSE in the scale() function. But if I set scale = TRUE in the scale() function and scale = "none" in the pheatmap call (which is the code given above), the plot is different. I am struggling to determine which of these is the correct way to do it for my data. At what step should I perform the scaling? Any help would be highly appreciated, thanks!

variance p.heatmap scaling unit • 909 views

In pheatmap, the scale argument accepts only three values. Copied from the manual:

scale
character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. Corresponding values are "row", "column" and "none"

If you set scale = TRUE in scale() on the transposed matrix, as in your code, it's always row-wise scaling.

22 months ago
ATpoint 82k

Scaling in the heatmap context usually means standardizing the expression data (usually the normalized counts on the log scale) so that each gene has a mean of zero and a standard deviation of one. This is what you do in sc_1. This is good because it allows genes with different expression levels to be compared. Here are more details: Scaling RNA-Seq data before clustering?
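
As a rough illustration of that standardization, here is a minimal sketch on a made-up matrix (the gene and sample names and all values are invented, not taken from the question). gene1 and gene2 follow the same pattern across samples at very different expression levels, and after row-wise scaling they become directly comparable:

```r
# Toy log2 CPM matrix: 3 genes (rows) x 5 samples (columns); values are made up.
log2cpm <- matrix(
  c( 1,  2.0,  3,  4.0,  5,
    10, 10.5, 11, 11.5, 12,
     6,  4.0,  8,  2.0,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:5))
)

# scale() standardizes columns, so transpose, scale, and transpose back
# to standardize each gene (row) instead.
z <- t(scale(t(log2cpm), center = TRUE, scale = TRUE))

rowMeans(z)       # per-gene means, zero up to floating-point error
apply(z, 1, sd)   # per-gene standard deviations, all 1

# gene1 and gene2 differ in level and spread but share the same pattern,
# so their z-scores coincide.
all.equal(z["gene1", ], z["gene2", ])
```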

That means if you scale externally, you don't have to scale again inside the heatmap function. I am not a pheatmap user, but scale = "none" appears reasonable to me.
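
To see why a second row scaling is redundant, here is a small base-R sketch. It assumes that row scaling inside the heatmap function is an ordinary row-wise z-score; row_z is a hypothetical helper written for this example, and the data are random. Z-scoring rows that are already z-scored, or only centered, gives the same values as z-scoring once:

```r
# Hypothetical helper: row-wise z-score, the same operation as t(scale(t(x))).
row_z <- function(x) t(scale(t(x), center = TRUE, scale = TRUE))

set.seed(1)
# Made-up log2 CPM-like values: 4 genes x 5 samples.
x <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5)))

centered_only   <- t(scale(t(x), center = TRUE, scale = FALSE))  # scale = FALSE
z_once          <- row_z(x)              # scale = TRUE, done externally
z_twice         <- row_z(z_once)         # a second row scaling on top
z_from_centered <- row_z(centered_only)  # row scaling after centering only

# scale() stores its centers and scales as attributes, so compare values only.
all.equal(z_once, z_twice, check.attributes = FALSE)
all.equal(z_once, z_from_centered, check.attributes = FALSE)
```

If the row scaling in pheatmap is indeed a plain z-score, this would explain why scale = "row" gives the same picture whichever way scale() was called beforehand. Any remaining difference between scale = "row" and scale = "none" on already-scaled data would then likely come from how the colours are mapped to values rather than from the values themselves, but that is an assumption worth checking against the pheatmap documentation.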

Thank you!! :) That's what I was also thinking, thank you for your answer!
