Pheatmap/ComplexHeatmap: making a continuous color scale with NAs
5 weeks ago
Ridha

Hello everyone, I hope you are doing well :)

I have a question about making a heatmap to visualize both the significance AND the presence of pathways across different conditions. I already ran a pathway enrichment analysis with gProfiler. I imported the results and reshaped them so that the pathway names are the row names, the conditions are the column names, and each cell holds the adjusted p-value, like so:

kegg_react_wp[1:3, ] # showing only the first 3 pathways

         Condition1  Condition2  Condition3  Condition4  Condition5  Condition6
Pathway1     0.0003      0.3225      0.0540      0.0003        0.01        0.07
Pathway2       0.03    0.003225      0.0540      0.0003        0.01        0.07
Pathway3      0.703      0.3225      0.0540      0.0003        0.01        0.07

Because some pathways have an adjusted p-value > 0.05, I want to set those cells to NA, meaning that the pathway was NOT enriched in that condition. I used the following function to do that, and to take the -log10 of the adjusted p-values < 0.05 so that the most significant pathways get the largest values:

Mooi_functie = function(x){
  # NA if adj. p > 0.05 (not enriched), otherwise -log10(adj. p)
  ifelse(x > 0.05, NA, -log10(x))
}

Then I applied the function to my data:

library(dplyr)
pathways_clean = kegg_react_wp %>% mutate_if(is.numeric, Mooi_functie) # cells with adj. p > 0.05 become NA
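A base-R sketch of the same transformation, applied to the three example pathways from the table above. It works on a numeric matrix instead of a data frame (my assumption, not what the original code does), because ifelse() keeps the matrix's row and column names, and both pheatmap() and Heatmap() accept a numeric matrix directly.

```r
# Adjusted p-values of the three example pathways, copied from the table above
p <- rbind(
  Pathway1 = c(0.0003, 0.3225,   0.0540, 0.0003, 0.01, 0.07),
  Pathway2 = c(0.03,   0.003225, 0.0540, 0.0003, 0.01, 0.07),
  Pathway3 = c(0.703,  0.3225,   0.0540, 0.0003, 0.01, 0.07)
)
colnames(p) <- paste0("Condition", 1:6)

Mooi_functie <- function(x) {
  # NA where adj. p > 0.05 (not enriched), otherwise -log10(adj. p)
  ifelse(x > 0.05, NA, -log10(x))
}

# ifelse() copies the dim and dimnames of its test (x > 0.05), so the
# result is still a pathway x condition matrix with the same names
pathways_clean <- Mooi_functie(p)
print(round(pathways_clean, 2))
```

With these values, Condition3 and Condition6 come out entirely NA, and Pathway3 keeps values only in Condition4 and Condition5. Rows and columns that sparse are exactly the ones that cause trouble once the heatmap tries to cluster them.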

Then I made the heatmap (using pheatmap or ComplexHeatmap):

 pheatmap(pathways_clean)  # pheatmap package
 Heatmap(pathways_clean)   # ComplexHeatmap package

I get the following errors:

 Error in hclust(d, method = method) : 
  NA/NaN/Inf in foreign function call (arg 10) # from pheatmap
 Error in hclust(get_dist(submat, distance), method = method) : 
 NA/NaN/Inf in foreign function call (arg 10) # from complexheatmap

A similar question about the same error has been posted here, but its purpose was different from mine: I want to KEEP the NAs (unlike in that question) so that I can color them differently to show that these pathways are NOT enriched in a given condition. The error seems to come from clustering the rows: if I set cluster_rows = F, it works, but I want to cluster the rows and scale by row to see in which condition the significance is largest. I understand that some rows are basically all NAs except for one or two cells, and this seems to be what breaks the heatmap.
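Both error traces come from hclust(). A minimal base-R sketch of why, using a hypothetical toy matrix: dist() ignores NA positions pair by pair, but when two rows have no condition in which both are non-NA, their distance itself is NA, and hclust() does not accept NA distances.

```r
# Toy matrix of -log10(adj. p) values (hypothetical), NA = not enriched
m <- rbind(
  PathwayA = c(3.5, NA,  NA),
  PathwayB = c(NA,  2.0, NA),
  PathwayC = c(1.3, 2.5, 1.5)
)

# dist() skips NA positions pair by pair; PathwayA and PathwayB share no
# condition in which both are non-NA, so their distance is NA
d <- dist(m)
print(d)

# hclust() cannot handle NA distances and stops with an error
res <- try(hclust(d), silent = TRUE)
print(inherits(res, "try-error"))
```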

I found a trick online: add an extra column filled with an arbitrary value. With that, the clustering worked.


However, I am now left with a fake column, which does not look nice in the heatmap, as you can see below.

The code I used to generate the heatmap:

pheatmap(pathways_clean, cluster_rows = TRUE, na_col = "white", border_color = "white",
         annotation_row = meta_kegg_wp_reac, cellwidth = 35, fontsize = 8, angle_col = 45,
         scale = "row")

[Figure: the resulting heatmap, including the fake column]
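A sketch of why the fake-column trick avoids the error, assuming the extra column holds the same constant in every row (the exact code of the trick is not shown above). With a shared non-NA column, no pair of rows ends up with an NA distance. The side effect is that two rows overlapping only in the fake column get distance 0, so the clustering treats them as identical. The toy values are hypothetical.

```r
# Toy matrix of -log10(adj. p) values (hypothetical), NA = not enriched
m <- rbind(
  PathwayA = c(3.5, NA,  NA),
  PathwayB = c(NA,  2.0, NA),
  PathwayC = c(1.3, 2.5, 1.5)
)

# Assumed form of the trick: one extra column with the same value in every row
m_fake <- cbind(m, fake = 1)

# Every pair of rows now shares the fake column, so no distance is NA
d <- as.matrix(dist(m_fake))
print(round(d, 2))

# PathwayA and PathwayB overlap only in the fake column, where they are
# equal, so their distance is 0 and hclust() sees them as identical
hc <- hclust(as.dist(d))
```

With scale = "row", the constant also enters each row's mean and standard deviation, so the z-scores shift depending on which value you picked.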

My questions are:

1) How can I solve the problem of clustering with NAs without adding a fake column? Or is there another way to visualize this without NAs? (I tried replacing the NAs with zeros, but then the color scale gets messy and I cannot make heads or tails of it.)

2) Does it make sense to scale (scale = "row") the -log10 adjusted p-values? I find it difficult to interpret z-scores of -log10 adjusted p-values, which is what the heatmap legend shows.

3) Do you think that there might be a better way to visualize these results in my case?

Thank you very much in advance for your help!

Tags: clustering, pathways, complexheatmap, pheatmap
5 weeks ago
Zuguang Gu

You can first generate the dendrograms from the complete matrix (which has no NAs), and then draw the heatmap with a new matrix in which p-values higher than 0.05 are replaced by NA:

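library(ComplexHeatmap)

# p: matrix of adjusted p-values (pathways x conditions)
m = -log10(p)
row_dend = hclust(dist(p))     # clustered on the complete matrix, so no NA distances
col_dend = hclust(dist(t(p)))

m2 = m
m2[p > 0.05] = NA              # NAs are only used for display

# cluster_rows / cluster_columns accept precomputed hclust or dendrogram objects
Heatmap(m2, cluster_rows = row_dend, cluster_columns = col_dend, ...)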
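The same idea should work in pheatmap, whose cluster_rows and cluster_cols arguments also accept precomputed hclust objects. A sketch using the example p-values from the question; unlike the answer above, it clusters on -log10(p), i.e. the values actually displayed, rather than on the raw p-values. Either choice avoids the error, because neither matrix contains NA. The requireNamespace() guard only keeps the sketch runnable where pheatmap is not installed.

```r
# Adjusted p-values from the example table in the question
p <- rbind(
  Pathway1 = c(0.0003, 0.3225,   0.0540, 0.0003, 0.01, 0.07),
  Pathway2 = c(0.03,   0.003225, 0.0540, 0.0003, 0.01, 0.07),
  Pathway3 = c(0.703,  0.3225,   0.0540, 0.0003, 0.01, 0.07)
)
colnames(p) <- paste0("Condition", 1:6)

m <- -log10(p)

# Cluster on the complete matrix, which contains no NA
row_hc <- hclust(dist(m))
col_hc <- hclust(dist(t(m)))

# Only the displayed copy gets NAs for non-significant cells
m2 <- m
m2[p > 0.05] <- NA

if (requireNamespace("pheatmap", quietly = TRUE)) {
  # Pass the precomputed trees instead of letting pheatmap cluster m2
  pheatmap::pheatmap(m2, cluster_rows = row_hc, cluster_cols = col_hc,
                     na_col = "white", border_color = "white")
}
```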

If you use a three-color palette, e.g. green-white-red, I don't think you need to replace the high p-values with NA at all: keep them all, and let green map to non-significant p-values and red to significant ones.

library(circlize)  # for colorRamp2()

# here all the p-values < 0.0001 are mapped to red
Heatmap(-log10(p), col = colorRamp2(c(0, 2, 4), c("green", "white", "red")), ...,
    heatmap_legend_param = list(title = "p-value", at = c(0, 2, 4), labels = c(1, 0.01, 0.0001)))
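A rough pheatmap counterpart of this three-color mapping, sketched with the example p-values from the question. pheatmap assigns colors by binning values into its breaks, so this sketch caps -log10(p) at 4 (p = 0.0001) to keep every value inside the breaks, mirroring how colorRamp2() saturates at its end points. The cap of 4 and the 100 color bins are arbitrary choices, not taken from the answer.

```r
# Adjusted p-values from the example table in the question
p <- rbind(
  Pathway1 = c(0.0003, 0.3225,   0.0540, 0.0003, 0.01, 0.07),
  Pathway2 = c(0.03,   0.003225, 0.0540, 0.0003, 0.01, 0.07),
  Pathway3 = c(0.703,  0.3225,   0.0540, 0.0003, 0.01, 0.07)
)
colnames(p) <- paste0("Condition", 1:6)

# Cap -log10(p) at 4 so that every value falls inside the color breaks;
# pmin() keeps the matrix dimensions and row/column names of its first argument
m_capped <- pmin(-log10(p), 4)

breaks <- seq(0, 4, length.out = 101)                   # 101 breaks -> 100 color bins
cols <- colorRampPalette(c("green", "white", "red"))(100)

if (requireNamespace("pheatmap", quietly = TRUE)) {
  # No NAs here, so pheatmap's default clustering works directly
  pheatmap::pheatmap(m_capped, color = cols, breaks = breaks,
                     legend_breaks = c(0, 2, 4),
                     legend_labels = c("1", "0.01", "0.0001"))
}
```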

I think scaling -log10(p-values) is basically not a good idea: the row z-scores lose the absolute level of significance, so the values become close to meaningless.
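A small sketch, with hypothetical numbers, of why row-scaled -log10 p-values are hard to read. Here z-scores are computed as (value - row mean) / row standard deviation, the usual definition of row scaling. They keep only the relative pattern within each pathway, so a barely significant pathway and an extremely significant one can end up with identical colors. The second part shows what happens to a row with a single non-NA value.

```r
# Hypothetical -log10(adj. p) values for two pathways in three conditions
m <- rbind(
  weak   = c(1.5, 2.0, 2.5),   # adj. p from about 0.03 to about 0.003
  strong = c(5.0, 10.0, 15.0)  # adj. p from 1e-5 to 1e-15
)

# Row-wise z-scores: scale() works on columns, hence the transposes
z <- t(scale(t(m)))
print(z)  # both rows become -1, 0, 1

# A row with only one non-NA value has no standard deviation,
# so all of its z-scores are NA
x <- c(3.5, NA, NA)
z_single <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
print(z_single)
```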


I was able to reach my goal using your approach. Great suggestions, thank you very much!

