Question: pheatmap and aheatmap give different results when using pearson correlation as distance
5 weeks ago by
dimitarrosenovkolev20 wrote:

I asked this question at StackOverflow but it seems no one can answer.

As far as I can see the two functions differ only when using Pearson's correlation as a distance. I do not know which one is correct.

I am trying to make pheatmap cluster columns in the same order as aheatmap.

I have looked at the source of both functions, created a small example set, and used the same clustering function, yet they give different answers.

library(pheatmap)
library(NMF)  # provides aheatmap

set.seed( 1234 )
testm <- replicate(10, rnorm(20))

pt <- pheatmap( testm, clustering_distance_rows = "correlation", clustering_distance_cols = "correlation" )
at <- aheatmap( testm, Colv = "correlation", Rowv = "correlation", hclustfun = "complete" )

When looking at

pt$tree_col$order vs at$colInd

we see that they produce different cluster ordering. What is the difference in the functions and how do I make pheatmap give the same clustering output as aheatmap?

We can observe the different order by simple visual inspection of the heatmaps.

This is an example for the order of the columns:

hclust is always "complete".

When they both use Pearson's correlation as distance:

aheatmap: 9  8 10  3  2  7  4  6  1  5
pheatmap:  4  6  9  1  5  3  2  7  8 10

When I use Euclidean distance they both give: 9 4 6 1 5 8 10 3 2 7

For maximum distance they both give: 10 7 2 6 9 4 1 5 3 8

heatmap aheatmap pheatmap R • 170 views
modified 5 weeks ago • written 5 weeks ago by dimitarrosenovkolev20

No offense, but taking into account that the author of aheatmap function made 2 typos in 1 installation line (intall.pacakges('NMF'), http://renozao.github.io/NMF/master/vignettes/aheatmaps.pdf ) - I would rather go with pheatmap

modified 5 weeks ago by ATpoint26k • written 5 weeks ago by German.M.Demidov960

Or go with ComplexHeatmap which I found the most comprehensive package, even though you'll need some time to get your head around the principles as it is very heavy-loaded due to its plethora of functionalities. Still, a good investment I think.

written 5 weeks ago by ATpoint26k

I've seen someone ask a similar question: why do these two packages produce slightly different results, and how can I make them agree? There's a lot of discussion regarding pheatmap vs heatmap.2.

The question is why do you want to make them agree?

modified 5 weeks ago • written 5 weeks ago by Amar620

When correlation is selected, they both calculate the distance matrix in the same way:

pheatmap's default linkage method is 'complete', so, no difference there, either.
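
To make the comparison concrete, here is a minimal base R sketch of a reference clustering that either package's column order can be checked against. It assumes that "correlation" distance means 1 minus the Pearson correlation between columns, and it uses only the stats package. The names col_cor, col_dist and ref_tree are made up for this illustration.

```r
# Reference clustering built with base R only (stats package).
# Assumption: "correlation" distance = 1 - Pearson correlation between columns.
set.seed(1234)
testm <- replicate(10, rnorm(20))   # 20 rows x 10 columns, as in the question

# cor() correlates the columns of a matrix, giving a 10 x 10 matrix
col_cor  <- cor(testm, method = "pearson")
col_dist <- as.dist(1 - col_cor)    # values lie between 0 and 2

# Complete linkage, the method both calls in the question are meant to use
ref_tree <- hclust(col_dist, method = "complete")

ref_tree$order   # left-to-right column order of the reference dendrogram
```

Comparing ref_tree$order with pt$tree_col$order and at$colInd from the question should show which call departs from this reference. Keep in mind that two orders can differ by branch rotation alone and still describe the same tree. pheatmap's documentation also describes cluster_cols as accepting an hclust object, so ref_tree could probably be passed there directly.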

The difference likely lies in how the columns are re-ordered. Take a look at reorderfun.
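
A rough illustration of what reordering can and cannot do, using base R's reorder() on a dendrogram. This is only a sketch: the weights (column means) are hypothetical and are not claimed to be what either package uses internally.

```r
set.seed(1234)
testm <- replicate(10, rnorm(20))

tree <- hclust(as.dist(1 - cor(testm)), method = "complete")
dend <- as.dendrogram(tree)

# Hypothetical weights: column means, used only to illustrate reordering
w <- colMeans(testm)
dend_reordered <- reorder(dend, wts = w, agglo.FUN = mean)

order.dendrogram(dend)            # leaf order straight from hclust
order.dendrogram(dend_reordered)  # leaf order after weight-based rotation

# Rotation only flips branches, so group membership is unchanged
groups_before <- cutree(tree, k = 3)
groups_after  <- cutree(as.hclust(dend_reordered), k = 3)
table(groups_before, groups_after)
```

If a difference in leaf order comes from reordering alone, cutting both trees at the same k gives the same groups. If the groups themselves differ, the trees were built differently and reordering cannot be the whole explanation.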

I have somewhat the same sentiment as Amar, though: why do you want them to agree?

written 5 weeks ago by Kevin Blighe52k

If I change Pearson's correlation to Euclidean distance, then they agree. So the question is: which one implements Pearson's correlation as a distance correctly? I doubt reorderfun would behave differently for different distance measures. I want to use the one that gives the correct answer when using Pearson's correlation as a distance.

modified 5 weeks ago • written 5 weeks ago by dimitarrosenovkolev20

I am reasonably sure they both correctly apply the parameters you give them, but you would need to review the code extensively to make sure all parameters are indeed identical. Please take no offense at the following sentence, but I always find it odd that users claim a result is not correct simply because the output does not fit their straightforward expectations. There is not one correct output, given the many factors that can influence a heatmap. There might be some details in how columns are grouped (as Kevin already pointed out). Please make sure you evaluate all of this before claiming that something is not correct. Again, please take no offense; the above sentences are not aimed specifically at you but rather at all users who try to sort out unexpected differences between tools.

What exactly is different? Are the major clusters the same, or is it simply the order of the clusters in the visualization?
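
One way to check this, sketched in base R: cut both trees into k groups and compare membership while ignoring leaf order and cluster numbering. The helper same_partition is hypothetical, and the two trees below are stand-ins built from two distance measures, not the actual output of pheatmap or aheatmap.

```r
# Do two hierarchical clusterings define the same k groups, ignoring leaf order?
same_partition <- function(tree_a, tree_b, k) {
  a <- unname(cutree(tree_a, k = k))
  b <- unname(cutree(tree_b, k = k))
  # Co-membership matrices do not depend on how clusters are numbered
  identical(outer(a, a, "=="), outer(b, b, "=="))
}

set.seed(1234)
testm <- replicate(10, rnorm(20))

# Stand-in trees (assumption): two distance measures on the same columns
tree_cor <- hclust(as.dist(1 - cor(testm)), method = "complete")
tree_euc <- hclust(dist(t(testm)), method = "complete")

same_partition(tree_cor, tree_cor, k = 3)  # a tree always agrees with itself
same_partition(tree_cor, tree_euc, k = 3)  # may or may not agree
```

The same helper should work on any pair of hclust objects, for example pt$tree_col from the question and a tree recovered from the other tool, provided both cluster the same columns in the same original order.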

modified 5 weeks ago • written 5 weeks ago by ATpoint26k

This is an example for the order of the columns:

hclust is always "complete".

When they both use Pearson's correlation as distance:

aheatmap: 9  8 10  3  2  7  4  6  1  5
pheatmap:  4  6  9  1  5  3  2  7  8 10

When I use Euclidean distance they both give: 9 4 6 1 5 8 10 3 2 7

For maximum distance they both give: 10 7 2 6 9 4 1 5 3 8

modified 5 weeks ago • written 5 weeks ago by dimitarrosenovkolev20

This is actually a good question. I also played with the data a bit, and I am also at a loss as to why it gives different results: the code should lead to the same clustering for sure, but it does not.

written 5 weeks ago by German.M.Demidov960