pheatmap and aheatmap give different results when using Pearson correlation as distance
4.5 years ago

I asked this question at StackOverflow but it seems no one can answer.

As far as I can see the two functions differ only when using Pearson's correlation as a distance. I do not know which one is correct.

I am trying to make pheatmap cluster columns in the same order as aheatmap.

I have looked at both functions, created a small example set, and used the same clustering settings, yet they give different answers.

library(pheatmap)
library(NMF)  # provides aheatmap

set.seed( 1234 )
testm <- replicate(10, rnorm(20))

pt <- pheatmap( testm, clustering_distance_rows = "correlation", clustering_distance_cols = "correlation" )
at <- aheatmap( testm, Colv = "correlation", Rowv = "correlation", hclustfun = "complete" )

When looking at

pt$tree_col$order vs at$colInd

we see that they produce different cluster ordering. What is the difference in the functions and how do I make pheatmap give the same clustering output as aheatmap?

We can observe the different order by simple visual inspection of the heatmaps.

This is an example for the order of the columns:

hclust is always "complete".

When they both use Pearson's correlation as distance:

aheatmap: 9  8 10  3  2  7  4  6  1  5
pheatmap:  4  6  9  1  5  3  2  7  8 10

When I use Euclidean distance they both give: 9 4 6 1 5 8 10 3 2 7

For maximum distance they both give: 10 7 2 6 9 4 1 5 3 8


No offense, but given that the author of the aheatmap function made 2 typos in 1 installation line (intall.pacakges('NMF'), http://renozao.github.io/NMF/master/vignettes/aheatmaps.pdf ), I would rather go with pheatmap.


Or go with ComplexHeatmap, which I found to be the most comprehensive package, even though you'll need some time to get your head around its principles, since its sheer number of features makes it heavyweight. Still, a good investment I think.


I've seen someone ask a similar question: why do these two packages produce slightly different results, and how can I make them agree? There's a lot of discussion regarding pheatmap vs heatmap.2.

The question is why do you want to make them agree?


When correlation is selected, they both calculate the distance matrix in the same way:

pheatmap's default linkage method is 'complete', so, no difference there, either.

The difference likely lies in how the columns are re-ordered. Take a look at reorderfun.
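
To illustrate what a reordering function can and cannot change, here is a minimal base R sketch. It builds the column dendrogram by hand and reorders it with `reorder()`; the weights (column means) are an assumption chosen only for illustration and are not taken from either package. Reordering may flip branches and so change the leaf order, but it leaves the merge heights alone.

```r
set.seed(1234)
testm <- replicate(10, rnorm(20))

# Column dendrogram: 1 - Pearson correlation as distance, complete linkage
hc   <- hclust(as.dist(1 - cor(testm)), method = "complete")
dend <- as.dendrogram(hc)

# Hypothetical weights (column means), chosen only for illustration
dend_reo <- reorder(dend, wts = colMeans(testm))

order_plain <- order.dendrogram(dend)      # same as hc$order
order_reo   <- order.dendrogram(dend_reo)  # branches may be flipped

# The merge heights are untouched by the reordering
heights_plain <- sort(hc$height)
heights_reo   <- sort(as.hclust(dend_reo)$height)

print(order_plain)
print(order_reo)
```

If two heatmaps differ only in this way, the trees are the same and only the left-to-right drawing order of the branches differs.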

I have somewhat the same sentiment as Amar, though: why do you want them to agree?


If I change Pearson's correlation to Euclidean distance, then they agree. So the question is: which one implements Pearson's correlation as a distance correctly? I doubt reorderfun would behave differently for different distance measures. I want to use the one that gives the correct answer when using Pearson's correlation as distance.
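
One concrete thing to check in each package's source is how the correlation matrix is turned into a distance object. The sketch below contrasts two constructions that look alike at a glance: `as.dist(1 - cor(x))`, which uses 1 - r directly as the pairwise distance, and `dist(1 - cor(x))`, which takes Euclidean distances between the rows of the 1 - r matrix. Which of these (if either) pheatmap or aheatmap uses is an assumption to verify in their code; nothing in this thread establishes it.

```r
set.seed(1234)
testm <- replicate(10, rnorm(20))

cor_cols <- cor(testm)  # 10 x 10 Pearson correlation between columns

# Construction 1: use 1 - r directly as the pairwise distance
d_direct <- as.dist(1 - cor_cols)

# Construction 2: Euclidean distance between the rows of the (1 - r) matrix
d_nested <- dist(1 - cor_cols)

hc_direct <- hclust(d_direct, method = "complete")
hc_nested <- hclust(d_nested, method = "complete")

# Leaf orders, for comparison with pt$tree_col$order and at$colInd
print(hc_direct$order)
print(hc_nested$order)
```

The two distance objects hold different values, so the resulting trees need not match. Comparing the two printed orders with the outputs from the question would show whether this accounts for the discrepancy.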


I am reasonably sure they both correctly apply the parameters you give them, but you would need to review the code extensively to make sure all parameters are indeed identical. Please take no offense at the following sentence, but I always find it odd that users claim a result is not correct simply because the output does not fit their straightforward expectations. There is no single correct output, given the many factors that can influence a heatmap. There might be some details in how columns are grouped (as Kevin already pointed out). Please make sure you evaluate all of this before claiming that something is not correct. Again, please take no offense; the above sentences are not aimed at you specifically but rather at all users who want to sort out unexpected differences between tools.

What exactly is different? Are the major clusters the same, or is it simply the order of the clusters in the visualization?
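
A minimal sketch of how that could be checked in base R: cut both trees into k groups and test whether the groupings coincide, ignoring leaf order and cluster numbering. The helper `same_partition` is a hypothetical name introduced here, and the two trees in the demo are built by hand (correlation-based and Euclidean) purely as stand-ins.

```r
# TRUE if two hclust trees induce the same partition when cut into k groups
same_partition <- function(h1, h2, k) {
  m1 <- cutree(h1, k = k)
  m2 <- cutree(h2, k = k)
  # Same partition <=> the cross-table has exactly k non-empty cells
  sum(table(m1, m2) > 0) == k
}

set.seed(1234)
testm <- replicate(10, rnorm(20))

# Stand-in trees for the demo: correlation-based vs. Euclidean, complete linkage
h_cor <- hclust(as.dist(1 - cor(testm)), method = "complete")
h_euc <- hclust(dist(t(testm)), method = "complete")

# Compare the two trees at several cut levels
agree <- sapply(2:5, function(k) same_partition(h_cor, h_euc, k))
print(agree)
```

The `pt$tree_col` object from the question should be usable as `h1`, since its `$order` element suggests an `hclust` tree. How to obtain an equivalent tree from the value returned by aheatmap is an assumption to check in its documentation.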


This is an example for the order of the columns:

hclust is always "complete".

When they both use Pearson's correlation as distance:

aheatmap: 9  8 10  3  2  7  4  6  1  5
pheatmap:  4  6  9  1  5  3  2  7  8 10

When I use Euclidean distance they both give: 9 4 6 1 5 8 10 3 2 7

For maximum distance they both give: 10 7 2 6 9 4 1 5 3 8


This is actually a good question. I also played with the data a bit, and I am also at a loss as to why it gives different results. The code should lead to the same clustering, but it does not.
