Hierarchical clustering in R
2.1 years ago

Hi Guys,

Can anyone suggest an efficient way to cluster patients based on their called variants? I have performed hierarchical clustering after converting the data into a binary matrix of 1s and 0s. The problem I am facing is that the HC method places samples closer together when they match at both 1s and 0s. I only want to count matching 1s, not matching 0s.

In this case, how can I avoid counting positions where both samples have a 0? Or is there another efficient algorithm for this?
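To make the problem concrete, here is a small Python sketch (the thread has no code, so Python is used for illustration). It assumes a mismatch-counting distance such as Hamming; for 0/1 data, squared Euclidean distance equals the Hamming count. The three patients and the `hamming` helper are hypothetical toy examples, not the poster's data.

```python
def hamming(a, b):
    """Number of positions where two 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: 10 variant positions per patient.
p1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p2 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # shares no variant with p1
p3 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # shares p1's only variant

# p1 ends up closer to p2 than to p3, because the many shared
# zeros count as agreement just like shared ones would.
print("p1-p2:", hamming(p1, p2))
print("p1-p3:", hamming(p1, p3))
```

Here p1 and p2 differ at 2 positions and p1 and p3 at 3, so p1 is nearer to the patient it shares no variant with.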

2.1 years ago
Mensur Dlakic

You can convert your matrix into a sparse format, which stores only the nonzero entries and leaves the zeros implicit. However, I don't think that will change your clustering solution. What you describe, matching only the ones but not the zeros, is not how clustering works. The absence of signal (missing zeros) will also become a part of the clustering pattern.
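A quick way to see why a sparse format alone does not change the result: a sparse representation only changes how the zeros are stored, not what a distance computes from them. The Python sketch below (hypothetical helper names and toy data) stores each patient as the set of indices of its 1s and checks that the Hamming distance computed from those sets equals the one computed from the dense rows.

```python
def to_sparse(row):
    """Keep only the indices of the 1s; zeros become implicit."""
    return {i for i, value in enumerate(row) if value}


def hamming_dense(a, b):
    """Count positions where two equal-length 0/1 rows differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_sparse(sa, sb):
    """Positions that are 1 in exactly one sample (symmetric difference)."""
    return len(sa ^ sb)


# Hypothetical toy data, 10 variant positions per patient.
patients = {
    "p1": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "p2": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "p3": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
}

names = sorted(patients)
for i, u in enumerate(names):
    for v in names[i + 1:]:
        dense = hamming_dense(patients[u], patients[v])
        sparse = hamming_sparse(to_sparse(patients[u]), to_sparse(patients[v]))
        assert dense == sparse  # same distance, different storage
        print(u, v, dense)
```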


I also agree with "The absence of signal (missing zeros) will also become a part of the clustering pattern", but that got me wondering: could we give more weight to 1s than to 0s? If that's possible, couldn't we go as far as giving the 0s a weight of zero (w = 0), essentially removing their contribution to the clustering pattern? Of course, this assumes we can assign different weights to each class (0 vs. 1), which may be impossible to begin with.
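The weighting idea can be written down directly as a similarity coefficient. In the Python sketch below (function name and toy data are hypothetical), a is the number of 1-1 matches, d the number of 0-0 matches, and the 0-0 matches get weight w. With w = 1 this is the simple matching coefficient; with w = 0 it reduces to the Jaccard index, which ignores shared zeros entirely.

```python
def weighted_matching(x, y, w):
    """Similarity of two 0/1 vectors where 0-0 matches get weight w.

    w = 1 gives the simple matching coefficient; w = 0 gives the
    Jaccard index. If nothing is counted (both all-zero with w = 0),
    0.0 is returned here by convention; other tools may differ.
    """
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)  # shared 1s
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)  # shared 0s
    mismatches = sum(1 for u, v in zip(x, y) if u != v)
    denom = a + mismatches + w * d
    return (a + w * d) / denom if denom else 0.0


# Hypothetical toy data, 10 variant positions per patient.
p1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p2 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # shares no variant with p1
p3 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # shares p1's only variant

for w in (1, 0):
    print(f"w={w}: p1-p2={weighted_matching(p1, p2, w):.2f}, "
          f"p1-p3={weighted_matching(p1, p3, w):.2f}")
```

With w = 1, p1 looks more similar to p2 (0.80 vs 0.70); with w = 0, p1 looks more similar to p3 (0.25 vs 0.00). For reference, R's `dist(x, method = "binary")` computes the w = 0 case as a distance: the proportion of positions where only one vector is nonzero among those where at least one is nonzero.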


Even if adding different weights were possible, I don't see any justification for it.

If the pattern is dominated by zeros, meaning many columns with zeros and few with ones, I would delete all invariant columns and then cluster. Removing invariant columns is a legitimate way to boost the signal, because there is no signal in columns where all values are identical. This still retains the true signal without any artifact introduced by weighting.
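Removing invariant columns before clustering could look like the Python sketch below. The helper name and toy matrix are hypothetical, and the sketch assumes a non-empty matrix whose rows all have the same length.

```python
def drop_invariant_columns(matrix):
    """Return (reduced_matrix, kept_column_indices).

    A column is invariant if every row has the same value in it
    (e.g. a variant absent, or present, in all patients).
    Assumes a non-empty matrix with rows of equal length.
    """
    n_cols = len(matrix[0])
    keep = [j for j in range(n_cols) if len({row[j] for row in matrix}) > 1]
    reduced = [[row[j] for j in keep] for row in matrix]
    return reduced, keep


# Hypothetical toy data: rows are patients, columns are variant positions.
matrix = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
]

reduced, keep = drop_invariant_columns(matrix)
print("kept columns:", keep)
for row in reduced:
    print(row)
```

Columns 4 to 9 are 0 in every patient and are dropped; the reduced matrix is what would then be passed to the clustering step.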