Hi Guys,

Can anyone suggest an efficient way to cluster patients based on their called variants? I have performed hierarchical clustering (HC) after converting the data into a matrix of 1s and 0s. The problem I am facing is that HC places samples close together when they match on either 1s or 0s, but I only want to focus on matching 1s, not matching 0s.

In this case, how can I avoid counting positions where both samples have a 0? Or is there another efficient algorithm that does this?

Please suggest. Thanks in advance.

I also agree with "The absence of signal (missing zeros) will also become a part of the clustering pattern", but that got me wondering: could we give more weight to 1s than to 0s? If that's possible, couldn't we give the 0s a weight of zero (w=0), essentially removing their contribution to the "clustering pattern"? Of course, this assumes we are able to assign a different weight to each class (0 vs. 1), which may be impossible to begin with.
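To make the weighting idea concrete, here is a rough sketch in plain Python. The function name `weighted_matching_similarity` and the parameters `w1` and `w0` are made up for illustration, and the patient vectors are toy data. It scores two 0/1 vectors by weighted matches over weighted matches plus mismatches. With `w1 = w0 = 1` this is the simple matching coefficient, and with `w0 = 0` shared zeros drop out entirely, which is the same thing as the Jaccard index.

```python
def weighted_matching_similarity(x, y, w1=1.0, w0=1.0):
    """Similarity of two 0/1 vectors with separate weights for 1-1 and 0-0 matches.

    w1 and w0 are hypothetical parameters for this sketch, not part of any library.
    """
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    mismatch = sum(1 for a, b in zip(x, y) if a != b)
    matched = w1 * both_one + w0 * both_zero
    denom = matched + mismatch
    # If nothing is left to compare (e.g. two all-zero vectors with w0=0), return 0.
    return matched / denom if denom else 0.0


# Toy patients: mostly zeros, as in a sparse variant matrix.
p1 = [1, 1, 0, 0, 0, 0, 0, 0]
p2 = [1, 0, 0, 0, 0, 0, 0, 0]
p3 = [0, 0, 0, 0, 0, 0, 0, 0]

# Equal weights: shared zeros dominate the score.
sim_equal_12 = weighted_matching_similarity(p1, p2)          # 7 matches out of 8
sim_equal_23 = weighted_matching_similarity(p2, p3)          # 7 matches out of 8

# Zero weight on 0-0 matches: only shared 1s count (Jaccard index).
sim_ones_12 = weighted_matching_similarity(p1, p2, w0=0.0)   # 1 shared / 2 in union
sim_ones_23 = weighted_matching_similarity(p2, p3, w0=0.0)   # no shared 1s
```

With equal weights, p2 looks just as similar to the all-zero patient p3 as it does to p1. With `w0=0.0`, only the pair that actually shares a variant keeps a non-zero similarity. Whether that weighting is justified is a separate question.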

Even if adding different weights were possible, I don't see any justification for it.

If the pattern is dominated by zeros, meaning many columns of zeros and few with ones, I would delete all invariant columns and then cluster. Removing invariant columns is a legitimate way to boost the signal, because a column in which every value is identical carries no signal. This retains the true signal without introducing any artifact from weighting.
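A minimal sketch of that filtering step, assuming the data are already in a NumPy array with patients as rows and variants as columns. The matrix `X` is toy data, and the average linkage, Hamming metric, and cut into two clusters are only placeholder choices, not a recommendation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy matrix: rows = patients, columns = variants (1 = variant called).
X = np.array([
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
])

# A column is invariant when every patient has the same value as the first patient.
# Keep only the columns where at least one patient differs.
variable = (X != X[0]).any(axis=0)
X_filtered = X[:, variable]

# Hierarchical clustering on the remaining columns.
# Linkage method, metric, and number of clusters are assumptions for this example.
Z = linkage(X_filtered, method="average", metric="hamming")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Here columns 2, 5, and 6 (all 0, all 0, and all 1) are dropped, leaving three informative columns. The same boolean mask can be applied before whatever clustering routine was used originally.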