Question

Hierarchical clustering in R

0

2.0 years ago

shivangi.agarwal800 ▴ 120

Hi Guys,

Can anyone suggest an efficient way to cluster patients based on their called variants? I performed hierarchical clustering (HC) after converting the data into a matrix of 1s and 0s. The problem I am facing is that HC places samples closer together when they share matching 1s and also when they share matching 0s. However, I only want matching 1s to count, not matching 0s.

In that case, how can I avoid counting positions where both samples have a zero? Or is there another efficient algorithm for this?

Please suggest. Thanks in advance.

cluster • 725 views

ADD COMMENT • link updated 2.0 years ago by Mensur Dlakic ★ 27k • written 2.0 years ago by shivangi.agarwal800 ▴ 120

score 0 · Answer 1 · 2022-04-14

0

2.0 years ago

Mensur Dlakic ★ 27k

You can convert your matrix into a sparse format, which stores only the nonzero entries and leaves the zeros implicit. However, I don't think that will change your clustering solution. What you describe, matching only ones and not zeros, is not how clustering works. The absence of signal (missing zeros) will also become a part of the clustering pattern.
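
To illustrate why a sparse format alone should not change the result, here is a minimal Python sketch. It assumes a simple mismatch count as the distance, and the helper names (`to_sparse`, `mismatches_dense`, `mismatches_sparse`) are hypothetical, made up for this example. The sparse form stores only the positions of the 1s, yet the mismatch count comes out the same, because the zeros are still implied.

```python
def to_sparse(row):
    """Store a 0/1 vector as the set of positions that hold a 1."""
    return {j for j, value in enumerate(row) if value != 0}


def mismatches_dense(a, b):
    """Count positions where two dense 0/1 vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mismatches_sparse(set_a, set_b):
    """Count positions where two sparse vectors differ.

    A position differs exactly when it is a 1 in one sample only,
    which is the symmetric difference of the two index sets.
    """
    return len(set_a ^ set_b)


# Toy data (made up for illustration): two patients, six variant sites.
p1 = [1, 0, 0, 1, 0, 0]
p2 = [1, 1, 0, 0, 0, 0]

dense_count = mismatches_dense(p1, p2)
sparse_count = mismatches_sparse(to_sparse(p1), to_sparse(p2))

# Both representations give the same distance, so a clustering
# built on this distance would not change.
print(dense_count, sparse_count)
```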

ADD COMMENT • link 2.0 years ago by Mensur Dlakic ★ 27k

0

I also agree with "The absence of signal (missing zeros) will also become a part of the clustering pattern", but that got me wondering: could we give more weight to 1s than to 0s? If that's possible, couldn't we give a weight of zero (w=0) to the 0s, essentially removing their "clustering pattern"? Of course, this assumes we can assign a different weight to each class (0 vs. 1), which may be impossible to begin with.
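
As a rough illustration of what "weight zero for 0s" could look like, the Jaccard distance ignores positions where both samples are 0 and only compares positions where at least one sample has a 1. The pure-Python sketch below is not from this thread. The function names and toy data are made up, and treating two all-zero samples as distance 0 is an assumed convention. In R, `dist(x, method = "binary")` computes this kind of asymmetric binary distance, and the result can be passed to `hclust`.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors.

    Positions where both vectors are 0 are ignored entirely.
    """
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        # Assumed convention: two samples with no variants are identical.
        return 0.0
    return 1.0 - both / either


def hamming_distance(a, b):
    """Fraction of positions that differ; matching 0s count as agreement."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


# Toy data (hypothetical): four patients, eight variant sites.
p1 = [1, 0, 0, 0, 0, 0, 0, 0]
p2 = [0, 1, 0, 0, 0, 0, 0, 0]  # shares no variant with p1
p3 = [1, 1, 1, 1, 1, 1, 0, 0]
p4 = [1, 1, 1, 1, 0, 0, 1, 1]  # shares four variants with p3

# Hamming treats p1/p2 as closer than p3/p4 because of the shared zeros.
print(hamming_distance(p1, p2), hamming_distance(p3, p4))

# Jaccard treats p1/p2 as maximally distant, since no 1s are shared.
print(jaccard_distance(p1, p2), jaccard_distance(p3, p4))
```

Whether ignoring shared absences is appropriate depends on the data; the reply below argues against it.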

ADD REPLY • link 2.0 years ago by sbstevenlee ▴ 480

0

Even if adding different weights were possible, I don't see any justification for it.

If the pattern is dominated by zeros, meaning many columns with zeros and few with ones, I would delete all invariant columns and then cluster. Removing invariant columns is a legitimate way to boost the signal, because there is no signal in columns where all values are identical. This would still retain the true signal without any artifact introduced by weighting.
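
A minimal sketch of the column filtering described above, in plain Python. It assumes rows are samples and columns are variant sites, and `drop_invariant_columns` is a hypothetical helper name. The filtered matrix would then go into whatever clustering routine you already use.

```python
def drop_invariant_columns(matrix):
    """Remove columns whose value is identical across all rows.

    Returns the filtered matrix and the indices of the kept columns.
    """
    if not matrix:
        return [], []
    n_cols = len(matrix[0])
    # A column is informative only if it holds more than one distinct value.
    keep = [j for j in range(n_cols) if len({row[j] for row in matrix}) > 1]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, keep


# Toy data (made up): three patients, five variant sites.
# Column 0 is all 1s; columns 1 and 4 are all 0s.
variants = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
]

filtered, kept = drop_invariant_columns(variants)
print(kept)      # indices of the informative columns
print(filtered)  # matrix restricted to those columns
```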

ADD REPLY • link 2.0 years ago by Mensur Dlakic ★ 27k