Question

Best method for clustering similar gene sets?

2.5 years ago

n.y.bell • 0

Hi everyone,

I have roughly 1000 gene sets of 10 to 100 genes each, and I would like to cluster them by the similarity of their gene content. I was thinking of using something like k-modes clustering (similar to k-means, but for categorical data). Is anyone aware of a better method?

Thank you in advance,

Nate

sets Clustering Gene • 757 views

score 0 · Answer 1 · 2021-10-20

2.5 years ago

Mensur Dlakic ★ 27k

"Better" is a relative term. There are almost certainly better methods than the one you are considering, but you may not want to use them because of the time or other resources they require.

A relatively easy way to cluster biological sequences is to run an all-vs-all BLAST search and then apply graph clustering to the E-values from that search. The method is described in detail here, and the protocol itself is here. The original method has been published here, and many improvements have been described in papers that cite it.
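As a rough illustration of the idea only, the sketch below reads BLAST tabular output (`-outfmt 6`, where the E-value is the 11th column), links two sequences when their E-value is at or below a cutoff, and reports connected components. Connected components are a deliberately simple stand-in for the dedicated graph clustering algorithms the linked resources describe. The function name `cluster_blast_hits`, the `1e-5` cutoff, and the toy hit lines are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_blast_hits(lines, max_evalue=1e-5):
    """Group sequences linked by BLAST hits with E-value <= max_evalue.

    `lines` are rows of BLAST tabular output (-outfmt 6).
    Returns a list of clusters, each a sorted list of sequence IDs.
    """
    graph = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Register both sequences as nodes, even if no edge is added.
        graph[query]
        graph[subject]
        if query != subject and evalue <= max_evalue:
            graph[query].add(subject)
            graph[subject].add(query)

    # Depth-first search to collect connected components.
    seen, clusters = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in graph[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Hypothetical hits, for illustration only.
toy_hits = [
    "seq1\tseq2\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180",
    "seq2\tseq3\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150",
    "seq4\tseq5\t85.0\t80\t12\t0\t1\t80\t1\t80\t1e-20\t100",
    "seq1\tseq4\t30.0\t40\t28\t0\t1\t40\t1\t40\t2.5\t20",
]
blast_clusters = cluster_blast_hits(toy_hits)
print(blast_clusters)
```

The weak `seq1`/`seq4` hit (E-value 2.5) is ignored, so the two groups stay separate. Connected components will merge clusters through any single strong hit, which is one reason real pipelines tend to use a proper graph clustering step instead.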

Thanks for your response, Mensur. As you said, BLAST is used for biological sequences, so it might not fit here: we want to cluster gene sets by how much of their gene content they share with other gene sets, not by any specific sequence. For example, gene sets that have many genes in common should end up in the same cluster.

Reply • 2.5 years ago by n.y.bell

Perhaps one could use something like the Tanimoto index to measure the similarity between gene sets and then cluster on that?
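A minimal sketch of what that might look like, using only the Python standard library. For plain sets the Tanimoto index is the same as the Jaccard index, |A ∩ B| / |A ∪ B|. The grouping step here is a simple single-linkage rule (two sets land in the same cluster if a chain of pairs with similarity at or above the threshold connects them). That rule is an assumption chosen for brevity, not a recommendation from this thread. The names `tanimoto` and `cluster_gene_sets`, the `0.5` threshold, and the toy gene sets are likewise made up for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) index of two sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_gene_sets(gene_sets, threshold=0.5):
    """Single-linkage grouping of named gene sets.

    Two sets are merged when their Tanimoto index is >= threshold;
    merging is transitive (union-find). Returns sorted lists of names.
    """
    names = list(gene_sets)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if tanimoto(gene_sets[a], gene_sets[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy gene sets, for illustration only.
gene_sets = {
    "setA": {"TP53", "BRCA1", "ATM", "CHEK2"},
    "setB": {"TP53", "BRCA1", "ATM", "MDM2"},
    "setC": {"MYC", "MAX", "MXD1"},
    "setD": {"MYC", "MAX", "MXI1"},
    "setE": {"GAPDH", "ACTB"},
}
clusters = cluster_gene_sets(gene_sets, threshold=0.5)
print(clusters)
```

With about 1000 gene sets this is roughly 500,000 pairwise comparisons, which should be manageable. Alternatively, the pairwise distances (1 minus the Tanimoto index) could be passed to a hierarchical clustering routine to get a dendrogram instead of a single cutoff.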

Reply • 2.5 years ago by n.y.bell