Best method for clustering similar gene sets?
Posted by n.y.bell, 2.5 years ago

Hi everyone,

I have roughly 1,000 gene sets, each containing 10 to 100 genes, and I would like to cluster them by the similarity of their gene content. I was thinking of using something like k-modes clustering (similar to k-means, but for categorical data), but is there a better method that people are aware of?
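To make the k-modes idea concrete, here is a minimal from-scratch sketch, assuming each gene set is encoded as a 0/1 membership vector over the union of all genes. It is plain k-modes (Hamming distance, per-gene mode as the cluster centre), not any particular library's implementation; the function names and the toy gene sets are hypothetical.

```python
import random


def membership_matrix(gene_sets):
    """Encode each gene set as a 0/1 vector over the union of all genes."""
    genes = sorted(set().union(*gene_sets.values()))
    names = list(gene_sets)
    rows = [[1 if g in gene_sets[n] else 0 for g in genes] for n in names]
    return names, rows


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Per-gene mode of a group of binary rows (ties go to 0, i.e. 'absent')."""
    n = len(rows)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*rows)]


def kmodes(rows, k, n_iter=20, seed=0):
    """Plain k-modes: Hamming distance to modes, column-wise mode updates."""
    rng = random.Random(seed)
    modes = [list(r) for r in rng.sample(rows, k)]  # k random rows as initial modes
    labels = None
    for _ in range(n_iter):
        # Assign every row to its nearest mode (lowest index wins ties).
        new_labels = [min(range(k), key=lambda c: hamming(row, modes[c])) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each mode from its members; keep the old mode if a cluster is empty.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                modes[c] = column_modes(members)
    return labels


# Toy gene sets (hypothetical, for illustration only).
gene_sets = {
    "A": {"TP53", "MDM2", "CDKN1A"},
    "B": {"TP53", "MDM2", "BAX"},
    "C": {"MYC", "MAX", "CCND1"},
    "D": {"MYC", "MAX", "CDK4"},
}
names, rows = membership_matrix(gene_sets)
labels = kmodes(rows, k=2)
print(dict(zip(names, labels)))
```

One caveat with this encoding: Hamming distance counts shared absences as agreement, and with thousands of genes and only 10 to 100 per set, most positions are shared zeros. A set-overlap measure that ignores joint absences may therefore separate gene sets more cleanly.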

Thank you in advance,

Nate

Answer by Mensur Dlakic, 2.5 years ago

"Better" is a relative term. There are almost certainly better methods than the one you are considering, but you may not be inclined to use them because of limited time or other resources.

A relatively easy way to cluster biological sequences is to run an all-vs-all BLAST search and then apply graph clustering to the E-values from that search. The method is described in detail here, and the protocol itself is here. The original method was published here, and many improvements have since been described in papers that cite it.
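The general "all-vs-all hits, then cluster the graph" idea can be sketched as below. This is a deliberately simple stand-in, not the method in the linked protocol: it keeps only hits at or below an E-value cutoff as edges and reports connected components (single linkage). It assumes BLAST tabular output in the default 12-column `-outfmt 6` layout, where the E-value is column 11; the cutoff, helper names, and hits are hypothetical.

```python
def parse_hits(lines):
    """Yield (query, subject, evalue) from BLAST tabular (-outfmt 6) lines.

    Assumes the default 12-column layout, where the E-value is column 11.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment lines
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[10])


def cluster_by_evalue(hits, max_evalue=1e-10):
    """Connected components of the graph whose edges are hits with E-value <= max_evalue."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, evalue in hits:
        find(query)  # register every sequence, even one with no strong hits
        find(subject)
        if query != subject and evalue <= max_evalue:
            union(query, subject)

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())


def row(q, s, evalue):
    """Build a toy 12-column -outfmt 6 line; only columns 1, 2 and 11 matter here."""
    return "\t".join([q, s, "90.0", "100", "10", "0", "1", "100", "1", "100", evalue, "150"])


# Hypothetical all-vs-all hits, not real data.
blast_lines = [
    row("seqA", "seqB", "1e-50"),
    row("seqB", "seqC", "1e-40"),
    row("seqC", "seqD", "0.5"),  # weak hit: above the cutoff, so no edge
    row("seqD", "seqE", "1e-30"),
]
clusters = cluster_by_evalue(parse_hits(blast_lines))
print(clusters)  # [['seqA', 'seqB', 'seqC'], ['seqD', 'seqE']]
```

Connected components chain clusters together through single strong links, which is why dedicated graph-clustering algorithms are usually preferred on real all-vs-all data; the sketch only shows the shape of the pipeline.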


Thanks for your response, Mensur. As you said, BLAST is used for biological sequences, which may not quite fit here: we are clustering gene sets by how much gene content they share with other gene sets, not by any particular sequence (e.g., gene sets that share many of the same genes should cluster together).


Perhaps use something like the Tanimoto index to measure the similarity between gene sets, and then cluster based on that?
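A minimal sketch of that suggestion: compute the Tanimoto coefficient (for plain sets this is the same as the Jaccard index, |A ∩ B| / |A ∪ B|) for every pair of gene sets, then run average-linkage agglomerative clustering and stop merging once no pair of clusters reaches a similarity cutoff. The choice of average linkage, the 0.3 cutoff, and the toy gene sets are assumptions for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two sets, |A ∩ B| / |A ∪ B| (the same as Jaccard for sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def average_linkage(gene_sets, min_similarity=0.3):
    """Naive average-linkage clustering on Tanimoto similarity.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity, stopping once no pair reaches min_similarity.
    """
    names = list(gene_sets)
    sim = {}
    for x, y in combinations(names, 2):
        s = tanimoto(gene_sets[x], gene_sets[y])
        sim[(x, y)] = sim[(y, x)] = s

    def mean_sim(c1, c2):
        return sum(sim[(x, y)] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best, bi, bj = -1.0, -1, -1
        for i, j in combinations(range(len(clusters)), 2):
            s = mean_sim(clusters[i], clusters[j])
            if s > best:
                best, bi, bj = s, i, j
        if best < min_similarity:
            break  # nothing similar enough left to merge
        clusters[bi] = clusters[bi] + clusters[bj]
        del clusters[bj]  # safe: bj > bi, so index bi is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy gene sets (hypothetical, for illustration only).
gene_sets = {
    "A": {"TP53", "MDM2", "CDKN1A"},
    "B": {"TP53", "MDM2", "BAX"},
    "C": {"MYC", "MAX", "CCND1"},
    "D": {"MYC", "MAX", "CDK4"},
}
print(average_linkage(gene_sets))  # [['A', 'B'], ['C', 'D']]
```

This loop scales roughly cubically with the number of sets, so at around 1,000 gene sets it would be more practical to build a boolean membership matrix, compute distances with `scipy.spatial.distance.pdist(X, metric="jaccard")` (Jaccard distance is 1 minus the Tanimoto similarity on binary data), and cut `scipy.cluster.hierarchy.linkage(..., method="average")` with `fcluster`.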
