Hi everyone,
I have roughly 1,000 gene sets, each containing 10 to 100 genes, and I would like to cluster them based on the similarity of their gene content. I was thinking of using something like k-modes clustering (similar to k-means, but for categorical data), but I was wondering whether anyone knows of a better method?
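For reference, here is a rough sketch of the input that k-modes would need: each gene set becomes a row of 0/1 presence/absence values over the union of all genes. The set names and genes below are made-up toy data, and `membership_matrix` is a hypothetical helper, not part of any library.

```python
# Hypothetical toy gene sets; real sets would contain 10-100 genes each.
gene_sets = {
    "setA": {"TP53", "MDM2", "CDKN1A"},
    "setB": {"TP53", "MDM2", "BAX"},
    "setC": {"MYC", "MAX", "CDK4"},
}

def membership_matrix(sets):
    """Return (set_names, genes, rows) where rows[i][j] == 1 if genes[j] is in set i."""
    names = sorted(sets)
    genes = sorted(set().union(*sets.values()))  # union of all genes = the columns
    rows = [[1 if g in sets[name] else 0 for g in genes] for name in names]
    return names, genes, rows

names, genes, matrix = membership_matrix(gene_sets)
for name, row in zip(names, matrix):
    print(name, row)
```

One caveat, if I understand k-modes correctly: its simple-matching dissimilarity counts a shared 0 (both sets lacking a gene) as a match, and with thousands of genes and only 10 to 100 per set, most columns are shared zeros. That can make unrelated sets look similar, which is part of why an overlap measure that ignores shared absences may fit better.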
Thank you in advance,
Nate
Thanks for your response, Mensur. As you said, BLAST is used for biological sequences, which may not make complete sense here: we are clustering gene sets by how much their gene content overlaps with that of other gene sets, not by the sequence of any particular gene (e.g., gene sets that share many of the same genes should cluster together).
Perhaps I could use something like the Tanimoto index to measure the similarity between gene sets and then cluster based on that?
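A minimal sketch of that idea, assuming average-linkage hierarchical clustering is an acceptable choice (the post does not settle on a clustering method). For sets, the Tanimoto index is the same as the Jaccard index, |A ∩ B| / |A ∪ B|, and 1 minus it is used as the distance. The functions `tanimoto` and `average_linkage`, the toy gene sets, and the `max_distance` cutoff are all hypothetical illustrations.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets: |A & B| / |A | B|."""
    union = a | b
    # Assumption: two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0

def average_linkage(sets, max_distance):
    """Naive agglomerative clustering, average linkage on 1 - Tanimoto distance.

    Repeatedly merges the closest pair of clusters and stops once the closest
    pair is farther apart than max_distance. Roughly O(n^3) as written.
    """
    names = sorted(sets)
    dist = {}
    for x, y in combinations(names, 2):
        d = 1.0 - tanimoto(sets[x], sets[y])
        dist[(x, y)] = dist[(y, x)] = d
    clusters = [[n] for n in names]

    def cluster_distance(c1, c2):
        # Average of all pairwise set distances between the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        if cluster_distance(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical toy gene sets.
gene_sets = {
    "setA": {"TP53", "MDM2", "CDKN1A", "BAX"},
    "setB": {"TP53", "MDM2", "CDKN1A", "GADD45A"},
    "setC": {"MYC", "MAX", "CDK4", "CCND1"},
    "setD": {"MYC", "MAX", "CDK4", "E2F1"},
}
clusters = average_linkage(gene_sets, max_distance=0.5)
print(clusters)
```

For roughly 1,000 sets this pure-Python loop would be slow. The same approach could instead run on a boolean membership matrix with SciPy's `pdist(..., metric="jaccard")`, `linkage(..., method="average")` and `fcluster(..., criterion="distance")`. The cutoff is a judgment call that would need tuning on the real data.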