Spectral Clustering for TCGA/Gene Expression Data
2.3 years ago
aaragak1 ▴ 40

Hello all,

I've been using M3C for consensus clustering on TCGA data to estimate how many 'real' clusters of tumors there are in these data. I'm wondering if and when spectral clustering should be used instead of the other options (hc, pam, km; i.e. hierarchical clustering, PAM, k-means).

Thanks for your time!

R RNA-Seq • 482 views
2.3 years ago

As usual when it comes to clustering, the answer is 'it depends'. PAM and k-means assume that clusters are more or less spherical. Hierarchical clustering makes different assumptions depending on the linkage method used: Ward's criterion also assumes spherical clusters, and average and complete linkage tend to work best on spherical clusters too. Hierarchical clustering also doesn't directly produce clusters; it produces a tree that needs to be cut, and how you cut the tree affects the number of clusters you get.

Spectral clustering can be useful for finding clusters in some more complicated situations. It essentially tries to find a space in which the clusters are well separated and spherical (if one uses k-means at the clustering step). The result also depends on how you measure distance/similarity.

So unless there is a clear cluster structure, different algorithms will produce different clusters. Also keep in mind that what an algorithm calls a cluster may or may not make sense to you. I would suggest exploring the data first by plotting various representations, then trying hierarchical clustering to get a feeling for whether there are distinguishable clusters.
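To make the "spherical vs. not" point concrete, here is a minimal sketch in Python, assuming only NumPy. It uses toy data (two concentric rings, not expression data) and compares k-means on the raw coordinates with a normalized spectral clustering that runs k-means on the leading eigenvectors. The function names, the Gaussian affinity and the `sigma` value are illustrative choices of mine, and this is not necessarily the spectral variant that M3C implements.

```python
import numpy as np

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns the labels of the best run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute centroids; an empty cluster keeps its old center.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        inertia = d2.min(axis=1).sum()
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, d2.argmin(axis=1)
    return best_labels

def spectral_clustering(X, k, sigma=0.3):
    """Normalized spectral clustering with a Gaussian affinity (sigma is an assumption)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                      # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    U = eigvecs[:, :k]                            # k smallest eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    return kmeans(U, k)                           # k-means in the embedded space

def agreement(labels, truth):
    """Fraction of points matching the truth, allowing the two labels to be swapped."""
    same = np.mean(labels == truth)
    return max(same, 1.0 - same)

# Toy data: two concentric rings (radius 1 and 3) with a little noise.
rng = np.random.default_rng(42)
n = 100
angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([1.0 * ring, 3.0 * ring]) + rng.normal(scale=0.05, size=(2 * n, 2))
truth = np.repeat([0, 1], n)

km_labels = kmeans(X, k=2)
sc_labels = spectral_clustering(X, k=2)
print("k-means agreement with rings: ", agreement(km_labels, truth))
print("spectral agreement with rings:", agreement(sc_labels, truth))
```

On this toy example k-means cuts straight across both rings, while the spectral embedding pulls each ring into its own tight group before k-means is applied. Real expression data rarely looks this clean, and the result is sensitive to the similarity measure and to `sigma`, so treat this as an illustration of the idea rather than a recipe.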

