Why is the k value for k-means clustering low for a large dataset?

5.4 years ago

anikng • 0

I tried a few methods for determining the best k value for k-means clustering. When I applied two of these methods to my RNA-Seq dataset, I got k=2 or k=3. Considering that the matrix has 40,000 genes and 90 samples, I was expecting a higher k. I checked with multiple k.max values, but the result remained the same. Could you please give me a suggestion?

library(factoextra)
library(NbClust)
Normalized_counts_cpm<-read.csv(file = "...Normalized_counts_cpm.csv", header = TRUE, sep=",", row.names = 1)

With the elbow method:

fviz_nbclust(Normalized_counts_cpm, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2) +
  labs(subtitle = "Elbow method")

With the silhouette method:

fviz_nbclust(Normalized_counts_cpm, kmeans, method = "silhouette", k.max = 30)
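For reference, the curve that the elbow method inspects (total within-cluster sum of squares for each k) can be reproduced with base R alone. The sketch below is not the analysis from this post: it runs on simulated data, and the log2 transform and per-gene z-scoring are assumed preprocessing steps that are common before k-means on expression data but do not appear in the code above. The names `wss_curve`, `cpm`, and `scaled` are hypothetical.

```r
# Simulated stand-in for a genes x samples CPM matrix (hypothetical data).
set.seed(1)
n_genes <- 300
n_samples <- 12

# Three groups of genes, each with its own pattern across the samples.
pattern <- rbind(
  rep(c(0, 2), each = 6),
  rep(c(2, 0), each = 6),
  rep(c(0, 2, 0), each = 4)
)
group <- rep(1:3, each = n_genes / 3)
noise <- matrix(rnorm(n_genes * n_samples, sd = 0.3), n_genes, n_samples)

# Gene-specific baseline, so abundance spans several orders of magnitude.
baseline <- runif(n_genes, min = 3, max = 12)
cpm <- 2^(pattern[group, ] + noise + baseline)

# Assumed preprocessing: log-transform, then z-score each gene (row)
# across samples. Genes with zero variance would need filtering first.
log_cpm <- log2(cpm + 1)
scaled <- t(scale(t(log_cpm)))

# Total within-cluster sum of squares for k = 1..k_max.
wss_curve <- function(x, k_max = 10) {
  sapply(seq_len(k_max), function(k) {
    kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss
  })
}

wss_raw <- wss_curve(cpm)        # distances driven by overall abundance
wss_scaled <- wss_curve(scaled)  # distances driven by expression pattern

# Fraction of the k = 1 sum of squares remaining at each k.
print(round(wss_raw / wss_raw[1], 3))
print(round(wss_scaled / wss_scaled[1], 3))
```

On raw CPM the Euclidean distances are dominated by the most abundant genes, so comparing the two curves is one way to check whether a small suggested k reflects expression patterns or only overall abundance.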

RNA-Seq K-means clustering • 1.0k views

Do you a priori expect more?

5.4 years ago by Devon Ryan • 105k

With MeV k-means clustering analysis, k=10 gave me reasonable clusters.

5.4 years ago by anikng • 0
