Question: Why is the k value for k-means clustering low for a large dataset?
7 weeks ago, anikng wrote:

I tried a few methods for determining the best k value for k-means clustering. When I applied two of these methods to my RNA-Seq dataset, I got k=2 or k=3. Considering that the matrix has 40000 genes and 90 samples, I was expecting a higher k. I checked with multiple k.max values, but the result stayed the same. Could you please give a suggestion?

Normalized_counts_cpm<-read.csv(file = "...Normalized_counts_cpm.csv", header = TRUE, sep=",", row.names = 1)

With the elbow method:

library(factoextra)  # provides fviz_nbclust()
library(ggplot2)     # provides geom_vline() and labs()
fviz_nbclust(Normalized_counts_cpm, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2) +
  labs(subtitle = "Elbow method")
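
For reference, the quantity plotted by method = "wss" can be sketched in base R roughly as follows. This is only an illustration on simulated data: the matrix `toy` and the helper `wss_for_k` are made up for the example and are not part of the analysis above. Note that kmeans groups the rows of its input, so with a genes-by-samples matrix it is the genes that get clustered.

```r
set.seed(1)

# Toy data (assumed, for illustration only): 3 groups of 50 rows, 5 columns
toy <- rbind(
  matrix(rnorm(50 * 5, mean = 0),  ncol = 5),
  matrix(rnorm(50 * 5, mean = 5),  ncol = 5),
  matrix(rnorm(50 * 5, mean = 10), ncol = 5)
)

# Hypothetical helper: total within-cluster sum of squares for a given k
wss_for_k <- function(x, k) {
  kmeans(x, centers = k, nstart = 10)$tot.withinss
}

k_values <- 1:8
wss <- sapply(k_values, function(k) wss_for_k(toy, k))

# The "elbow" is the k after which the curve stops dropping sharply
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The curve can only flatten out as k grows, so the elbow is read off by eye; that judgment is what the dashed line in the plot above encodes.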

With the silhouette method:

fviz_nbclust(Normalized_counts_cpm, kmeans, method = "silhouette",k.max = 30)
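
As a rough sketch of what the silhouette criterion measures, the average silhouette width can be computed by hand with the cluster package. The data here are simulated and the names `toy`, `avg_sil` and `best_k` are made up for illustration; this is not the code that fviz_nbclust runs internally, only the same idea under those assumptions.

```r
library(cluster)  # provides silhouette()

set.seed(1)

# Toy data (assumed, for illustration only): 2 groups of 50 rows, 2 columns
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2)
)
d <- dist(toy)  # pairwise Euclidean distances between rows

# The silhouette is undefined for k = 1, so start at k = 2
k_values <- 2:6
avg_sil <- sapply(k_values, function(k) {
  cl <- kmeans(toy, centers = k, nstart = 10)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
names(avg_sil) <- k_values

# The suggested k is the one with the highest average silhouette width
best_k <- as.integer(names(which.max(avg_sil)))
```

Because the criterion rewards compact, well-separated groups, a few dominant groups in the data tend to produce a small suggested k regardless of how many rows there are.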

Do you a priori expect more?

written 7 weeks ago by Devon Ryan

With MeV's k-means clustering analysis, k=10 gave me reasonable clusters.

written 7 weeks ago by anikng