Question: Why is the optimal k for k-means clustering so low for a large dataset?
Asked 7 weeks ago by anikng:

I tried a few methods for determining the best k value for k-means clustering. When I applied two of these methods to my RNA-seq dataset, I got k=2 or k=3. Considering that the matrix has 40000 genes and 90 samples, I was expecting a higher k. I checked with multiple k.max values, but the result remained the same. Could you please give a suggestion?

library(factoextra)
library(NbClust)
Normalized_counts_cpm<-read.csv(file = "...Normalized_counts_cpm.csv", header = TRUE, sep=",", row.names = 1)

With the elbow method:

 fviz_nbclust(Normalized_counts_cpm, kmeans, method = "wss") +
 geom_vline(xintercept = 4, linetype = 2)+
 labs(subtitle = "Elbow method")
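
For reference, the quantity behind the "wss" curve can be sketched in base R. This is only a minimal illustration, assuming synthetic two-group data instead of the CPM matrix above; `x` and `ks` are hypothetical stand-ins, and the code is not taken from factoextra itself.

```r
set.seed(1)

# Hypothetical toy data: two well-separated groups of 50 rows each.
# As with fviz_nbclust(), it is the rows that get clustered.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

# Total within-cluster sum of squares for each candidate k
ks <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is where the curve stops dropping sharply
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On data with a few dominant groups the curve flattens after a small k, whatever the number of rows.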

With the silhouette method:

fviz_nbclust(Normalized_counts_cpm, kmeans, method = "silhouette",k.max = 30)
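
To make the silhouette criterion concrete, here is a minimal sketch of the average silhouette width per k, using `silhouette()` from the cluster package. It assumes small synthetic data rather than the CPM matrix; `x`, `ks` and `best_k` are hypothetical names. Note that `dist()` builds a full pairwise distance object, which becomes very large when there are tens of thousands of rows.

```r
library(cluster)  # provides silhouette()

set.seed(1)

# Hypothetical toy data: two well-separated groups of 50 rows each
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

d <- dist(x)   # pairwise Euclidean distances between rows
ks <- 2:6      # silhouette width is undefined for k = 1

# Average silhouette width for each candidate k
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 10)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

# The k with the highest average silhouette width is the suggested one
best_k <- ks[which.max(avg_sil)]
```

The criterion rewards compact, well-separated clusters, so it tends to pick the coarsest strong split in the data rather than a k that grows with the number of rows.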

Do you a priori expect more?

written 7 weeks ago by Devon Ryan

With MeV's k-means clustering analysis, k=10 gave me reasonable clusters.

written 7 weeks ago by anikng