Why is the optimal k for k-means clustering so low for a large dataset?
4.2 years ago
anikng • 0

I tried a few methods for determining the best k for k-means clustering. When I applied two of them to my RNA-seq dataset, I got k = 2 or k = 3. Since the matrix has 40000 genes and 90 samples, I was expecting a higher k. I checked with several k.max values, but the result stayed the same. Could you please give me some suggestions?
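A note on the expectation itself: the elbow and silhouette criteria respond to how well separated the groups are, not to how many rows the matrix has, so more genes do not by themselves imply a larger k. A minimal sketch on simulated 2-D data (synthetic, not RNA-seq; `simulate_two_groups` and `best_k_by_silhouette` are hypothetical helpers, and `silhouette()` comes from the `cluster` package that ships with R) where two true groups exist should report k = 2 at every sample size:

```r
library(cluster)  # silhouette(); a recommended package distributed with R

# Hypothetical helper: n points in 2-D split between two well-separated groups
simulate_two_groups <- function(n) {
  half <- n %/% 2
  rbind(matrix(rnorm(half * 2, mean = 0), ncol = 2),
        matrix(rnorm((n - half) * 2, mean = 6), ncol = 2))
}

# Hypothetical helper: the k in k_range with the highest average silhouette width
best_k_by_silhouette <- function(x, k_range = 2:8) {
  d <- dist(x)
  avg_width <- sapply(k_range, function(k) {
    km <- kmeans(x, centers = k, nstart = 10)
    mean(silhouette(km$cluster, d)[, "sil_width"])
  })
  k_range[which.max(avg_width)]
}

set.seed(1)
for (n in c(100, 500, 2000)) {
  cat("n =", n, "-> best k =", best_k_by_silhouette(simulate_two_groups(n)), "\n")
}
```

Increasing n here only adds more points to the same two groups, so the chosen k should stay put.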

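library(factoextra)  # fviz_nbclust()
library(NbClust)
# Assuming genes are rows and samples are columns, kmeans() and fviz_nbclust() cluster the genes (rows)
Normalized_counts_cpm <- read.csv(file = "...Normalized_counts_cpm.csv", header = TRUE, row.names = 1)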
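Before estimating k, it may help to decide what is being clustered and on what scale. The post does not say whether the CPM values were log-transformed or filtered, so this is only a sketch of one common approach, assuming genes are rows: take log2(CPM + 1), keep the most variable genes, and z-score each gene so that clusters reflect expression patterns across the 90 samples rather than overall abundance. `prepare_for_gene_clustering` and `n_top = 2000` are illustrative choices, not a recommendation from this thread.

```r
# Hypothetical preprocessing; assumes `counts_cpm` is a genes x samples table of CPM values
prepare_for_gene_clustering <- function(counts_cpm, n_top = 2000) {
  log_cpm <- log2(as.matrix(counts_cpm) + 1)        # log2(CPM + 1) compresses the dynamic range
  gene_var <- apply(log_cpm, 1, var)                # variance of each gene across samples
  n_keep <- min(n_top, nrow(log_cpm))
  keep <- order(gene_var, decreasing = TRUE)[seq_len(n_keep)]
  z <- t(scale(t(log_cpm[keep, , drop = FALSE])))   # z-score each gene across samples
  z[is.finite(rowSums(z)), , drop = FALSE]          # drop zero-variance genes (NaN after scaling)
}
```

The result can be passed to the same calls, e.g. `fviz_nbclust(prepare_for_gene_clustering(Normalized_counts_cpm), kmeans, method = "silhouette", k.max = 30)`. To cluster the 90 samples instead of the genes, transpose the matrix with `t()` first.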

With the elbow method:

fviz_nbclust(Normalized_counts_cpm, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2) +
  labs(subtitle = "Elbow method")
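The elbow plot shows the total within-cluster sum of squares (WSS) for each k. Note that `geom_vline(xintercept = 4)` only draws a reference line at k = 4; it does not locate the elbow. A minimal base-R sketch of the same curve on simulated data with three groups (`wss_curve` and the toy data are hypothetical):

```r
# Elbow method by hand: total within-cluster sum of squares (WSS) for k = 1..k_max.
# `x` is assumed to be a numeric matrix whose rows are the objects being clustered.
wss_curve <- function(x, k_max = 10, nstart = 10) {
  sapply(seq_len(k_max), function(k) {
    kmeans(x, centers = k, nstart = nstart, iter.max = 50)$tot.withinss
  })
}

set.seed(3)
toy <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),   # simulated group 1
             matrix(rnorm(200, mean = 5), ncol = 2),   # simulated group 2
             matrix(rnorm(200, mean = 10), ncol = 2))  # simulated group 3
wss <- wss_curve(toy, k_max = 8)
print(round(wss))
```

WSS keeps shrinking as k grows, so the elbow is a judgement about where the decrease flattens; when one split dominates the data, it can appear at a small k even if finer structure exists.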

With the silhouette method:

fviz_nbclust(Normalized_counts_cpm, kmeans, method = "silhouette",k.max = 30)
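The silhouette method picks the k with the highest average silhouette width, where each point's width compares its mean distance to its own cluster, a(i), with its mean distance to the nearest other cluster, b(i). A sketch of that definition in base R (`silhouette_widths` is a hypothetical helper; on this toy data it should agree with `cluster::silhouette()`):

```r
library(cluster)  # only used to cross-check against silhouette()

# Silhouette width from its definition, for a clustering `cl` of the rows of `x`:
#   a(i) = mean distance from point i to the other members of its own cluster
#   b(i) = smallest mean distance from point i to the members of another cluster
#   s(i) = (b(i) - a(i)) / max(a(i), b(i)), with s(i) = 0 for singleton clusters
silhouette_widths <- function(x, cl) {
  d <- as.matrix(dist(x))
  sapply(seq_len(nrow(d)), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)
    a <- sum(d[i, own]) / (sum(own) - 1)            # d[i, i] = 0, so divide by n_own - 1
    b <- min(tapply(d[i, !own], cl[!own], mean))    # nearest other cluster
    (b - a) / max(a, b)
  })
}

set.seed(4)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),   # simulated group 1
           matrix(rnorm(60, mean = 4), ncol = 2))   # simulated group 2
km <- kmeans(x, centers = 2, nstart = 10)
s <- silhouette_widths(x, km$cluster)
mean(s)                                                        # average silhouette width, k = 2
all.equal(s, unname(silhouette(km$cluster, dist(x))[, "sil_width"]))  # cross-check
```

Because a coarse split into two or three well-separated groups tends to give large b(i) relative to a(i), the average can peak at a small k even when a finer partition is also meaningful.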

Do you have an a priori reason to expect more clusters?

With k-means clustering in MeV, k = 10 gave me reasonable clusters.
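To compare with the MeV result, a fixed k = 10 run can be inspected directly in R. MeV's distance metric and other settings are not given in the thread, so an R run may not reproduce its clusters. A sketch on a simulated stand-in matrix (`inspect_fixed_k` and the random data are hypothetical):

```r
# Hypothetical: run k-means with a fixed k and summarise the partition,
# assuming `x` is a genes x samples numeric matrix (e.g. log-transformed, z-scored).
inspect_fixed_k <- function(x, k = 10, nstart = 25) {
  km <- kmeans(x, centers = k, nstart = nstart, iter.max = 100)
  list(
    sizes = table(km$cluster),                    # number of genes per cluster
    centroids = km$centers,                       # k x samples: mean profile of each cluster
    between_over_total = km$betweenss / km$totss  # share of total SS explained by the partition
  )
}

set.seed(5)
sim <- matrix(rnorm(500 * 12), nrow = 500)  # simulated stand-in, not real expression data
res <- inspect_fixed_k(sim, k = 10)
res$sizes
round(res$between_over_total, 2)
```

The between/total ratio generally rises as k increases, so it cannot choose k on its own; cluster sizes and centroid profiles are more useful for judging whether ten clusters are biologically interpretable.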
