Posted by anikng, 3.1 years ago
I tried a few methods for determining the best k for k-means clustering. When I applied two of them to my RNA-seq dataset, I got k = 2 or k = 3. Given a matrix of 40,000 genes across 90 samples, I was expecting a higher k. I checked with several k.max values, but the result stayed the same. Could you please give me some suggestions?
library(factoextra)
library(NbClust)
Normalized_counts_cpm <- read.csv(file = "...Normalized_counts_cpm.csv", header = TRUE, sep = ",", row.names = 1)
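One point worth checking: kmeans() (and fviz_nbclust(), which passes the data to it) treats each row as an observation. Assuming the CSV has one gene per row, the calls that follow therefore cluster the 40,000 genes, not the 90 samples. A common preprocessing step for gene clustering, offered here as an assumption and not as something the post says was done, is to log-transform the CPM values and standardize each gene, so clusters reflect expression patterns rather than overall expression level. The matrix `cpm` below is hypothetical toy data standing in for `Normalized_counts_cpm`.

```r
set.seed(1)
# Hypothetical toy data: rows = genes, columns = samples.
# With the real file you might start from: cpm <- as.matrix(Normalized_counts_cpm)
cpm <- matrix(rexp(200 * 12, rate = 0.01), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12)))

# log2(CPM + 1) dampens the influence of very highly expressed genes.
log_cpm <- log2(cpm + 1)

# Drop genes with zero variance; they cannot be standardized.
log_cpm <- log_cpm[apply(log_cpm, 1, var) > 0, , drop = FALSE]

# scale() standardizes columns, so transpose, scale, and transpose back
# to get a per-gene z-score (each row has mean 0 and sd 1).
gene_z <- t(scale(t(log_cpm)))

# kmeans() clusters rows, i.e. genes here; use t(gene_z) to cluster samples instead.
km <- kmeans(gene_z, centers = 4, nstart = 25)
table(km$cluster)
```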
With the elbow method:
fviz_nbclust(Normalized_counts_cpm, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2) +
  labs(subtitle = "Elbow method")
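The elbow method plots the total within-cluster sum of squares (WSS) against k and looks for the point where adding more clusters stops reducing WSS substantially. Note that `geom_vline(xintercept = 4, ...)` in the call above only draws a dashed reference line at k = 4; it is not computed from the data. The base-R sketch below computes the WSS curve directly, assuming a numeric matrix `x` with observations in rows. It uses hypothetical toy data with three well-separated groups and should roughly mirror what `method = "wss"` plots, although the exact internals of fviz_nbclust() may differ.

```r
set.seed(42)
# Hypothetical toy data: three separated groups of 50 rows each, 5 columns.
x <- rbind(matrix(rnorm(250, mean = 0), ncol = 5),
           matrix(rnorm(250, mean = 5), ncol = 5),
           matrix(rnorm(250, mean = 10), ncol = 5))

k_max <- 10
# Total within-cluster sum of squares for each k; nstart > 1 reduces the
# chance of ending in a poor local optimum.
wss <- sapply(1:k_max, function(k) {
  kmeans(x, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

# WSS always decreases as k grows; the "elbow" is where the decrease flattens.
plot(1:k_max, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")
```

On this toy data the drop from k = 2 to k = 3 is large and later drops are small, so the elbow sits at k = 3. Because reading the elbow is a visual judgement, it is easy to see why different tools or people settle on different k for the same data.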
With the silhouette method:
fviz_nbclust(Normalized_counts_cpm, kmeans, method = "silhouette",k.max = 30)
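The silhouette method picks the k that maximizes the average silhouette width. Each observation's width compares its mean distance to members of its own cluster with its mean distance to the nearest other cluster, giving a value between -1 and 1. The sketch below computes this per k with `cluster::silhouette()` on hypothetical toy data, roughly the quantity the `method = "silhouette"` plot shows. A practical caution: the silhouette needs the full pairwise distance matrix, and `dist()` on n rows stores n(n-1)/2 doubles. For 40,000 genes that is about 8 × 10^8 values (roughly 6.4 GB), so computing it on a random subset of genes may be more practical.

```r
library(cluster)  # ships with R; provides silhouette()

set.seed(42)
# Hypothetical toy data: three separated groups of 50 rows each, 5 columns.
# For a large gene matrix, a random subset such as x[sample(nrow(x), 5000), ]
# keeps dist() affordable (hypothetical subset size).
x <- rbind(matrix(rnorm(250, mean = 0), ncol = 5),
           matrix(rnorm(250, mean = 5), ncol = 5),
           matrix(rnorm(250, mean = 10), ncol = 5))

d <- dist(x)     # pairwise Euclidean distances, computed once and reused
k_values <- 2:8  # the silhouette is undefined for k = 1

avg_sil <- sapply(k_values, function(k) {
  km <- kmeans(x, centers = k, nstart = 25, iter.max = 50)
  sil <- silhouette(km$cluster, d)
  mean(sil[, "sil_width"])  # average silhouette width for this k
})

best_k <- k_values[which.max(avg_sil)]
best_k
```

Splitting a genuinely compact group lowers the silhouette of its members, so this criterion tends to reward a small number of well-separated clusters. That is one plausible reason it can return k = 2 or 3 on a large expression matrix.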
Do you have an a priori reason to expect more clusters?
With MeV's k-means clustering analysis, k = 10 gave me reasonable clusters.
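To put the k suggested by the elbow and silhouette methods side by side with the k = 10 used in MeV, one option is to run k-means at each candidate k on the same matrix and compare summary statistics. The sketch below does this, assuming both tools see the same data and a Euclidean distance (which is what kmeans() optimizes and what `dist()` computes here). It uses hypothetical random data, and the helper `compare_k` is a name introduced for illustration. A lower average silhouette at k = 10 would not by itself mean those clusters are biologically uninformative.

```r
library(cluster)  # ships with R; provides silhouette()

set.seed(7)
# Hypothetical toy data: 300 rows, 6 columns.
x <- matrix(rnorm(300 * 6), ncol = 6)
d <- dist(x)

# Hypothetical helper: fit k-means for one k and summarize the result.
compare_k <- function(k) {
  km <- kmeans(x, centers = k, nstart = 25, iter.max = 50)
  sil <- silhouette(km$cluster, d)
  c(k = k,
    avg_sil = mean(sil[, "sil_width"]),
    smallest_cluster = min(table(km$cluster)))
}

# Candidates from the elbow/silhouette methods vs. the k chosen in MeV.
res <- t(sapply(c(2, 3, 10), compare_k))
res
```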