Hello,
I'm trying to choose the number of k-means clusters for my deepTools computeMatrix output, but I cannot properly generate an elbow plot.
I've tried loading the matrix as described in this post and then plotting the total within-cluster sum of squares (wss) in R via:
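# Skip the first line of the file, which is the deepTools header
m <- read.delim("computeMatrixOperations.mat.gz", skip = 1, header = FALSE)
# Drop the six leading region columns, keeping only the signal values
m <- as.matrix(m[, -(1:6)])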
set.seed(123)
# Compute and plot wss for k = 1 to k = 15.
k.max <- 15
wss <- sapply(1:k.max, function(k) kmeans(m, k, nstart = 50, iter.max = 15)$tot.withinss)
plot(1:k.max, wss, type = "b", pch = 19, frame = FALSE, xlab = "Number of clusters K", ylab = "Total within-clusters sum of squares")
But this is too computationally heavy for a ~180,000 x 720 matrix, even after a few hours on a c5n.18xlarge (72 vCPUs, 192 GiB memory), and it is perhaps incorrect. I have some more ideas on how this might be computed (e.g. with the .tab output), but any help would be appreciated, since testing is both computationally and time intensive.
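One idea I have not benchmarked yet: as far as I know, kmeans() from the stats package runs on a single core, so the 72 vCPUs are mostly idle with the code above. A rough sketch of a cheaper version is below. It computes the elbow curve on a random subset of rows and runs one kmeans() call per k in parallel. The helper name elbow_wss and the defaults (n.sub, nstart, cores) are my own placeholders, not anything from deepTools, and whether a subset gives a representative elbow for this data is an assumption I have not checked.

```r
library(parallel)

# Hypothetical helper (not part of deepTools or profileplyr):
# elbow curve computed on a random subset of rows.
elbow_wss <- function(m, k.max = 15, n.sub = 20000, nstart = 10,
                      iter.max = 30, cores = 1, seed = 123) {
  # kmeans() stops on missing values, so drop incomplete rows first
  m <- m[complete.cases(m), , drop = FALSE]
  set.seed(seed)
  # Subsample rows so that each kmeans() call stays cheap
  idx <- sample(nrow(m), min(n.sub, nrow(m)))
  sub <- m[idx, , drop = FALSE]
  # One independent kmeans() run per k; mclapply forks on Linux/macOS,
  # so keep cores = 1 on Windows
  wss <- mclapply(1:k.max, function(k) {
    kmeans(sub, centers = k, nstart = nstart, iter.max = iter.max)$tot.withinss
  }, mc.cores = cores)
  unlist(wss)
}

# Intended usage with the matrix m loaded above (cores value is a guess):
# wss <- elbow_wss(m, cores = 15)
# plot(seq_along(wss), wss, type = "b", pch = 19, frame = FALSE,
#      xlab = "Number of clusters K",
#      ylab = "Total within-clusters sum of squares")
```

With cores > 1 the results are not guaranteed to be reproducible from set.seed() alone; I believe RNGkind("L'Ecuyer-CMRG") is needed before the call for that.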
I've also been experimenting with profileplyr, which is a nice library but is not explicitly designed for optimizing k-means clusters.
This worked well, thank you Devon.