I have an RNA-seq dataset for which I wanted to know the "ideal" number of clusters to perform the statistical analysis on. After applying the elbow method, the silhouette method and NbClust, all three agree on 3 main clusters.
This was done on my dataset after transforming it as described in https://www.statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/
df <- scale(assay(vsd))
I am happy with my 3 clusters of samples. However, I have seen that k-means clustering is also used to define gene expression profiles across samples (see "C: How to make k-means clustering plot for relative expression?").
So, just to confirm: do these three "ideal" clusters refer to groups of similar samples, i.e. there are three main clusters of samples based on how similar their transcriptomes are (which agrees with the dendrogram based on Euclidean distance)? Or
are these three groups the three main expression profiles across all my samples, e.g. (i) upregulated in group A, (ii) downregulated in group B, and (iii) other? This would also be consistent with the heatmap of my dataset.
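
To make the two readings concrete, here is a minimal sketch of how each would look in code. It assumes the usual layout of `assay(vsd)` (genes in rows, samples in columns) and uses a simulated matrix `mat` as a stand-in, so the names and numbers are illustrative only, not my real data. The point it relies on is that `kmeans()` clusters the rows of whatever matrix it is given.

```r
# Toy stand-in for assay(vsd): rows = genes, columns = samples (simulated values)
set.seed(1)
mat <- matrix(rnorm(200 * 9), nrow = 200, ncol = 9,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:9)))

# Reading 1: clusters of SAMPLES.
# kmeans() groups rows, so the samples have to be in the rows: transpose first.
sample_km <- kmeans(t(mat), centers = 3, nstart = 25)
n_sample_labels <- length(sample_km$cluster)  # one cluster label per sample

# Reading 2: clusters of GENES (expression profiles across samples).
# Genes stay in the rows; each gene is z-scored across samples so that
# genes are grouped by the shape of their profile rather than by abundance.
gene_z <- t(scale(t(mat)))
gene_km <- kmeans(gene_z, centers = 3, nstart = 25)
n_gene_labels <- length(gene_km$cluster)      # one cluster label per gene
```

If the cluster vector returned for the real data has one entry per sample, the three clusters are groups of samples; if it has one entry per gene, they are expression profiles.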