I have a data set where the rows are genes and columns are phylogenetic profiling scores.
I clustered the genes with hierarchical clustering in R and plotted a dendrogram from the hclust() output. I now need to determine the number of clusters so that genes in the same cluster are very similar to each other across the columns (that is, genes with similar values in the same columns end up in the same cluster), essentially splitting the data into modules. I need a systematic way to do this across many datasets at once, without manual intervention or per-dataset tuning.
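For reference, here is a minimal sketch of the clustering step using base R only. The simulated matrix `scores` stands in for my real data, and the Euclidean distance, average linkage, and `k <- 3` are assumptions chosen purely for illustration, not the settings of my actual analysis.

```r
# Simulated stand-in for the real data: 30 genes (rows) x 8 profile scores (columns).
set.seed(1)
scores <- rbind(
  matrix(rnorm(10 * 8, mean = 0), nrow = 10),
  matrix(rnorm(10 * 8, mean = 4), nrow = 10),
  matrix(rnorm(10 * 8, mean = 8), nrow = 10)
)
rownames(scores) <- paste0("gene", seq_len(nrow(scores)))

# Distance metric and linkage method are illustrative assumptions.
d  <- dist(scores, method = "euclidean")
hc <- hclust(d, method = "average")
# plot(hc)  # uncomment to draw the dendrogram

# k is the value that has to be chosen automatically; 3 is only a placeholder.
k <- 3
modules <- cutree(hc, k = k)  # named integer vector: gene -> cluster id
table(modules)                # cluster sizes
```

Note that `cutree()` assigns every gene to some cluster, so a gene that fits none of the modules well is still forced into one of them.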
I tried the NbClust() function, but its output was not satisfactory: some genes end up in the same cluster even though they are not similar enough to each other.
I would really appreciate a suggestion for an R function that removes genes that do not fit their assigned cluster, or a better function for determining the optimal number of clusters that allows some genes to be left unassigned.