I hope this is not a stupid question. I ran hierarchical clustering (Euclidean distance matrix + complete linkage) on a subset of 8,000 genes from my 40 bulk RNA-seq samples, and found that 13 samples do not cluster as expected. I also ran a PCA and, in line with the hierarchical clustering, those samples fall far apart from the others.
Is there any way (or an R package) to identify which genes are responsible for the separation between clusters without going through the PCA (e.g., by inspecting the loadings)?
As a practical example, take the dendrogram from this site (dendo): what makes the purple samples (7-13-16) differ from the red ones, and what makes the red + purple cluster sit on a different branch/arm from the blue-green samples?
I guess some genes are similar across all samples and would pull them together, while others differ strongly and would push the samples far apart. This could potentially be seen at the level of macro-differences (red/purple vs. green/blue) or micro-differences (red vs. purple).
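To make concrete what I have in mind, here is a minimal sketch with made-up numbers (Python standard library only, purely for illustration). It relies on the fact that the squared Euclidean distance between two samples is a sum of per-gene squared differences, so each gene's share of the between-cluster distance can be read off directly. The gene names, the two sample groups, and the helper functions `gene_contributions` and `ranked_share` are all hypothetical. Averaging over all between-group sample pairs is my own assumption; with complete linkage the merge height is set by the single farthest pair, so restricting the sum to that pair might be closer to what the dendrogram shows.

```python
# Toy expression matrix: one entry per gene, one value per sample (made-up numbers).
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2],
    "geneC": [0.5, 0.4, 0.6, 3.0, 2.8],
}

# Hypothetical cluster membership, given as sample (column) indices.
group_1 = [0, 1, 2]  # e.g. the blue/green samples
group_2 = [3, 4]     # e.g. the red/purple samples


def gene_contributions(expr, group_1, group_2):
    """Per-gene mean squared difference over all between-group sample pairs.

    Summing the returned values over genes gives the mean squared
    Euclidean distance between the two groups.
    """
    n_pairs = len(group_1) * len(group_2)
    contrib = {}
    for gene, values in expr.items():
        total = sum((values[i] - values[j]) ** 2
                    for i in group_1 for j in group_2)
        contrib[gene] = total / n_pairs
    return contrib


def ranked_share(contrib):
    """Genes sorted by their fraction of the total between-group distance."""
    total = sum(contrib.values())
    shares = [(gene, value / total) for gene, value in contrib.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


contrib = gene_contributions(expr, group_1, group_2)
for gene, share in ranked_share(contrib):
    print(f"{gene}: {share:.1%} of the between-group squared distance")
```

In this toy case geneA and geneC would be the "macro-difference" genes and geneB one of the genes that keeps the samples together. Is there an established method or R package that does something like this on real data?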
Thank you in advance.