I calculated GSVA scores for a large set of tumors using a fairly small gene signature (~8 genes).
I found a distinctly bimodal distribution in my data (yes, with mx.diff = TRUE).
I then wanted to verify this 'orthogonally' by hierarchically clustering the tumors on the same small gene signature, using expression values that were size-factor normalized with DESeq2 and log2(x+1) transformed. I expected to see two distinct clusters that would correspond to my two GSVA modes. Hierarchical clustering with Euclidean distance gave the expected result: two well-separated groups that corresponded to the two modes of the GSVA scores.
However, calculating a 1-Pearson distance matrix (after median centering the data) left me with a dendrogram that looks almost like a fractal: not a distinguishable cluster in sight.
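To make the comparison concrete, here is a minimal sketch of the two distances in plain Python. The expression values are made-up toy numbers (not my real data), and the function names are hypothetical. It assumes the median centering is done per gene across samples. The toy "high" tumors are just the "low" tumors shifted up by a constant on every gene, which is one simple way a bimodal signature score could arise.

```python
import math
from statistics import mean, median

def euclidean(a, b):
    # Sensitive to differences in overall expression level.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    # 1 - Pearson correlation between two samples across genes.
    # Each sample is centered on its own mean, so a constant offset
    # shared by all genes in a sample does not change the result.
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

def median_center_genes(samples):
    # Assumption: centering is per gene, across all samples.
    n_genes = len(samples[0])
    medians = [median(s[g] for s in samples) for g in range(n_genes)]
    return [[s[g] - medians[g] for g in range(n_genes)] for s in samples]

# Toy log2 expression values: 4 tumors x 8 signature genes (invented).
low_a = [2.0, 3.1, 1.9, 2.8, 3.5, 2.2, 2.9, 3.0]
low_b = [2.3, 2.9, 2.4, 3.0, 3.1, 2.5, 2.6, 3.3]
high_a = [x + 4.0 for x in low_a]  # same pattern, higher level
high_b = [x + 4.0 for x in low_b]

centered = median_center_genes([low_a, low_b, high_a, high_b])
c_low_a, c_low_b, c_high_a, c_high_b = centered

d_euc_within = euclidean(c_low_a, c_low_b)
d_euc_between = euclidean(c_low_a, c_high_a)
d_cor_between = pearson_distance(c_low_a, c_high_a)

print("Euclidean, within group: ", d_euc_within)
print("Euclidean, between groups:", d_euc_between)
print("1 - Pearson, between groups:", d_cor_between)
```

In this toy case the Euclidean distance separates the low and high tumors, while the 1 - Pearson distance between a low tumor and its shifted copy is zero up to floating-point error, because the correlation ignores a per-sample offset. Whether that property is what drives my real dendrograms is only a guess at this point.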
My question is: why do I see this disparity, and which result should I regard as more realistic? Or have I gotten this whole thing turned around, and is neither result reliable?
Histogram of enrichment scores, showing bimodality:
Hierarchical clustering using Euclidean distance:
Hierarchical clustering using 1 - Pearson correlation distance:
Thank you for your time. Let me know if you need anything else or if I should clarify anything.