GSVA score and 1-Pearson correlation results do not harmonize
2.2 years ago
aaragak1 ▴ 40

I calculated GSVA scores for a large set of tumors using a fairly small set of genes (~8 genes in the signature).

I found a distinct bimodal distribution in my data (yes, mx.diff = TRUE).

I then wanted to see if I could 'orthogonally' verify this by hierarchically clustering the tumors on the expression of the same small gene signature (size-factor normalized using DESeq2, log2(x+1) transformed). I expected to see two distinct clusters that would correspond to my two GSVA modes. Hierarchical clustering with Euclidean distance gave the expected result: two rather disparate groups that corresponded to the two GSVA modes.

However, calculating a 1-Pearson distance matrix (after median centering the data) left me with a dendrogram that looks almost like a fractal: not a distinguishable cluster in sight.
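For concreteness, the two clusterings described above could be sketched roughly as follows. This is only an illustration on simulated data, not the actual analysis: the matrix `expr`, the group labels, the 3-unit shift, per-gene median centering, and average linkage are all assumptions made for the example.

```r
# Hypothetical stand-in for the real data: 8 signature genes x 40 tumors,
# treated as already size-factor normalized and log2(x + 1) transformed.
set.seed(1)
n_genes <- 8
n_tumors <- 40
group <- rep(c("low", "high"), each = n_tumors / 2)
shift <- ifelse(group == "high", 3, 0)  # assumed: the signature moves as a block

expr <- matrix(rnorm(n_genes * n_tumors, mean = 5, sd = 0.5),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("tumor", seq_len(n_tumors))))
expr <- sweep(expr, 2, shift, "+")  # add each tumor's shift to its column

# Euclidean distance between tumors (dist() works on rows, hence t()).
d_euclid <- dist(t(expr), method = "euclidean")
hc_euclid <- hclust(d_euclid, method = "average")

# 1 - Pearson distance between tumors, after median-centering each gene
# (assumed interpretation of "median centering the data").
gene_medians <- apply(expr, 1, median)
centered <- sweep(expr, 1, gene_medians, "-")
d_pearson <- as.dist(1 - cor(centered, method = "pearson"))  # cor() correlates columns
hc_pearson <- hclust(d_pearson, method = "average")

# Cut both trees into two clusters and compare with the simulated groups.
print(table(euclid = cutree(hc_euclid, k = 2), truth = group))
print(table(pearson = cutree(hc_pearson, k = 2), truth = group))
```

Swapping in the real genes-by-samples matrix for `expr` and the GSVA mode for `group` would give a direct cross-tabulation of each clustering against the two modes.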

My question is: why do I see this disparity, and which result should I regard as more realistic? Or have I gotten this whole thing turned around, and neither is reliable?

Histogram of enrichment scores, showing bimodality:

Hierarchical clustering using a Euclidean distance metric:

Hierarchical clustering using a 1 - Pearson correlation distance:

Thank you for your time. Let me know if you need anything else or if I should clarify anything.

R GSVA Clustering • 807 views

Can you share the results?

Are you clustering the GSVA results just based on the one enrichment score?

Regardless, I am not sure you should necessarily expect two distinct clusters based on the expression values. The GSVA values have a bimodal distribution, but there are still many samples in the middle. Those middle samples would not cluster cleanly with either group.


I'm clustering using the normalized, log-transformed expression of the 8 or so genes that were used to generate the GSVA score, returning to the roots of the dataset, if you will.

I think you bring up a good point about not necessarily seeing two distinct clusters. I think what confused me the most is the presence of two clusters using the Euclidean distance metric but the lack of any using the 1-Pearson distance metric. Are you able to speak to that at all?
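One general property of the two metrics may be relevant here, though whether it explains this particular dataset is an assumption: Pearson correlation between two samples is unchanged when a constant is added to every gene in one of them, while Euclidean distance grows with that offset. If the signature genes mostly move up or down together, the difference between tumors is largely such an offset. A minimal sketch with two made-up tumor profiles (all values hypothetical):

```r
# Two hypothetical tumors with the same relative pattern across 8 genes,
# but one sits 3 log2 units higher overall.
pattern <- c(0.2, -0.1, 0.4, 0.0, -0.3, 0.1, 0.3, -0.2)
tumor_low <- 5 + pattern
tumor_high <- tumor_low + 3

# Euclidean distance sees the overall offset.
euclid <- sqrt(sum((tumor_high - tumor_low)^2))

# 1 - Pearson only compares the shape of the profile across genes,
# so the offset has no effect.
pearson_dist <- 1 - cor(tumor_low, tumor_high)

print(euclid)        # 3 * sqrt(8), about 8.49
print(pearson_dist)  # 0 up to floating-point error
```

Median-centering each gene subtracts the same value from every sample, so it leaves Euclidean distances between samples unchanged and does not remove a per-sample offset before the correlation is computed. With only about 8 genes, what is left for the correlation to work with may be mostly noise.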