GSVA score and 1-Pearson correlation results do not harmonize
3.9 years ago
aaragak1 ▴ 40

I calculated GSVA scores for a large set of tumors using a fairly small set of genes (~8 genes in the signature).

I found there to be a distinct bimodal distribution in my data (yes, mx.diff = T).

I then wanted to see if I could 'orthogonally' verify this by hierarchically clustering the tumors, using the expression of the same small gene signature as input (size-factor normalized using DESeq2, log2(x+1) transformed). I expected to see two distinct clusters, which would ultimately correspond to my two GSVA modes. Hierarchical clustering with Euclidean distance on these data yielded the expected result: two rather disparate groups that corresponded to the two modes of GSVA.

However, calculating a 1-Pearson distance matrix (after median centering the data) left me with a dendrogram that looks almost like a fractal: not a distinguishable cluster in sight.
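For concreteness, the two clusterings described above could be computed roughly as follows. This is only a minimal sketch in base R: the matrix `expr` is simulated stand-in data (genes in rows, tumors in columns), and the average linkage is an assumption, since the linkage method is not stated here.

```r
set.seed(1)

# Hypothetical stand-in for the real data: 8 signature genes x 60 tumors,
# assumed to be already size-factor normalized and log2(x + 1) transformed.
expr <- matrix(rnorm(8 * 60, mean = 8, sd = 1), nrow = 8,
               dimnames = list(paste0("gene", 1:8), paste0("tumor", 1:60)))

# Euclidean distance between tumors. dist() compares rows, hence t().
d_euclid <- dist(t(expr), method = "euclidean")

# Median-center each gene, then use 1 - Pearson correlation between tumors.
# cor() correlates columns, so no transpose is needed here.
expr_centered <- sweep(expr, 1, apply(expr, 1, median), FUN = "-")
d_pearson <- as.dist(1 - cor(expr_centered, method = "pearson"))

# Linkage method is an assumption made for this sketch.
hc_euclid  <- hclust(d_euclid,  method = "average")
hc_pearson <- hclust(d_pearson, method = "average")
```

Calling `plot(hc_euclid)` and `plot(hc_pearson)` would draw the two dendrograms being compared.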

My question is: why do I see this disparity, and which result should I regard as more realistic? Or have I gotten this whole thing turned around, and neither is reliable?

Histogram of enrichment scores, showing bimodality:

[Figure: Histogram of Enrichment Scores]

Hierarchical clustering using a Euclidean distance metric:

[Figure: Hierarchical Clustering, Euclidean Distance]

Hierarchical clustering using a 1 - Pearson correlation distance:

[Figure: Hierarchical Clustering, Pearson Distance]

Thank you for your time. Let me know if you need anything else or if I need to clarify anything.

R GSVA Clustering • 1.4k views

Can you share the results?




Are you clustering the GSVA results just based on the one enrichment score?

Regardless, I am not sure you should necessarily expect two distinct clusters based on the expression values. The GSVA values have a bimodal distribution, but there are still many samples in the middle. Those middle samples would not cluster cleanly with either group.


I'm clustering using the normalized, log-transformed expression of the 8 or so genes that were used to generate the GSVA score - returning to the roots of the dataset, if you will.

I think you bring up a good point about not necessarily seeing two distinct clusters. I think what confused me the most is the presence of two clusters using the Euclidean distance metric but the lack of any using the 1-Pearson distance metric. Are you able to speak to that at all?

Thank you for your time!
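One way to probe the Euclidean vs. 1-Pearson question is with a toy simulation. Pearson correlation between two samples is unchanged when all of a sample's values are shifted or scaled together, whereas Euclidean distance is not. The sketch below uses entirely simulated data (8 hypothetical genes, two groups of 20 tumors in which every signature gene is shifted up by the same amount); whether the real tumors behave this way is an assumption that would need checking.

```r
set.seed(42)
n_genes <- 8
n_per_group <- 20
group <- rep(c(1, 2), each = n_per_group)

# Simulated data: gene-specific baselines plus noise; in group 2 every
# gene is shifted up by 3 (a pure "overall level" difference).
baseline <- runif(n_genes, min = 4, max = 10)
toy <- sapply(group, function(g) {
  baseline + 3 * (g == 2) + rnorm(n_genes, sd = 0.5)
})
rownames(toy) <- paste0("gene", seq_len(n_genes))

# Median-center each gene, as described in the question.
toy_centered <- sweep(toy, 1, apply(toy, 1, median), FUN = "-")

d_euclid_toy  <- dist(t(toy_centered))
r_toy         <- cor(toy_centered)   # Pearson correlation between samples
d_pearson_toy <- as.dist(1 - r_toy)

# Cut both trees into two clusters and compare with the simulated groups.
cl_euclid  <- cutree(hclust(d_euclid_toy,  method = "average"), k = 2)
cl_pearson <- cutree(hclust(d_pearson_toy, method = "average"), k = 2)
print(table(group, cl_euclid))
print(table(group, cl_pearson))

# Mean sample-sample correlation within groups vs. between groups.
same_group <- outer(group, group, "==")
off_diag   <- upper.tri(r_toy)
within_r   <- mean(r_toy[off_diag & same_group])
between_r  <- mean(r_toy[off_diag & !same_group])
print(c(within = within_r, between = between_r))
```

In this toy setup the group difference is a uniform shift across the signature genes, which the correlation discards, so the within-group and between-group correlations should come out similar while the Euclidean clusters follow the simulated groups. With only about 8 genes, each sample-to-sample correlation is also estimated from very few points, which adds noise of its own.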
