Question

GSVA and SSGSEA for RNA-Seq TPM data

1


4.4 years ago

young_bioinformatician ▴ 230

Hi,

I am trying to run single-sample gene set enrichment analysis on my RNA-seq gene expression data. I'd like to note that my data contain continuous values, not discrete counts.

After running GSVA, I got enrichment scores for almost 400 gene sets. Now I am a bit confused about whether I have been doing the single-sample gene set enrichment analysis correctly for this kind of data up to this step. I have also searched for articles about the NES score, but it was not clear to me. How can I interpret these scores if I draw networks between pathways? For example, that some pathways are on/off in normal versus disease, because I do not think I should interpret them as up- or down-regulated.

My second question: is there any limit (upper and lower) on these enrichment scores, which can be negative or positive?

I am also curious about the singscore library, but it is a bit complex and differs from GSVA. Has anyone used it for this kind of purpose before? Thank you very much in advance.

RNA-Seq GSVA R • 9.8k views

Updated 4.4 years ago by Kevin Blighe 87k • written 4.4 years ago by young_bioinformatician ▴ 230

score 1 · Answer 1 · 2019-11-13

1


4.4 years ago

Kevin Blighe 87k

When you run GSVA, it will enrich your input data against every gene signature / pathway in the database that you provide. Your input data is typically a data matrix consisting of samples (columns) X genes (rows). Generally, negative enrichment values imply down-regulation of a signature / pathway; whereas, positive values imply up-regulation.
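To give a feel for where a per-sample enrichment value comes from, here is a rough sketch of a rank-based running-sum statistic in the spirit of ssGSEA. It is not the GSVA package's code: the function name `ssgsea_like_score`, the toy gene names, and the default `alpha=0.25` weighting are assumptions for illustration, and no score normalization is applied.

```python
def ssgsea_like_score(expression, gene_set, alpha=0.25):
    """Simplified ssGSEA-style score for ONE sample (illustrative only).

    expression: dict mapping gene name -> expression value (e.g. TPM)
    gene_set:   set of gene names belonging to the signature / pathway
    """
    # Order genes from highest to lowest expression within the sample.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    in_set = [gene in gene_set for gene in ordered]
    n_in = sum(in_set)
    if n_in == 0 or n_in == n:
        raise ValueError("gene set must overlap the data without covering all genes")

    # Rank-based weights: the top gene has rank n, the bottom gene rank 1.
    weights = [(n - i) ** alpha for i in range(n)]
    total_in = sum(w for w, hit in zip(weights, in_set) if hit)
    step_out = 1.0 / (n - n_in)

    # Walk down the ranked list and accumulate the difference between the
    # weighted running fraction of set genes and that of non-set genes.
    p_in = p_out = score = 0.0
    for w, hit in zip(weights, in_set):
        if hit:
            p_in += w / total_in
        else:
            p_out += step_out
        score += p_in - p_out
    return score


# Hypothetical sample: g1 is the most highly expressed gene, g10 the least.
sample = {f"g{i}": float(11 - i) for i in range(1, 11)}
top_score = ssgsea_like_score(sample, {"g1", "g2"})
bottom_score = ssgsea_like_score(sample, {"g9", "g10"})
```

A gene set sitting at the top of the ranking gets a positive value and one at the bottom gets a negative value, which is the sense in which the sign is read as up- or down-regulation.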

After you enrich your data using GSVA, you should have a new data matrix of enrichment values that consists of samples (columns) X signatures / pathways (rows). The idea is to then conduct a differential signature / pathway analysis (using, for example, limma) so that you can have, in addition to differentially expressed genes, differentially expressed signatures / pathways.
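As a minimal sketch of the idea of testing signatures / pathways instead of genes, the snippet below ranks pathways by a plain Welch t-statistic between two groups. This is a stand-in for illustration and not limma (which is an R package and fits moderated linear models); the pathway names, scores, and group labels are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups of enrichment scores."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_pathways(scores, groups, case="disease", ref="control"):
    """Return (pathway, mean difference, t-statistic), sorted by |t|."""
    results = []
    for pathway, values in scores.items():
        case_vals = [v for v, g in zip(values, groups) if g == case]
        ref_vals = [v for v, g in zip(values, groups) if g == ref]
        diff = mean(case_vals) - mean(ref_vals)
        results.append((pathway, diff, welch_t(case_vals, ref_vals)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical GSVA output: pathways (rows) x samples (columns).
scores = {
    "PATHWAY_A": [0.42, 0.38, 0.45, -0.10, -0.05, -0.12],
    "PATHWAY_B": [0.10, 0.12, 0.08, 0.11, 0.09, 0.13],
}
groups = ["disease"] * 3 + ["control"] * 3

ranked = rank_pathways(scores, groups)
for pathway, diff, t in ranked:
    print(f"{pathway}\tdiff={diff:.3f}\tt={t:.2f}")
```

In a real analysis you would feed the enrichment matrix to a proper differential-analysis tool and correct for multiple testing across pathways.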

There is also a lot more that can be done with the results of GSVA, but the basic approach is what I have outlined above.

Kevin


0


Hi Kevin, thank you for your reply. I had thought the same as you said about up- and down-regulation in the values. However, there were some biases in my results, which is why I am a bit confused. For example, a pathway that belongs to cancer appears up-regulated in all control samples, which I did not expect. I then checked the values in the matrix and realized that, for this specific pathway, the values of the control samples were really close to those of the cancer samples.

Also, do you know whether there is any limit on the positive and negative enrichment scores? For example, can we say that positive values lie between 0 and +1, or something like that?

Reply 4.4 years ago by young_bioinformatician ▴ 230

1


Hey, there is no specific cut-off / limit / threshold for positive / negative (activated / not activated). How did you run GSVA? I mean, which method did you choose?

Sometimes, after I run GSVA, I convert my output data matrix to Z-scores, which can make it easier to select the gene signatures / pathways that are statistically significantly enriched. Assuming the Z-scores are approximately normally distributed, Z>1.96 or Z<-1.96 (i.e., |Z|>1.96) corresponds to a two-tailed p<0.05.
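A minimal sketch of that Z-score step, using only the Python standard library. It assumes the scaling is done per pathway across samples (the scaling axis is not stated above), and the enrichment values are invented for illustration.

```python
from statistics import NormalDist, mean, stdev


def zscores(values):
    """Scale one pathway's enrichment scores across samples to Z-scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def two_tailed_p(z):
    """Two-tailed p-value for a Z-score under a standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical enrichment scores for one pathway across 8 samples.
pathway_scores = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08, 0.12, 0.90]

z = zscores(pathway_scores)
# Indices of samples whose Z-score passes the |Z| > 1.96 cut-off.
flagged = [i for i, zi in enumerate(z) if abs(zi) > 1.96]

for i, zi in enumerate(z):
    print(f"sample {i}: Z={zi:.2f}, p={two_tailed_p(zi):.3f}")
```

Keep in mind that with only a handful of samples the Z-scores are far from normally distributed, so the p<0.05 reading is only a rough guide.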

Regarding the false-positive enrichment for cancer, these false positive associations occur 'all of the time' and are an unfortunate consequence of the fact that many disease pathways overlap with normal biological function.

Reply 4.4 years ago by Kevin Blighe 87k

1


I got it. So I can't apply any thresholds to it. I used ssgsea as the method, but I think that, according to the GSVA article, there is no significant difference between gsva and ssgsea. If it really depends on the method I chose, I can change the method, but I don't think the result will change.

Z-scores make sense; I can try that too. But I guess there is no way to avoid these biases, I mean, disease pathways that also appear enriched in control samples, because this biases the results.

Reply 4.4 years ago by young_bioinformatician ▴ 230