GSEA with ranked list
9.6 years ago

Hi,

I'm trying to run GSEA on a ranked list of genes. In other words, I'm not using expression data; instead, I have a list of genes ranked by the prevalence of variants in those genes in my dataset. I can't figure out how to run GSEA with this kind of non-standard input, in either the desktop version or the R version. Every tutorial I can find explains how to run GSEA on expression files that contain expression levels for each individual subject, whereas I already have a ranked list of the genes I'm interested in.

GSEA SNP variant-calling • 7.4k views

You could just pretend that the rankings are expression levels (you may have to reverse the ordering so that the most prevalently affected gene has the highest number). One of the first calls in any GSEA function is rank(), after all. If that doesn't seem to be working well for you, let me know and I can post some R code.
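
As a rough sketch of the reversal step, assuming your list is stored with position 1 for the most prevalently affected gene: the gene names, the column names, and the output file below are all made up for illustration. I believe the GSEA desktop application has a pre-ranked mode that reads a two-column gene/score file, but treat the file layout and the .rnk extension as assumptions to check against the GSEA documentation.

```r
# Hypothetical input: position 1 = most prevalently affected gene
ranked <- data.frame(gene = c("TP53", "BRCA1", "EGFR", "KRAS"),
                     position = 1:4,
                     stringsAsFactors = FALSE)

# Reverse the ordering so the most affected gene gets the highest number
ranked$score <- max(ranked$position) - ranked$position + 1

# Sort by descending score and keep only gene and score
out <- ranked[order(-ranked$score), c("gene", "score")]

# Write a tab-delimited, header-less two-column file (path is a placeholder)
out_file <- file.path(tempdir(), "ranked_genes.rnk")
write.table(out, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

If your scores are already "higher means more affected" (for example raw variant counts), the reversal line is not needed.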


Thanks for the help. For some reason, it isn't working correctly. I'm assuming this is my issue, as my coding background is rather weak. I'll keep trying...


You can also try directly doing a ks.test() as lkmklsmn mentioned.

BTW, regardless of the test you end up using, do have a look at the results yourself. Tests like this, which compare whole distributions, are known to flag differences that are statistically significant but likely biologically meaningless.
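
One way to look at the results yourself is to put an effect size next to the p-value. The sketch below uses simulated data only (nothing here comes from a real dataset), with a deliberately small shift and a large hypothetical gene set, so you can see how the p-value and the actual difference in ranks compare. The variable names and the choice of median rank difference as the effect size are my own assumptions.

```r
set.seed(1)  # simulated data for illustration only
n_genes <- 20000
scores <- rnorm(n_genes)            # stand-in for per-gene scores
ind <- sample(n_genes, 2000)        # hypothetical, fairly large gene set
scores[ind] <- scores[ind] + 0.1    # deliberately small shift

ranking <- rank(scores)
res <- ks.test(ranking[ind], ranking[-ind])

# Effect size to read alongside the p-value:
# difference in median rank, as a fraction of the list length
median_shift <- (median(ranking[ind]) - median(ranking[-ind])) / n_genes

print(res$p.value)
print(median_shift)
```

If the p-value is small while the median shift is only a few percent of the list, that is a hint the result may not mean much biologically.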

9.6 years ago
lkmklsmn ▴ 970

The GSEA algorithm is based on a (weighted) Kolmogorov-Smirnov-like statistic. The Kolmogorov-Smirnov test checks for a shift in ranks between a set of interest and the background. You would basically be asking the question: is this particular set of genes enriched among the top genes in the ranked list of all genes?

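This is fairly simple to do in R. The code would look like this (not run; you need to supply scores and ind yourself):

# scores: a numeric vector of your scores (prevalence of variants) for all genes in your dataset
# ind: an integer vector containing the indices of your gene set in scores
ranking <- rank(scores)         # rank all genes; the highest score gets the highest rank
geneset <- ranking[ind]         # ranks of the genes in your set
background <- ranking[-ind]     # ranks of all remaining genes
ks.test(geneset, background)    # two-sample Kolmogorov-Smirnov test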
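
For something you can paste and run, here is a minimal sketch of the same steps on simulated data. The gene names, the exponential scores, and the gene set are all placeholders, not real data. It also adds two things that are my own assumptions rather than part of the recipe above: looking up the gene set by name with match(), and a one-sided test. In R's ks.test, alternative = "less" corresponds to the CDF of the first sample lying below that of the second, which here means the gene set tends toward higher ranks.

```r
set.seed(42)  # simulated data for illustration only
genes <- paste0("gene", 1:1000)           # hypothetical gene names
scores <- setNames(rexp(1000), genes)     # stand-in for variant prevalence
my_set <- sample(genes, 50)               # hypothetical gene set

# Positions of the gene-set members in scores
ind <- match(my_set, names(scores))
ind <- ind[!is.na(ind)]                   # drop set members absent from the data

ranking <- rank(scores)                   # highest score gets the highest rank
geneset <- ranking[ind]
background <- ranking[-ind]

# One-sided test: is the gene set shifted toward higher ranks?
res <- ks.test(geneset, background, alternative = "less")
print(res$p.value)
```

Because the gene set here is drawn at random, there is no real enrichment to find; swap in your own scores and gene set to get a meaningful answer.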