Question: GSEA with ranked list
5.0 years ago by
United States
chloe.p.oconnell80 wrote:

Hi,

I'm trying to run GSEA on a ranked list of genes. In other words, I'm not using expression data; instead, I'm using a list of genes ranked by the prevalence of variants in those genes in my dataset. I can't figure out how to run GSEA with non-standard input files in either the desktop version or the R version. Every tutorial I can find explains how to run GSEA on expression files that contain expression levels for each individual subject, whereas I already have a ranked list of the genes I'm interested in.

written 5.0 years ago by chloe.p.oconnell80

You could just pretend that the rankings are expression levels (you may have to reverse the ordering so that the most prevalently affected gene has the highest number). One of the first calls in any GSEA function is rank(), after all. If that doesn't seem to be working well for you, let me know and I can post some R code.

written 5.0 years ago by Devon Ryan91k
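The suggestion above to reverse the ordering can be sketched in R as follows. This is only an illustration: the gene names, the `ranked_genes` vector, and the output file name are hypothetical. It writes two tab-separated columns (gene, score), which, as far as I know, is the layout the GSEA desktop application's GSEAPreranked tool reads as a .rnk file; check the GSEA documentation before relying on that.

```r
# Hypothetical input: gene symbols ordered from most to least prevalently affected.
ranked_genes <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E")

# Position 1 is the top gene; flip it so the top gene gets the largest score.
position <- seq_along(ranked_genes)
score <- length(ranked_genes) - position + 1

rnk <- data.frame(gene = ranked_genes, score = score)

# Write two tab-separated columns with no header and no quotes.
# The file name is an assumption made for this example.
out_file <- file.path(tempdir(), "variants.rnk")
write.table(rnk, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

If you already have a numeric prevalence score per gene, using that score directly in place of the flipped position keeps more information than the bare ordering.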

Thanks for the help. For some reason, it isn't working correctly. I'm assuming this is my issue, as my coding background is rather weak. I'll keep trying...

written 5.0 years ago by chloe.p.oconnell80

You can also try running ks.test() directly, as lkmklsmn mentioned.

BTW, regardless of the test you end up using, do have a look at the results yourself. Tests like this that compare distributions are known to flag results that are statistically significant but likely biologically meaningless.

written 5.0 years ago by Devon Ryan91k
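One way to follow the advice above about looking at the results yourself is to put the p-value next to a plain effect size. The sketch below is an assumption-laden illustration, not part of any GSEA package: `inspect_set` is a hypothetical helper, and the data are simulated with a deliberately small shift in a large gene set, the situation in which a small p-value can accompany a modest change in ranks.

```r
# Hypothetical helper: put the KS p-value next to a plain effect size
# (median rank percentile of the gene set versus the background).
inspect_set <- function(ranking, ind) {
  geneset <- ranking[ind]
  background <- ranking[-ind]
  to_pct <- function(r) 100 * r / length(ranking)
  c(p_value = ks.test(geneset, background)$p.value,
    median_pct_set = median(to_pct(geneset)),
    median_pct_background = median(to_pct(background)))
}

set.seed(42)                          # reproducible simulated data
scores <- rnorm(20000)                # 20000 hypothetical genes
ind <- 1:5000                         # hypothetical large gene set
scores[ind] <- scores[ind] + 0.1      # deliberately small shift
out <- inspect_set(rank(scores), ind)
print(round(out, 4))
```

Comparing the two median percentiles shows how far the set actually moved; plotting `ecdf()` of both groups gives the same picture visually.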
5.0 years ago by
lkmklsmn890
United States
lkmklsmn890 wrote:

The GSEA algorithm is based on a (weighted) Kolmogorov-Smirnov-like statistic. The Kolmogorov-Smirnov test itself checks for a shift in ranks between a set of interest and the background. You would basically be asking the question: is this particular set of genes enriched among the top genes in the ranked list of all genes?

This is fairly simple to do in R. The code would look like this (not run):  

# scores: a numeric vector of your scores (prevalence of variants) for all genes in your dataset
# ind: a numeric vector containing the indices of your gene set in scores

ranking <- rank(scores)       # rank 1 = lowest score

geneset <- ranking[ind]       # ranks of the genes in the set

background <- ranking[-ind]   # ranks of all other genes

ks.test(geneset, background)  # two-sided by default
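To see the recipe above run end to end, here is a minimal self-contained sketch on simulated data. Everything in it is made up for illustration: the 1000 gene scores, the 50-gene set, and the inflation of that set's scores. It also shows the one-sided form of the test; in R's `ks.test`, `alternative = "less"` asks whether the CDF of the first sample lies below that of the second, which corresponds to the gene set tending towards higher ranks.

```r
set.seed(1)  # reproducible toy data

# Simulated scores for 1000 hypothetical genes (stand-in for variant prevalence).
scores <- rexp(1000)
names(scores) <- paste0("gene", seq_along(scores))

# Hypothetical gene set: 50 genes whose scores are inflated so that
# the set is enriched near the top of the list.
ind <- sample(seq_along(scores), 50)
scores[ind] <- scores[ind] + 1

ranking <- rank(scores)        # rank 1 = lowest score
geneset <- ranking[ind]
background <- ranking[-ind]

# Two-sided test: any difference between the two rank distributions.
res_two_sided <- ks.test(geneset, background)

# One-sided test for enrichment at the top of the list.
res_top <- ks.test(geneset, background, alternative = "less")

print(res_two_sided$p.value)
print(res_top$p.value)
```

With real data, `ind` could be built with something like `which(names(scores) %in% my_gene_set)`, where `my_gene_set` is a hypothetical character vector of gene symbols.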

 

 

modified 5.0 years ago by Istvan Albert ♦♦ 81k • written 5.0 years ago by lkmklsmn890