I'm trying to run GSEA on a ranked list of genes. In other words, I'm not using expression data; instead, I'm using a list of genes ranked by the prevalence of variants in those genes in my dataset. I can't figure out how to run GSEA on non-standard input files in either the desktop version or the R version. Every tutorial I can find explains how to run GSEA on expression files that contain expression levels for each individual subject, whereas I already have a ranked list of the genes I'm interested in.
The GSEA algorithm is based on a weighted Kolmogorov-Smirnov-like statistic. This kind of method tests for a shift in ranks between a set of interest and the background. You would basically be asking the question: is this particular set of genes enriched among the top genes in the ranked list of all genes?
This is fairly simple to do in R. The code would look like this (not run):
scores <- a numeric vector of your scores (prevalence of variants) for all genes in your dataset
ind <- an integer vector containing the positions in scores of the genes in your gene set
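
To make the idea concrete, here is a minimal sketch using base R only. It is not the GSEA algorithm itself: it swaps in a one-sided Wilcoxon rank-sum test (`wilcox.test`), which asks the same kind of question, namely whether the scores of the gene set are shifted towards the top of the ranking relative to the background. The scores, gene names and gene set below are simulated placeholders, not real data; you would replace them with your own `scores` and `ind`.

```r
# Toy data only: simulated scores stand in for real per-gene variant prevalence.
set.seed(1)
n_genes <- 1000
scores <- rnorm(n_genes)
names(scores) <- paste0("gene", seq_len(n_genes))

# Hypothetical gene set of 50 genes. Their scores are shifted upward on purpose,
# so the set is enriched near the top of the ranking in this toy example.
set_genes <- paste0("gene", 1:50)
scores[set_genes] <- scores[set_genes] + 1

# Positions of the gene set within `scores`.
ind <- which(names(scores) %in% set_genes)

# One-sided Wilcoxon rank-sum test: are the scores in the set shifted towards
# larger values than the scores of the remaining (background) genes?
res <- wilcox.test(scores[ind], scores[-ind], alternative = "greater")
res$p.value
```

A small p-value suggests the set sits higher in the ranking than expected by chance. The test only uses ranks, so any monotone transformation of your scores gives the same result. If low scores are the interesting end of your list, use `alternative = "less"` instead.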