GSEA preranked can be run in weighted or classic mode, where classic mode is basically Kolmogorov-Smirnov test (PMID:16199517). There is a stand alone R-GSEA (http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/R-GSEA_Readme). Here is a way I call GSEA-P from within R with this script I call "GSEAPR.R"
# GSEA Preranked runs the preranked GSEA program via a system call.
# it then reads in the genesets up/down
gseaPreRankedCmd <-function(rank,gmtfile,nperm=1000) {
codex<-paste("gs", gsub("0.","",as.character(runif(1))) ,sep="" )
wrk<-"output/"
rnkfilename<-paste(wrk,codex,".rnk",sep="")
out<-paste(wrk,codex,sep="")
write.table(rank,file=rnkfilename,sep="\t",col.names=F,row.names=F,quote=F)
cmd<- "java -Xmx64G -cp gsea2-2.0.12.jar xtools.gsea.GseaPreranked "
cmd <- paste(cmd,"-gmx",gmtfile)
cmd <- paste(cmd, "-gui false -make_sets true -rnd_seed timestamp -norm meandiv -zip_report false -scoring_scheme classic ")
cmd <- paste(cmd,"-rnk",rnkfilename)
cmd <- paste(cmd,"-out",out)
cmd <- paste(cmd,"-set_max 5000 -set_min 5 -mode Max_probe ")
#cmd <- paste(cmd,"-chip",chipfile)
cmd <- paste(cmd,"-collapse false -nperm ",nperm," -rpt_label 001 -plot_top_x 200")
cat(cmd)
ret<-system(cmd)
gs<- read.table(pipe(paste("cat ",wrk,codex,"/*/gsea_report_for*.xls|grep -v 'RANK AT MAX'",sep="") ) ,sep="\t")
return(gs)
}
You can call this R script like this:
#Load rank
rank <- read.delim("/path/to/rank.rnk")
#Define link to gmtfile
gmtfile <- "/path/to/genesets.gmt"`
#Apply function:
source("/path/to/GSEAPR.R")
gseaoutput <- gseaPreRankedCmd(rank, gmtfile ,nperm=1000)
write.table(gseaoutput, file="FILENAMEHERE.TXT", row.names=F, sep="\t", quote=F)
It's important that the gmt file and rank file are formatted perfectly as this is the main source of errors. The other is the chip file specification.
@mark.ziemann thank you so much Mark but it is more tricky! I tried to make the rnk and gmt. for the rnk I need to have two columns while I only have one which you can find an example here
You can find the geneset file in the main question
I appreciate any further explanation. thanks again
The classic mode of GSEA only takes into the rank of the genes and the direction, so if you have 16,000 genes and 8000 have +'ive foldchange and the rest -'ive fold change, you can give them a score from +8000 to -8000 and the GSEA should work if you abide by all formats for rnk/gmt/chip files.
@mark.ziemann is there any example to do so? I have been googling it for a week !!! still could not find any example or information
The rank file should have this format:
If you are having problems with formats, you could try the java GUI to troubleshoot them first.
@mark.ziemann how can I make those score for my probe set ? are they expression or fold change or something like that ? or should I calculate them based on a formula?
The signed log10 pvalue is a good choice reference.