how to perform Kolmogorov-Smirnov statistic in GSEA in R?
2
4
Entering edit mode
9.1 years ago
Mo ▴ 920

I have a list of gens set of interest (assume a list of 20 probes) then I have a matrix showing the rank of genes which are highly expressed to low expression (10 columns and 300 rows representing 10 drugs and 300 probes) This matrix is not values but only probes names.

I want to perform the Kolmogorov-smirnov on it. does anybody know whether it is possible or not? if yes, how can I do it in R?

Here is an example of my geneses data

GSEA microarray statistical-analysis r • 7.1k views
ADD COMMENT
1
Entering edit mode
9.1 years ago
Ahill ★ 1.9k

The KS test (in R, ?ks.test) compares continuous distributions. You cannot run it in a meaningful way on a list of probe names ranked by expression level. If you want to test if pre-defined genesets of size ~20 genes had different ranks after treatment with the 10 drugs, I'd look at something like GSEA-preranked.

ADD COMMENT
1
Entering edit mode
9.1 years ago
mark.ziemann ★ 1.9k

GSEA preranked can be run in weighted or classic mode, where classic mode is basically Kolmogorov-Smirnov test (PMID:16199517). There is a stand alone R-GSEA (http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/R-GSEA_Readme). Here is a way I call GSEA-P from within R with this script I call "GSEAPR.R"

# GSEA Preranked runs the preranked GSEA program via a system call.
# it then reads in the genesets up/down

gseaPreRankedCmd <-function(rank,gmtfile,nperm=1000) {
  codex<-paste("gs", gsub("0.","",as.character(runif(1))) ,sep="" )
  wrk<-"output/"
  rnkfilename<-paste(wrk,codex,".rnk",sep="")    
  out<-paste(wrk,codex,sep="")
  write.table(rank,file=rnkfilename,sep="\t",col.names=F,row.names=F,quote=F)
  cmd<- "java -Xmx64G -cp gsea2-2.0.12.jar xtools.gsea.GseaPreranked "
  cmd <- paste(cmd,"-gmx",gmtfile)
  cmd <- paste(cmd, "-gui false -make_sets true -rnd_seed timestamp -norm meandiv -zip_report false -scoring_scheme classic ")
  cmd <- paste(cmd,"-rnk",rnkfilename)
  cmd <- paste(cmd,"-out",out)
  cmd <- paste(cmd,"-set_max 5000 -set_min 5 -mode Max_probe ")
  #cmd <- paste(cmd,"-chip",chipfile)
  cmd <- paste(cmd,"-collapse false -nperm ",nperm," -rpt_label 001 -plot_top_x 200")
  cat(cmd)
  ret<-system(cmd)
  gs<- read.table(pipe(paste("cat ",wrk,codex,"/*/gsea_report_for*.xls|grep -v 'RANK AT MAX'",sep="") ) ,sep="\t")
  return(gs)
}

You can call this R script like this:

#Load rank
rank <- read.delim("/path/to/rank.rnk")
#Define link to gmtfile
gmtfile <- "/path/to/genesets.gmt"`

#Apply function:
source("/path/to/GSEAPR.R")

gseaoutput <- gseaPreRankedCmd(rank, gmtfile ,nperm=1000)

write.table(gseaoutput, file="FILENAMEHERE.TXT", row.names=F, sep="\t", quote=F)

It's important that the gmt file and rank file are formatted perfectly as this is the main source of errors. The other is the chip file specification.

ADD COMMENT
2
Entering edit mode

@mark.ziemann thank you so much Mark but it is more tricky! I tried to make the rnk and gmt. for the rnk I need to have two columns while I only have one which you can find an example here

You can find the geneset file in the main question

I appreciate any further explanation. thanks again

ADD REPLY
1
Entering edit mode

The classic mode of GSEA only takes into the rank of the genes and the direction, so if you have 16,000 genes and 8000 have +'ive foldchange and the rest -'ive fold change, you can give them a score from +8000 to -8000 and the GSEA should work if you abide by all formats for rnk/gmt/chip files.

ADD REPLY
1
Entering edit mode

@mark.ziemann is there any example to do so? I have been googling it for a week !!! still could not find any example or information

ADD REPLY
1
Entering edit mode

The rank file should have this format:

GeneID    Score
A1BG    0.267702
A2M    -0.0917815
A4GALT    0.290421
AAAS    0.465716
AACS    0.127215
AADAC    -1.00927
AADACL3    0.247068
AADAT    -0.0910159
AAED1    -0.761705
AAGAB    0.143995

If you are having problems with formats, you could try the java GUI to troubleshoot them first.

ADD REPLY
0
Entering edit mode

@mark.ziemann how can I make those score for my probe set ? are they expression or fold change or something like that ? or should I calculate them based on a formula?

ADD REPLY
0
Entering edit mode

The signed log10 pvalue is a good choice reference.

ADD REPLY

Login before adding your answer.

Traffic: 1356 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6