I am trying to implement an R function that performs gene set enrichment analysis (GSEA).
I have read many papers on this method, and each one tries to tear down the others and show that its own method performs better (that is what we do as scientists :-D).
Anyway, what I am working on now is understanding how the running sum is used to calculate the Enrichment Score (ES) of a gene set.
1- How is a gene set defined? For example, if I have over 20000 genes, can I say that the first 200 are one set and the rest are another set?
2- How is the running sum calculated? The papers describe it like this:
"a Kolmogorov-Smirnov (K-S) running sum statistic is computed: beginning with the top-ranking gene, the running sum increases when a gene annotated to be a member of gene set S is encountered and decreases otherwise"
Can someone explain how this technique works?
Can it be done for a single sample? If not, why not?
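
To make question 2 concrete, here is a minimal sketch of one way the quoted description could be read. It is written in Python purely for illustration (it is not the R function itself), and it assumes the simplest unweighted step sizes: each gene in the set adds 1/(number of set members) and every other gene subtracts 1/(number of non-members), so the sum returns to zero at the end of the list. The names `running_sum` and `enrichment_score` and the toy gene labels are hypothetical, and taking the largest deviation from zero as the score is likewise an assumption about how the final value is obtained.

```python
def running_sum(ranked_genes, gene_set):
    """Walk down the ranked list and return the running sum after each gene.

    Assumption: unweighted steps, +1/n_hit for a set member and
    -1/n_miss for any other gene.
    """
    members = set(gene_set)
    n_hit = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")

    up = 1.0 / n_hit      # step taken when the gene belongs to the set
    down = 1.0 / n_miss   # step taken otherwise

    total = 0.0
    path = []
    for gene in ranked_genes:
        total += up if gene in members else -down
        path.append(total)
    return path


def enrichment_score(path):
    """Return the running-sum value with the largest deviation from zero."""
    return max(path, key=abs)


# Toy data: six genes already sorted from top-ranking to bottom-ranking.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
toy_set = {"g1", "g2", "g5"}

path = running_sum(ranked, toy_set)
es = enrichment_score(path)
print(path)
print(es)
```

In this toy case two of the three set members sit at the very top of the ranking, so the sum climbs early and the score is positive. Is this the right way to read the description?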