The Gene Set Enrichment Analysis (GSEA) algorithm, outlined in this paper, http://www.broadinstitute.org/gsea/doc/subramanian_tamayo_gsea_pnas.pdf, often refers to a "random walk" used to traverse the ranked list L of gene-to-phenotype correlations.
However, what they actually do in the paper does not look like a random walk at all. As far as I can tell, they traverse the ranked list L sequentially, starting at rank 1 (highest correlation) and moving down.
I was wondering if anyone could clear up what they mean by "random walk" and why they use the term, when it looks like they are doing a sequential walk, which is quite the opposite.
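For reference, here is a minimal NumPy sketch of the running-sum statistic as I read it from the paper, with L walked strictly in rank order. The function names, and the assumption that `correlations` is already sorted from highest to lowest, are my own; `p` is the weighting exponent from the paper.

```python
import numpy as np

def running_sum(correlations, in_set, p=1.0):
    """P_hit - P_miss at each position i, walking L sequentially from rank 1.

    Assumes `correlations` is already sorted high to low (hypothetical helper).
    """
    r = np.abs(np.asarray(correlations, dtype=float)) ** p
    hit = np.asarray(in_set, dtype=bool)
    n, n_h = hit.size, hit.sum()
    n_r = r[hit].sum()
    # P_hit(S, i): weighted fraction of genes in S seen at positions <= i
    p_hit = np.cumsum(np.where(hit, r, 0.0)) / n_r
    # P_miss(S, i): fraction of genes not in S seen at positions <= i
    p_miss = np.cumsum(~hit) / (n - n_h)
    return p_hit - p_miss

def enrichment_score(correlations, in_set, p=1.0):
    rs = running_sum(correlations, in_set, p)
    # ES is the maximum deviation from zero, keeping its sign
    return rs[np.argmax(np.abs(rs))]

ranked = [0.9, 0.8, 0.5, 0.1, -0.2, -0.6]       # L, sorted high to low
in_s = [True, False, True, False, False, False]  # membership in S
print(running_sum(ranked, in_s))
print(enrichment_score(ranked, in_s))
```

Nothing in this loop is random: each step simply moves to the next rank i.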
Also, as a follow-up question: how do they avoid biasing the top of the ranked list L over the bottom? If we assume for the moment that they are doing a sequential walk, which seems to be the case, then gene sets found at the bottom extreme will have a larger value of P_miss, since P_miss grows in proportion to i. As a consequence, they will have smaller enrichment scores.
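One way to probe this concern numerically is to compare a set sitting at the very top of L with its mirror image at the very bottom. This sketch assumes the unweighted case p = 0 (every hit counts 1/N_H, every miss 1/(N - N_H)); `es_unweighted` and the toy sizes are hypothetical.

```python
import numpy as np

def es_unweighted(in_set):
    """ES with p = 0 for a boolean membership vector ordered by rank in L."""
    hit = np.asarray(in_set, dtype=bool)
    n, n_h = hit.size, hit.sum()
    # Both terms are cumulative, so P_miss(i) grows with i wherever S sits
    p_hit = np.cumsum(hit) / n_h
    p_miss = np.cumsum(~hit) / (n - n_h)
    rs = p_hit - p_miss
    # Maximum deviation from zero, keeping its sign
    return rs[np.argmax(np.abs(rs))]

n, k = 10, 3
top = np.arange(n) < k           # S occupies ranks 1..k
bottom = np.arange(n) >= n - k   # S occupies the last k ranks
print(es_unweighted(top), es_unweighted(bottom))  # → 1.0 -1.0
```

In this toy case the two sets get scores of equal magnitude and opposite sign, because ES is the maximum deviation from zero rather than the maximum value of the running sum.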
Perhaps this is related to the question above, since a sequential walk does not seem to work here...
I appreciate any help... I suspect I am not understanding something correctly...