8.0 years ago
rolyata47 ▴ 40

The Gene Set Enrichment Analysis (GSEA) algorithm, outlined in this paper, http://www.broadinstitute.org/gsea/doc/subramanian_tamayo_gsea_pnas.pdf, often refers to a "random walk" used to traverse the ranked list L of gene-to-phenotype correlations.

However, what they actually do in the paper does not look like a random walk at all. It seems to me that they traverse the ranked list L sequentially, from rank 1 (highest correlation) onwards.

Could anyone clear up what they mean by "random walk", and why they use the term when it really looks like they are doing a sequential walk, which is quite the opposite?

Also, as a follow-up question, how do they avoid favoring the top of the ranked list L over the bottom? If we assume for the moment that they are doing a sequential walk, which seems to be the case, then gene sets found at the bottom extreme will have a larger value of P_miss, since P_miss grows with i. As a consequence, they will have smaller enrichment scores.

Perhaps this is related to the question above, since a sequential walk does not seem to work here...

I appreciate any help... I suspect I am not understanding something correctly...

8.0 years ago
Leandro Lima ▴ 960

Hello!


Hey, thanks! I think this article made it clear. They are comparing the supremum (ES) with what it would be for a random walk: gene sets concentrated at the top or the bottom of the list will have a more extreme ES, and gene sets that are randomly distributed will resemble a random walk. Thanks!
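
For anyone who lands here later, here is a minimal sketch of that idea in Python. It is not the Broad implementation. It assumes the unweighted case (p = 0), where every hit adds 1/N_H and every miss subtracts 1/(N - N_H), whereas the paper weights hits by their correlation. The function names `running_sum` and `enrichment_score` and the gene labels `g0` ... `g999` are made up for illustration.

```python
import random


def running_sum(ranked_genes, gene_set):
    """Unweighted (p = 0) running sum P_hit - P_miss along the ranked list."""
    hits = set(gene_set)
    n_hit = sum(1 for gene in ranked_genes if gene in hits)
    n_miss = len(ranked_genes) - n_hit
    walk, value = [], 0.0
    for gene in ranked_genes:  # sequential traversal, rank 1 onwards
        if gene in hits:
            value += 1.0 / n_hit   # step up when the gene is in the set
        else:
            value -= 1.0 / n_miss  # step down otherwise
        walk.append(value)
    return walk


def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of the running sum, with its sign kept."""
    return max(running_sum(ranked_genes, gene_set), key=abs)


# Hypothetical ranked list of 1000 genes and three 20-gene sets.
ranked = [f"g{i}" for i in range(1000)]
top_set = ranked[:20]        # all members at the top of the list
bottom_set = ranked[-20:]    # all members at the bottom of the list
rng = random.Random(0)       # fixed seed so the example is repeatable
scattered_set = rng.sample(ranked, 20)  # members spread at random

es_top = enrichment_score(ranked, top_set)
es_bottom = enrichment_score(ranked, bottom_set)
es_scattered = enrichment_score(ranked, scattered_set)

print("top:", es_top)
print("bottom:", es_bottom)
print("scattered:", es_scattered)
```

The traversal itself is sequential. The "walk" is the running sum, which only behaves like a random walk when the set members are scattered at random through the list. A set packed at the bottom is not penalized either: the sum is pushed far below zero before its hits arrive, so it ends up with a large negative ES instead of a small one.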