I am running a GSEA Preranked analysis on the output of a differential expression analysis (genes ranked by sign(shrunken LFC) * -log10(p-value)).
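For concreteness, the ranking metric above can be sketched roughly as follows. This is only an illustration, not my exact pipeline: the gene names and values are made-up toy data, the helper names `rank_metric` and `build_ranked_list` are hypothetical, and the `min_p` floor is an assumption added so that a p-value of 0 does not produce an infinite score.

```python
import math

def rank_metric(lfc, pvalue, min_p=1e-300):
    """Return sign(lfc) * -log10(pvalue).

    min_p is an assumed floor so that pvalue == 0 gives a large
    finite score instead of a math domain error.
    """
    p = max(pvalue, min_p)
    sign = (lfc > 0) - (lfc < 0)  # +1, 0, or -1
    return sign * -math.log10(p)

def build_ranked_list(rows):
    """rows: iterable of (gene, shrunken_lfc, pvalue) tuples.

    Returns (gene, score) pairs sorted from most positive to most negative.
    """
    scored = [(gene, rank_metric(lfc, p)) for gene, lfc, p in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy input, not real results.
rows = [
    ("GENE_A", 2.1, 1e-8),
    ("GENE_B", -1.3, 1e-4),
    ("GENE_C", 0.2, 0.5),
]
ranked = build_ranked_list(rows)

# Print as two tab-separated columns (gene, score), the layout of a ranked-list file.
for gene, score in ranked:
    print(f"{gene}\t{score:.4f}")
```

Ties in the score (for example many genes with identical p-values) would be ordered arbitrarily by a sort like this, so they are worth checking for in the real list.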
I notice that when I feed the exact same files into GSEA, it can give me fairly different FDRs.
For example, the first time I ran the analysis, it reported "32 gene sets are significantly enriched at FDR < 25%".
Ten minutes later, I ran the same files again, and it reported "4 gene sets are significantly enriched at FDR < 25%".
I know that there is some element of randomness in the GSEA algorithm because, by default, it seeds its permutations from the current timestamp, but I'm surprised to see such a large difference in the number of pathways passing the threshold from run to run.
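To make concrete the kind of seed dependence I have in mind, here is a toy sketch. It is not GSEA's actual code: it uses the mean score of a gene set as the statistic instead of the running-sum enrichment score, the data are simulated, and the function name `gene_set_pvalue` is hypothetical. It only shows that an empirical p-value built from random gene sets of the same size changes with the seed.

```python
import random

def gene_set_pvalue(scores, set_indices, n_perm, seed):
    """Empirical one-sided p-value for the mean score of a gene set.

    The null distribution is built from n_perm random gene sets of the
    same size, drawn with a random number generator seeded by `seed`.
    """
    rng = random.Random(seed)
    k = len(set_indices)
    observed = sum(scores[i] for i in set_indices) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(len(scores)), k)
        null_stat = sum(scores[i] for i in sample) / k
        if null_stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Simulated ranking scores (assumption: 500 genes, standard normal scores).
data_rng = random.Random(0)
scores = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
set_indices = list(range(15))  # hypothetical gene set of 15 genes

# Same inputs, different seeds: the estimated p-value can differ between runs.
for seed in (1, 2, 3):
    p = gene_set_pvalue(scores, set_indices, n_perm=1000, seed=seed)
    print(f"seed={seed}  p={p:.4f}")
```

In this toy version, fixing the seed makes the result reproducible, and raising `n_perm` should shrink the spread between seeds.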
Is this to be expected?
Thanks for your help.
UPDATE: I also noticed that the four gene sets called enriched in the second run do not overlap at all with the thirty-two gene sets called enriched in the first run. This is surprising and a little concerning to me.