Dealing with very large gene-lists in GSEA
10 weeks ago

I'm using fgsea to do gene-set enrichment using the ENCODE transcription factor targets dataset.

However, some of the gene lists are very large, and I suspect this is causing my gene-set enrichment to fail to find many significant enrichments because of how the normalisation step works. What is the most appropriate way to systematically deal with very large gene lists in GSEA?

From the GSEA User Guide: "Nevertheless, the normalization is not very accurate for extremely small or extremely large gene sets. For example, for gene sets with fewer than 10 genes, just 2 or 3 genes can generate significant results. Therefore, by default, GSEA ignores gene sets that contain fewer than 15 genes or more than 500 genes"
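One systematic option is to pre-filter the gene-set collection by size before running enrichment, so that extremely small and extremely large sets never reach the normalisation step. A minimal base-R sketch, using a toy `gene_sets` list (not ENCODE data) and the 15/500 cutoffs quoted above:

```r
# Toy gene-set list standing in for the ENCODE TF-target sets
gene_sets <- list(
  tiny_set = paste0("gene", 1:5),     # below the minimum, would be dropped
  ok_set   = paste0("gene", 1:100),   # within bounds, kept
  huge_set = paste0("gene", 1:2000)   # above the maximum, would be dropped
)

# Keep only sets whose size falls in the recommended 15-500 range
sizes    <- lengths(gene_sets)
filtered <- gene_sets[sizes >= 15 & sizes <= 500]

names(filtered)  # only "ok_set" survives the filter
```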

R fgsea GSEA
10 weeks ago
Trivas ★ 1.2k

IMO, in those cases you could look at the raw enrichment score (ES) instead of the normalised enrichment score (NES). Regardless, within the fgsea function you can set the parameters minSize and maxSize. The "quick guide" on GitHub shows the recommended values of 15 and 500, as you mentioned.

library(fgsea)
data(examplePathways)  # example gene sets shipped with fgsea
data(exampleRanks)     # example ranked gene-level statistics

fgseaRes <- fgsea(pathways = examplePathways, 
                  stats    = exampleRanks,
                  minSize  = 15,
                  maxSize  = 500)

However, if you look at the help documentation within R (?fgsea), you will see the defaults:

  minSize = 1,
  maxSize = length(stats) - 1,
  gseaParam = 1,

Meaning that if you run fgsea without changing any parameters, it will test gene sets ranging in size from 1 up to the number of genes you have stat values for.
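Before settling on cutoffs, it can also help to look at the size distribution of your gene-set collection; TF-target collections in particular often contain sets well above 500 genes. A base-R sketch on a toy list (the real sizes would come from your ENCODE pathways object):

```r
# Toy collection: sizes drawn to mimic a mix of small and very large sets
set.seed(1)
pathways <- lapply(sample(c(5, 50, 300, 1500, 4000), 20, replace = TRUE),
                   function(n) paste0("gene", seq_len(n)))

sizes <- lengths(pathways)
summary(sizes)    # overall spread of set sizes
sum(sizes > 500)  # how many sets the recommended maxSize = 500 would exclude
```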

