Question

Performing GSEA using MSigDB gene sets in R

5.6 years ago

wilsonav ▴ 30

I am trying to perform a gene set enrichment analysis in R using the gene sets available from MSigDB and a list of gene names from my own data set.

I am able to use the msigdbr library to import the gene collections from MSigDB into R, but I am unsure which function to use to compute the overlaps between the genes in my gene list and the gene sets in MSigDB and to obtain FDR-adjusted p-values. Are there any online tutorials or example code for this method?

Thank you

R GSEA msigdb • 17k views

updated 2.7 years ago by Ahmed Alhendi ▴ 230 • written 5.6 years ago by wilsonav ▴ 30

score 5 · Answer 1 · 2018-09-25

5.6 years ago

igor 13k

You can try the fgsea package, which is probably most similar to the original GSEA. It can be run in a single command:

fgseaRes <- fgsea(pathways = examplePathways,  stats = exampleRanks)

Check the vignette for more details: https://www.bioconductor.org/packages/devel/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html

There is also an example in the msigdbr vignette: https://cran.r-project.org/web/packages/msigdbr/vignettes/msigdbr-intro.html
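Note that fgsea expects a statistic for every gene, not just a list of names. If all you have is an unranked gene list, the "overlap plus FDR" computation described in the question is an over-representation (hypergeometric) test rather than GSEA. Below is a minimal base-R sketch of that idea. The gene names, the gene sets, and the 100-gene universe are all made up for illustration; in practice the sets would come from msigdbr and the universe would be the genes measured in your experiment.

```r
# Toy inputs: every name and set below is hypothetical.
gene_sets <- list(
  SET_A = c("G1", "G2", "G3", "G4", "G5"),
  SET_B = c("G6", "G7", "G8", "G9", "G10"),
  SET_C = c("G1", "G6", "G11", "G12")
)
universe <- paste0("G", 1:100)                    # all genes that could have been detected
my_genes <- c("G1", "G2", "G3", "G4", "G20", "G21")

# One-sided hypergeometric test for a single gene set.
overlap_test <- function(set, genes, universe) {
  set   <- intersect(set, universe)
  genes <- intersect(genes, universe)
  k <- length(intersect(set, genes))              # observed overlap
  # P(overlap >= k) when length(genes) genes are drawn at random from the universe
  p <- phyper(k - 1, length(set), length(universe) - length(set),
              length(genes), lower.tail = FALSE)
  c(overlap = k, set_size = length(set), p_value = p)
}

res <- as.data.frame(t(sapply(gene_sets, overlap_test,
                              genes = my_genes, universe = universe)))

# Benjamini-Hochberg adjustment across all tested sets.
res$fdr <- p.adjust(res$p_value, method = "BH")
res[order(res$p_value), ]
```

The choice of universe changes the p-values considerably, so it is worth restricting it to genes that were actually assayed rather than using the whole genome.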

4.2 years ago by igor 13k

It's worth noting that fgsea is similar to GSEA-Preranked rather than to the original GSEA method published in the GSEA articles that used sample permutation.

3.8 years ago by Gordon Smyth ★ 7.0k

Yes. Until very recently, the recommendation from GSEA developers was to use the pre-ranked GSEA for RNA-seq data, so that has been the default one in my mind since most transcriptomic data is RNA-seq.

3.8 years ago by igor 13k

OK. It's an important issue because GSEA-preranked doesn't give proper FDR control (and hasn't been claimed to do so by the GSEA developers as far as I know).

3.8 years ago by Gordon Smyth ★ 7.0k

score 5 · Answer 2 · 2021-08-19

I tried the msigdbr package and it worked fine for me. It provides the MSigDB gene sets in a format compatible with fgsea and clusterProfiler. For example, I use it to run fgsea with the human hallmark gene sets:

library(msigdbr)
library(fgsea)

#Retrieve human H (hallmark) gene set
msigdbr_df <- msigdbr(species = "human", category = "H")


head(msigdbr_df)
# A tibble: 6 x 15
  gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene human_gene_symb…
  <chr>  <chr>     <chr>   <chr>             <int> <chr>        <chr>           
1 H      ""        HALLMA… ABCA1                19 ENSG0000016… ABCA1           
2 H      ""        HALLMA… ABCB8             11194 ENSG0000019… ABCB8           
3 H      ""        HALLMA… ACAA2             10449 ENSG0000016… ACAA2           
4 H      ""        HALLMA… ACADL                33 ENSG0000011… ACADL           
5 H      ""        HALLMA… ACADM                34 ENSG0000011… ACADM           
6 H      ""        HALLMA… ACADS                35 ENSG0000012… ACADS      

# fixing format to work with fgsea
pathwaysH = split(x = msigdbr_df$entrez_gene, f = msigdbr_df$gs_name)

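# run fgsea enrichment; `ranks` must be a named numeric vector of per-gene
# statistics whose names are Entrez IDs (the same ID type used in pathwaysH);
# other optional arguments are omitted here
fgseaRes <- fgsea(pathways = pathwaysH, stats = ranks)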
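The `ranks` object is not defined in the snippet above. A minimal sketch of how it could be built, assuming you have a per-gene statistic from your own differential expression analysis. The `de` table and its `stat` column are hypothetical, and the small `msigdbr_df` here is a toy stand-in with only the two columns used for the split:

```r
# Hypothetical differential-expression results: one statistic per gene.
de <- data.frame(
  entrez_gene = c(19, 11194, 10449, 33, 34, 35),
  stat        = c(2.5, -1.2, 0.3, -3.1, 1.8, 0.0)
)

# Named numeric vector: names are gene IDs, values are the statistic.
# Sorted from most positive to most negative for readability.
ranks <- setNames(de$stat, as.character(de$entrez_gene))
ranks <- sort(ranks, decreasing = TRUE)

# Toy stand-in for the msigdbr data frame (set names are made up).
msigdbr_df <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  entrez_gene = c(19, 11194, 10449, 33, 34, 35)
)

# Same split as above, with IDs as character so they match names(ranks).
pathways <- split(x = as.character(msigdbr_df$entrez_gene),
                  f = msigdbr_df$gs_name)

# Sanity check: how many genes in each set have a statistic?
covered <- sapply(pathways, function(ids) sum(ids %in% names(ranks)))
covered
```

If `covered` is zero or close to it for most sets, the ID types probably do not match (for example gene symbols in `ranks` but Entrez IDs in the pathways), and fgsea will have little or nothing to test.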