Performing GSEA using MSigDB gene sets in R
3.2 years ago
wilsonav ▴ 30

I am trying to perform a gene set enrichment analysis in R using the gene sets available from MSigDB and a list of gene names from my own data set.

I am able to use the msigdbr library to import the gene collections from MSigDB into R, but I am unsure which function to use to compute the overlaps between the genes in my gene list and the gene sets in MSigDB and to obtain FDR-adjusted p-values. Are there any online tutorials or example code for this?

Thank you

R GSEA msigdb • 7.8k views
3.2 years ago
igor 12k

You can try the fgsea package, which is probably most similar to the original GSEA. It can be run in a single command:

fgseaRes <- fgsea(pathways = examplePathways, stats = exampleRanks)

Check the vignette for more details: https://www.bioconductor.org/packages/devel/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html

There is also an example in the msigdbr vignette: https://cran.r-project.org/web/packages/msigdbr/vignettes/msigdbr-intro.html
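fgsea takes `stats` as a named numeric vector of gene-level statistics, with names that match the gene identifiers used in `pathways`. Below is a minimal sketch of building such a vector. The table `de`, its column names `gene` and `stat`, and the toy gene sets are assumptions made up for illustration, not output from any particular package:

```r
# Toy differential expression table. The column names `gene` and `stat`
# are hypothetical; substitute whatever your own results table uses.
set.seed(1)
de <- data.frame(
  gene = paste0("gene", 1:200),
  stat = rnorm(200),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are the statistics, names are the gene IDs.
ranks <- sort(setNames(de$stat, de$gene), decreasing = TRUE)

# Toy gene sets in the list format fgsea takes (one character vector per set).
pathways <- list(
  toy_set_1 = de$gene[1:20],
  toy_set_2 = de$gene[101:130]
)

# Run fgsea only if it is installed, so the sketch still runs without it.
if (requireNamespace("fgsea", quietly = TRUE)) {
  res <- fgsea::fgsea(pathways = pathways, stats = ranks)
  print(as.data.frame(res)[, c("pathway", "pval", "padj", "NES")])
}
```

With real data, the identifiers in `names(ranks)` have to be of the same type as those in the gene sets (for example, both gene symbols or both Entrez IDs), otherwise nothing will overlap.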


It's worth noting that fgsea is similar to GSEA-Preranked rather than to the original GSEA method published in the GSEA articles that used sample permutation.


Yes. Until very recently, the recommendation from GSEA developers was to use the pre-ranked GSEA for RNA-seq data, so that has been the default one in my mind since most transcriptomic data is RNA-seq.


OK. It's an important issue because GSEA-preranked doesn't give proper FDR control (and hasn't been claimed to do so by the GSEA developers as far as I know).

3 months ago
Ahmed Alhendi ▴ 180

I tried the msigdbr package and it worked fine for me. It provides the MSigDB gene sets in a format that is compatible with fgsea and clusterProfiler. For example, I use it to run fgsea with the human hallmark gene sets:

library(msigdbr)
library(fgsea)

# Retrieve the human H (hallmark) gene sets
msigdbr_df <- msigdbr(species = "human", category = "H")

head(msigdbr_df)
# A tibble: 6 x 15
  gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene human_gene_symb…
  <chr>  <chr>     <chr>   <chr>             <int> <chr>        <chr>           
1 H      ""        HALLMA… ABCA1                19 ENSG0000016… ABCA1           
2 H      ""        HALLMA… ABCB8             11194 ENSG0000019… ABCB8           
3 H      ""        HALLMA… ACAA2             10449 ENSG0000016… ACAA2           
4 H      ""        HALLMA… ACADL                33 ENSG0000011… ACADL           
5 H      ""        HALLMA… ACADM                34 ENSG0000011… ACADM           
6 H      ""        HALLMA… ACADS                35 ENSG0000012… ACADS      

# fixing format to work with fgsea
pathwaysH = split(x = msigdbr_df$entrez_gene, f = msigdbr_df$gs_name)

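# run fgsea enrichment
# `ranks` is your own named numeric vector of gene-level statistics;
# its names must be Entrez IDs to match pathwaysH
fgseaRes <- fgsea(pathways = pathwaysH, stats = ranks)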
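If you only have an unranked list of gene names, as in the original question, one common alternative is an over-representation (overlap) test rather than fgsea. The sketch below uses a hypergeometric test with Benjamini-Hochberg adjustment in base R. The function name `overlap_test` and the toy data are hypothetical, and the choice of background `universe` is an assumption you need to make for your own experiment:

```r
# Hypergeometric over-representation test for an unranked gene list.
# query:     character vector of genes of interest
# gene_sets: named list of character vectors (same shape as the split() output)
# universe:  all genes that could have been detected (background)
overlap_test <- function(query, gene_sets, universe) {
  query <- intersect(unique(query), universe)
  res <- lapply(names(gene_sets), function(nm) {
    gs <- intersect(unique(gene_sets[[nm]]), universe)
    k <- length(intersect(query, gs))
    # P(overlap >= k) when drawing length(query) genes from the universe
    p <- phyper(k - 1, length(gs), length(universe) - length(gs),
                length(query), lower.tail = FALSE)
    data.frame(gs_name = nm, set_size = length(gs), overlap = k, pval = p,
               stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, res)
  res$padj <- p.adjust(res$pval, method = "BH")  # Benjamini-Hochberg FDR
  res[order(res$pval), ]
}

# Toy data, made up for illustration only
universe <- paste0("gene", 1:1000)
gene_sets <- list(toy_set_a = universe[1:50], toy_set_b = universe[500:560])
query <- universe[c(1:20, 900:910)]

res <- overlap_test(query, gene_sets, universe)
print(res)
```

With real data, `gene_sets` could be the `pathwaysH` list built above, provided `query` and `universe` use the same identifier type. clusterProfiler's enricher() function performs a similar test if you would rather not write it yourself.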