gProfiler vs Panther_go in overrepresentation pathways R analysis
7 weeks ago
tchosange • 0


Which R tool is more reliable for pathway overrepresentation analysis?

I am trying to find overrepresented Reactome pathways in a list of genes. When I use Panther_go (both the website and the package), the results show one pathway with a significant FDR.

panther <- panther_go(
  gene_list = c2$Gene,
  organism = "9606",                          # Human = 9606
  annot_dataset = "panther_reactome_pathway",
  enrichment_test_type = "fisher",
  correction = "fdr")

Here is the significant result:

| Reactome pathways | Homo sapiens (REF) # | Client Text Box Input # | expected | Fold Enrichment | +/- | raw P value | FDR |
|---|---|---|---|---|---|---|---|
| Transcriptional regulation by the AP-2 (TFAP2) family of transcription factors | 36 | 6 | 0.17 | 35.38 | + | 1.62E-08 | 4.18E-05 |

However, when I use gProfiler in R with the parameters below:

gostres <- gost(query = c1$Gene,
                organism = "hsapiens",
                ordered_query = FALSE,      # TRUE if the gene list is ordered by significance
                multi_query = FALSE,
                significant = TRUE,         # only return significant pathways
                exclude_iea = FALSE,
                measure_underrepresentation = FALSE,
                evcodes = TRUE,
                user_threshold = 0.05,
                correction_method = "fdr",
                domain_scope = "annotated",
                custom_bg = NULL,
                numeric_ns = "",
                sources = "REAC",
                as_short_link = FALSE,
                highlight = TRUE)

I find 3 overrepresented pathways: Transcriptional regulation by the AP-2 (TFAP2) family of transcription factors, Signaling by Activin, and Antagonism of Activin by Follistatin.

I noticed that the two additional pathways found by gProfiler were also reported by Panther_go, but they were not significant there (p-values > 0.05).

What could explain the difference? Is there an issue with my parameters? Which tool is the best and most reliable for pathway overrepresentation analysis?

Thank you

7 weeks ago

Without digging into either site/package closely, my guess is that the number of gene sets being tested in each is different, resulting in different adjusted p-values. This could be due to the original gene sets in each being generated/collected from Reactome at different times, the exclusion criteria (e.g. min/max size of gene sets to include) being different, etc.
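To illustrate how the number of tested gene sets alone changes the adjusted p-values, here is a minimal base-R sketch. The three raw p-values are hypothetical (not taken from either tool), and it assumes both tools apply a Benjamini-Hochberg adjustment for their "fdr" option; only the total number of tested gene sets (`n_tests`, a hypothetical parameter) differs between the two runs.

```r
# Hypothetical raw p-values for three pathways (illustrative only,
# not output from Panther or gProfiler).
raw_p <- c(tfap2 = 1e-08, activin = 1e-04, follistatin = 2e-04)

# Benjamini-Hochberg adjustment when these three are the smallest p-values
# among `n_tests` gene sets in total. p.adjust's `n` argument sets the
# number of comparisons (it must be >= length(p)).
bh_with_n <- function(p, n_tests) p.adjust(p, method = "BH", n = n_tests)

fdr_small <- bh_with_n(raw_p, n_tests = 500)    # fewer gene sets tested
fdr_large <- bh_with_n(raw_p, n_tests = 2500)   # more gene sets tested

print(signif(rbind(fdr_small, fdr_large), 3))
```

With 500 tests all three hypothetical pathways fall below 0.05, while with 2,500 tests only the strongest one does, so identical raw p-values can yield a different number of "significant" pathways depending on how many gene sets each tool tests.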

The bigger issue is that you're not providing a custom background of genes that are reasonably expressed in your dataset, rendering these results more or less meaningless in my eyes. I have some comments about why this is important in this other answer.
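The effect of the background can be sketched the same way. Below, a one-sided hypergeometric test is computed for a hypothetical 36-gene pathway with 6 of its genes in a hypothetical 100-gene query, first against a genome-wide universe of 20,000 genes and then against a smaller universe of 8,000 "expressed" genes. All numbers are assumptions for illustration, and for simplicity every pathway gene is assumed to be in both universes. To my understanding, gProfiler describes its test as a cumulative hypergeometric test, and the question selects `enrichment_test_type = "fisher"` for Panther; the code also computes the one-sided Fisher's exact test on the corresponding 2x2 table, which gives the same p-value.

```r
# One-sided (over-representation) hypergeometric p-value: probability of
# seeing at least `overlap` pathway genes in a query of `query_size` genes
# drawn from a universe of `universe` genes.
ora_pvalue <- function(overlap, query_size, pathway_size, universe) {
  phyper(overlap - 1, m = pathway_size, n = universe - pathway_size,
         k = query_size, lower.tail = FALSE)
}

# The same test via a 2x2 contingency table and Fisher's exact test.
# Rows: in query / not in query; columns: in pathway / not in pathway.
ora_fisher <- function(overlap, query_size, pathway_size, universe) {
  tab <- matrix(c(overlap,
                  pathway_size - overlap,
                  query_size - overlap,
                  universe - pathway_size - query_size + overlap),
                nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}

# Hypothetical numbers: 36-gene pathway, 100-gene query, 6 genes overlapping.
p_genome    <- ora_pvalue(6, 100, 36, universe = 20000)  # whole-genome background
p_expressed <- ora_pvalue(6, 100, 36, universe = 8000)   # expressed-genes background

# Expected overlap under the null hypothesis grows as the universe shrinks.
expected_genome    <- 36 * 100 / 20000
expected_expressed <- 36 * 100 / 8000

print(c(p_genome = p_genome, p_expressed = p_expressed))
```

The p-value against the smaller, expression-based universe is larger, because 6 overlapping genes are less surprising when the null expectation is 0.45 rather than 0.18. In gprofiler2, such a background is supplied through the `custom_bg` argument, which the question currently sets to `NULL`.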

4 weeks ago
NancyTLi ▴ 20

I do not have details to share about the differences between the two packages you described; however, you may want to consider using Reactome directly for your analysis, if possible. Panther's documentation states that its current data is "from Reactome database version 86, released 2023-09-07", so it is somewhat outdated.

Reactome is updated quarterly (4 times a year), while external resources may not be aligned with those release cycles. Currently, Reactome is on database version 89, and each release adds a significant number of genes/proteins to the knowledgebase, along with numerous new or updated pathways.

If you have questions or requests specifically for the Reactome team, please send an email to for further assistance. Thanks!


