gProfiler vs Panther_go in overrepresentation pathways R analysis
7 weeks ago
tchosange • 0

Hello,

Which R tool is reliable for overrepresented pathway analysis?

I am trying to find overrepresented Reactome pathways in a list of genes. When using Panther_go (both the website and the package), the results show one pathway with a significant FDR.

panther <- panther_go(
  gene_list = c2$Gene,
  organism = "9606",  # Human = 9606
  annot_dataset = "panther_reactome_pathway",
  enrichment_test_type = "fisher",
  correction = "fdr")

Here is the significant result:

Reactome pathway: Transcriptional regulation by the AP-2 (TFAP2) family of transcription factors
Homo sapiens (REF) #: 36
Client Text Box Input #: 6
Expected: 0.17
Fold Enrichment: 35.38
+/-: +
Raw P value: 1.62E-08
FDR: 4.18E-05

However, when using gProfiler in R with the parameters below:

gostres <- gost(query = c1$Gene,
                organism = "hsapiens",
                ordered_query = FALSE,  # TRUE if the list of genes is ordered by significance
                multi_query = FALSE,
                significant = TRUE,  # only return significant pathways
                exclude_iea = FALSE,
                measure_underrepresentation = FALSE,
                evcodes = TRUE,
                user_threshold = 0.05,
                correction_method = "fdr",
                domain_scope = "annotated",
                custom_bg = NULL,
                numeric_ns = "",
                sources = "REAC",
                as_short_link = FALSE,
                highlight = TRUE)

I find 3 pathways that are overrepresented: Transcriptional regulation by the AP-2 (TFAP2) family of transcription factors, Signaling by Activin, and Antagonism of Activin by Follistatin.

I noticed that the two additional pathways found by gProfiler were also found by Panther_go, but they were not significant there (p-values were > 0.05).

What could explain the difference? Is there an issue with the parameters? Which tool is the best and most reliable for overrepresented pathway analysis?

Thank you

pathways Reactome Panther Overrepresentation gProfiler • 403 views
7 weeks ago

Without digging into either site/package closely, my guess is the number of genesets being tested in each is different, resulting in different adjusted p-values. This could be due to the original genesets in each being generated/collected from Reactome at different times, the exclusion criteria (e.g. min/max size of genesets to include) being different, etc.
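As a rough illustration of that point, here is a minimal base R sketch showing how the same raw p-values can end up with different FDR values when the number of tested gene sets changes. The first p-value is the one reported in the question; the other two p-values and both test counts (50 and 2500) are made up for illustration, and neither tool is claimed to compute its correction exactly this way.

```r
# First raw p-value is from the question; the other two are hypothetical
raw_p <- c(1.62e-08, 4e-04, 9e-04)

# Apply Benjamini-Hochberg as if m gene sets were tested in total,
# padding the remaining (m - 3) tests with non-significant p-values of 1
adjust_with_m_tests <- function(p, m) {
  stopifnot(m >= length(p))
  padded <- c(p, rep(1, m - length(p)))
  p.adjust(padded, method = "BH")[seq_along(p)]
}

fdr_small <- adjust_with_m_tests(raw_p, 50)    # hypothetical: few gene sets tested
fdr_large <- adjust_with_m_tests(raw_p, 2500)  # hypothetical: many gene sets tested

print(data.frame(raw_p, fdr_small, fdr_large))
```

With the smaller number of tests all three pathways stay below 0.05, while with the larger number only the first one does, even though the raw p-values are identical.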

The bigger issue is that you're not providing a custom background of genes that are reasonably expressed in your dataset, rendering these results more or less meaningless in my eyes. I have some comments about why this is important in this other answer.
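To make the background point concrete, here is a small base R sketch of a one-sided hypergeometric overrepresentation test run against two different backgrounds. All of the counts are hypothetical (a query of 100 genes, a whole-genome background of 20000 genes, an expressed-gene background of 8000 genes); only the pathway size of 36 and the 6 overlapping genes echo the numbers in the question. `ora_p` is a helper name introduced here, not a function from either package.

```r
# P(X >= k) under the hypergeometric distribution
#   k: query genes in the pathway     n: query size
#   K: pathway genes in background    N: background size
ora_p <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Hypothetical whole-genome background
p_genome <- ora_p(k = 6, n = 100, K = 36, N = 20000)

# Hypothetical background restricted to expressed genes,
# assuming 30 of the 36 pathway genes are expressed
p_expressed <- ora_p(k = 6, n = 100, K = 30, N = 8000)

print(c(genome = p_genome, expressed = p_expressed))
```

The same overlap looks less surprising against the smaller, more realistic background. In gprofiler2, a background like this can be supplied through the `custom_bg` argument of `gost()` together with `domain_scope = "custom"`, instead of the `custom_bg = NULL` used in the question.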

4 weeks ago
NancyTLi ▴ 20

I do not have details to share about the differences between the two packages you described. However, you may want to think about using Reactome directly for your analysis, if possible. A quick look at Panther's documentation shows that its Reactome data is outdated: the current data is "from Reactome database version 86, released 2023-09-07".

Reactome is updated quarterly (4 times a year), while external resources may not be aligned with those release cycles. Currently, Reactome is on database version 89, and each release adds a significant number of genes/proteins to the knowledgebase, along with numerous new and updated pathways.

If you have questions or requests specifically for the Reactome team, please send an email to help@reactome.org for further assistance. Thanks!
