Question

Choosing enrichment analysis tool

2 days ago

Marlene • 0

I have been exploring the comorbidity of two neurodevelopmental disorders by curating high-confidence gene lists. Let's call the disorders A and B; as of now, the lists contain 1270 and 250 genes respectively. I want to perform enrichment analysis on the individual lists first, and then a combined analysis to explore unique and shared pathways or processes. I have explored the 'multiple gene list' option in Metascape. Does anybody have any suggestions or ideas?

DAVID Metascape Enrichr • 243 views

ADD COMMENT • link updated 4 hours ago by i.sudbery 22k • written 2 days ago by Marlene • 0

How have these gene lists been obtained?

ADD REPLY • link 1 day ago by i.sudbery 22k

From databases dedicated to the specific disorder, the GWAS Catalog, DisGeNET, OMIM and literature searches across PubMed.

ADD REPLY • link 10 hours ago by Marlene • 0

score 0 · Answer 1 · 2025-10-15

Standard enrichment tools, like Metascape, Enrichr or DAVID, assume that if no pathway is particularly involved in what you are studying, every gene is as likely to appear in your gene list as any other. This assumption is violated to a greater or lesser extent in datasets from different sources.
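To make that assumption concrete, here is a minimal sketch of the kind of over-representation test these tools are built around: a one-sided hypergeometric test. It is not the exact code of any of the tools named above, and the function name and the example numbers (universe size, pathway size, overlap) are made up for illustration.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_list, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    The null model is that the n_list genes were drawn uniformly at random,
    without replacement, from n_universe genes, of which n_pathway belong
    to the pathway. "Uniformly" is exactly the assumption in question.
    """
    max_overlap = min(n_pathway, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: a 250-gene list, a 300-gene pathway,
# a 20,000-gene background and 12 genes in common.
p_value = hypergeom_enrichment_p(20_000, 300, 250, 12)
print(f"p = {p_value:.3g}")
```

Nothing in this calculation knows about gene length, SNP coverage or how well studied a gene is, so any of those can produce a small p-value without any real pathway involvement.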

However, the data source under which we see the biggest violation of this assumption is genes that carry mutations in disease. There are three reasons for this:

Firstly, longer genes are more likely to have mutations or SNPs in them. Some gene sets are also systematically biased towards longer genes (neuronally associated genes are the classic example).
Secondly, some genes will be well tagged by the SNPs typed in GWAS, while others will be less so. Any similar bias in the assignment of genes to pathways (i.e. if some pathways tend to contain better-tagged genes than others) will bias your results.
Thirdly, some genes are better studied than others. Such better-studied genes are more likely to have recorded disease associations when you include data from non-systematic approaches (e.g. OMIM or literature searches).
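As an illustration of the first point only, the sketch below swaps the uniform null for a length-matched one: random gene lists are drawn so that they have the same gene-length profile as the observed list, and the pathway overlap is compared against those. This is a toy resampling scheme I am assuming for illustration, not what SSEA, GSA-SNP2, MAGMA or Pascal do, and every name and number in it is hypothetical.

```python
import random

def length_matched_p(gene_list, pathway, gene_lengths, n_bins=5, n_perm=2000, seed=0):
    """Empirical p-value for pathway overlap under a length-matched null.

    gene_lengths maps every background gene to its length; gene_list must
    contain unique genes that are all present in gene_lengths.
    """
    rng = random.Random(seed)
    # Rank all background genes by length and cut them into equal-sized bins.
    ranked = sorted(gene_lengths, key=gene_lengths.get)
    bin_size = -(-len(ranked) // n_bins)  # ceiling division
    bins = [ranked[i:i + bin_size] for i in range(0, len(ranked), bin_size)]
    bin_of = {gene: b for b, genes in enumerate(bins) for gene in genes}
    # Record how many genes of the observed list fall into each length bin.
    counts = [0] * len(bins)
    for gene in gene_list:
        counts[bin_of[gene]] += 1
    pathway = set(pathway)
    observed = len(pathway & set(gene_list))
    hits = 0
    for _ in range(n_perm):
        # Draw a random list with the same number of genes per length bin.
        overlap = sum(
            sum(gene in pathway for gene in rng.sample(genes, k))
            for genes, k in zip(bins, counts)
        )
        hits += overlap >= observed
    return (hits + 1) / (n_perm + 1)

# Toy data: 100 genes of increasing length; the pathway is the 20 longest
# genes, and the "disease" list is also made up of the 20 longest genes.
toy_lengths = {f"g{i}": 1000 * (i + 1) for i in range(100)}
toy_pathway = [f"g{i}" for i in range(80, 100)]
toy_list = [f"g{i}" for i in range(80, 100)]
toy_p = length_matched_p(toy_list, toy_pathway, toy_lengths, n_perm=200)
print(toy_p)
```

In this toy case a uniform null would call the overlap extremely significant, whereas the length-matched null cannot distinguish the list from any other set of equally long genes. Real data would also need the tagging and study biases handled, which this sketch does not attempt.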

As a result of this, each data source requires a different approach to testing enrichment. There are several algorithms designed specifically for testing pathway enrichment from GWAS results.

Examples include SSEA, GSA-SNP2, MAGMA and Pascal, although as far as I'm aware none of these has a web interface, and all take full GWAS summary statistics rather than lists of associated genes.

I know of no method that corrects for the final type of bias. So for genes from the literature, perhaps it is worth just using standard enrichment tools, while being aware that the results are likely to be biased. I would never use these tools for genes from GWAS, WGA or exome studies though.