Standard enrichment tools, such as Metascape, Enrichr or DAVID, assume a null model in which no pathway is particularly involved in what you are studying, so that every gene is equally likely to appear in your gene list. This assumption is violated to a greater or lesser extent in datasets from different sources.
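To make that assumption concrete, here is a minimal sketch of the kind of over-representation test these tools are generally built around: a one-sided hypergeometric test. The function name and the numbers in the usage line are hypothetical, and the individual tools differ in their details (background set, multiple-testing correction and so on).

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) if n_list genes are drawn uniformly at random,
    without replacement, from n_universe genes of which n_pathway are in the pathway.

    "Uniformly at random" is exactly the equal-likelihood assumption.
    """
    total = comb(n_universe, n_list)
    upper = min(n_pathway, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 genes, a 200-gene pathway, a 100-gene list, 5 shared genes.
p_value = hypergeom_enrichment_p(20000, 200, 100, 5)
print(p_value)
```

The p-value only means what it claims to mean if the gene list really could have been any 100 genes with equal probability.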
The biggest violation of this assumption arises when examining which genes have mutations in disease. There are three reasons for this:
- Firstly, longer genes are more likely to have mutations or SNPs in them. Some gene sets are also systematically biased towards longer genes (neuronally associated genes are the classic example).
- Secondly, some genes will be well tagged by the SNPs typed in GWAS, while others will be less so. Any similar bias in the assignment of genes to pathways (i.e. if some pathways tend to contain better-tagged genes than others) will bias your results.
- Thirdly, some genes are better studied than others. Such better-studied genes are more likely to have recorded disease associations when you include data from non-systematic approaches (e.g. OMIM or literature searches).
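The gene-length effect is easy to see in a toy simulation. The sketch below is illustrative only: the gene lengths are made up (log-normal), the "pathway" is a caricature consisting of the 100 longest genes, and the helper names are hypothetical. It compares how many pathway genes land in a 50-gene list when genes are drawn uniformly versus with probability proportional to length, with no real biology involved in either case.

```python
import random

def draw_gene_list(weights, k, rng):
    """Draw k distinct gene indices, each successive draw made with
    probability proportional to weight (exponential-clocks trick)."""
    clocks = sorted((rng.expovariate(w), i) for i, w in enumerate(weights))
    return {i for _, i in clocks[:k]}

def mean_overlap(weights, pathway, k, n_sims, rng):
    """Average number of pathway genes in a randomly drawn list of k genes."""
    total = 0
    for _ in range(n_sims):
        total += len(draw_gene_list(weights, k, rng) & pathway)
    return total / n_sims

rng = random.Random(0)
n_genes = 2000

# Made-up gene lengths in kb (assumption: log-normal, median around 20 kb).
lengths_kb = [rng.lognormvariate(3, 1) for _ in range(n_genes)]

# Caricature of a length-biased gene set: the 100 longest genes.
by_length = sorted(range(n_genes), key=lambda i: lengths_kb[i], reverse=True)
long_gene_pathway = set(by_length[:100])

# Null model assumed by standard tools: every gene equally likely.
uniform = mean_overlap([1.0] * n_genes, long_gene_pathway, 50, 200, rng)
# Null model closer to mutation data: hit probability scales with length.
length_biased = mean_overlap(lengths_kb, long_gene_pathway, 50, 200, rng)

print(f"mean overlap, uniform draws:       {uniform:.2f}")
print(f"mean overlap, length-biased draws: {length_biased:.2f}")
```

Under uniform draws the expected overlap is 50 × 100 / 2000 = 2.5 genes. The length-biased draws give a clearly larger overlap, which a standard tool would report as enrichment even though the list was generated with no pathway involved at all.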
As a result of this, each data source requires a different approach to testing enrichment. There are several algorithms designed specifically for testing pathway enrichment from GWAS results.
Examples include SSEA, GSA-SNP2, MAGMA and Pascal, although, as far as I'm aware, none of these has a web interface, and all take full GWAS summary statistics rather than lists of associated genes.
I know of no method that corrects for the final type of bias. So for genes from the literature, it is perhaps worth just using standard enrichment tools, while being aware that the results are likely to be biased. I would never use these tools for genes from GWAS, WGA or exome studies though.
How have these gene lists been obtained?
From databases dedicated to the specific disorder, the GWAS Catalog, DisGeNET, OMIM and literature searches across PubMed.