Forum: Not all Over-Representation Analyses are the same!
2.0 years ago
Francesco ▴ 20

While comparing the results of Over-Representation Analyses (one-sided Fisher's exact test, a.k.a. the hypergeometric test) performed with enricher() (from the clusterProfiler R package) against those from web tools such as MSigDB, I noticed some discrepancies. I realized there is an ambiguity that is rarely addressed (at least it was new to me) in how the genes in the query list (e.g. upregulated genes) and the genes in the universe/background are defined:

[Figure: illustration of the different definitions of k and N used by clusterProfiler]

Other tools and general workshops suggest that k should be the complete query list and N the universe of measurable genes (e.g. the whole transcriptome for RNA-seq). clusterProfiler (which I think is the most widely used R package for pathway analysis) instead restricts the analysis to genes present in the annotation set in use, so both k and N count only annotated genes.

This restriction of course generally leads to larger p-values than the conventional approach would give. I feel that restricting the analysis to annotated genes is reasonable and more specific, but I think it is worth opening a discussion about it. Which approach do you usually use or recommend? Do you have any opinions to share?
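To make the two definitions concrete, here is a minimal sketch on made-up toy data. The gene names, set sizes, and the helper `ora_p_value` are all hypothetical and chosen only for illustration. It computes the upper-tail hypergeometric probability directly from binomial coefficients and does not call clusterProfiler or any web tool, so it only mimics the two conventions as described above.

```python
from math import comb


def ora_p_value(x, M, k, N):
    """Upper-tail hypergeometric probability P(X >= x).

    N: genes in the universe, M: gene-set members in the universe,
    k: genes in the query list, x: query genes that fall in the gene set.
    """
    total = comb(N, k)
    tail = sum(comb(M, i) * comb(N - M, k - i) for i in range(x, min(M, k) + 1))
    return tail / total


# Hypothetical toy data (not from any real annotation collection).
universe = {f"g{i}" for i in range(1, 101)}   # 100 measurable genes
annotated = {f"g{i}" for i in range(1, 41)}   # 40 genes annotated to at least one term
gene_set = {f"g{i}" for i in range(1, 11)}    # one pathway with 10 genes
query = {f"g{i}" for i in [*range(1, 6), *range(11, 16), *range(41, 51)]}  # 20 genes

# Convention 1: complete query list, universe = all measurable genes.
N_full, k_full = len(universe), len(query)
M_full = len(gene_set & universe)
x_full = len(query & gene_set)
p_full = ora_p_value(x_full, M_full, k_full, N_full)

# Convention 2: drop genes without any annotation from both query and universe.
universe_restr = universe & annotated
query_restr = query & annotated
N_restr, k_restr = len(universe_restr), len(query_restr)
M_restr = len(gene_set & universe_restr)
x_restr = len(query_restr & gene_set)
p_restricted = ora_p_value(x_restr, M_restr, k_restr, N_restr)

print(f"full universe:  N={N_full}, k={k_full}, M={M_full}, x={x_full}, p={p_full:.4f}")
print(f"annotated only: N={N_restr}, k={k_restr}, M={M_restr}, x={x_restr}, p={p_restricted:.4f}")
```

In this toy case the overlap x and the gene-set size M are identical under both conventions; only k and N shrink, and the restricted version gives the larger p-value. That is one constructed example, not a proof that the ordering always holds.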

P.S. I also opened a discussion on the GitHub page of ClusterProfiler (

clusterProfiler Analysis fisher hypergeometric ORA Pathway • 856 views
