Issues with enricher() function from clusterProfiler
0
0
Entering edit mode
3 months ago

Hi there,

I am using the enricher() function from clusterProfiler to perform a KEGG pathway enrichment analysis. I work on a non-model species, although with a sequenced and annotated genome.

I have a list of 179 unique genes differentially expressed annotated with KOs (e.g. K08448) and a background (all the genes in the genome) of 8873 unique KOs. I created, by retrieving the necessary information with KEGGREST, a TERM2GENE table where the KEGG pathways are mapped to the KOs in order to perform a "custom" enrichment. The TERM2GENE table looks (hereafter named as "mapping_table") like this:

 Map      KO
map0001  K009567
map0001  K876540
map0003  K286021


Then, used the enricher() function as follows:

enricher(gene = differentially_expressed_KOs, TERM2GENE= mapping_table)


the input (differentially_expressed_KOs) was the simple list of DEGs and looked like this:

K009567
K007213
K023784
....


Here reported the first line of the output

ID         GeneRatio BgRatio pvalue
map01412   5/91      84/8873 0.001658081


However, a few points rose up from the output:

1. The denominator of the gene ratio consists of 91 genes instead of 179, which according to the clusterProfiler manual should be instead the size of the list of genes of interest (defined as n).
2. By trying of sorting this out and better understanding where the mistake was, I attempted to manually calculate the values using the phyper function in R as follows: phyper(5,84,8873-84,91, lower.tail = F) = 0.0002108303 where 5 is the number of DEGs in that pathway, 84 is the number of genes annotated in that pathway, 8873-84 is the total number of annotated genes (universe) minus those in that pathway and 91 is the size of my genes of interest (DEGs). Instead, by using the actual number of DEGs phyper(5,84,8873-84,179, lower.tail = F) = 0.006868312 the result is much closer to the output of enricher, though still different.
3. Lastly, by specifying the option universe= in the function enricher and using all the KOs from the genome, the background changes radically and consider only 3856 genes as background.

Am I missing something here?

clusterProfiler enrichment • 200 views