I am using DAVID's Functional Annotation Clustering tool, and I wonder how DAVID's algorithm tests the null hypothesis that the enrichment of an annotation is purely due to chance. Could anyone explain it to me in a simple way?
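For reference, DAVID's help pages describe the per-term test as the EASE score, a modified one-sided Fisher's exact test. The null hypothesis is that the genes in your list are a random draw from the background, so the number of list genes annotated with a term follows a hypergeometric distribution. The modification is that one gene is removed from the list-hit count before the tail probability is computed, a penalty that favours terms supported by more genes. Below is a minimal Python sketch under that description; the function and argument names are hypothetical and this is not DAVID's own code. It computes the hypergeometric tail exactly with `math.comb`.

```python
from math import comb


def upper_tail(k, list_total, pop_hits, pop_total):
    """P(X >= k) for X ~ Hypergeometric: list_total genes drawn at random
    from pop_total background genes, pop_hits of which carry the term."""
    hi = min(list_total, pop_hits)
    # comb(n, r) returns 0 when r > n, so impossible splits contribute nothing.
    favourable = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(max(k, 0), hi + 1)
    )
    return favourable / comb(pop_total, list_total)


def fisher_p(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher's exact test for over-representation."""
    return upper_tail(list_hits, list_total, pop_hits, pop_total)


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """EASE-style score as DAVID describes it: Fisher's test with one gene
    removed from the list-hit count (more conservative than fisher_p)."""
    return upper_tail(list_hits - 1, list_total, pop_hits, pop_total)


# Toy example: 10 background genes, 3 carry the term, a 3-gene list hits all 3.
print(fisher_p(3, 3, 3, 10), ease_score(3, 3, 3, 10))
```

In the toy example the plain Fisher p-value is 1/120, while the EASE-style score counts only 2 hits and is therefore larger, which is the intended conservative effect.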
I am a little confused. For example, one of the annotation clusters has only 7 genes, and its enrichment score is 1.23 with a p-value of 2.8E-2:
Clustered terms are DNA-binding region:ETS (7 genes), Ets (7 genes), Domain:PNT (4 genes), ETS (7 genes), SAM PNT (4 genes)
But another annotation cluster has 86 genes, yet its enrichment score is only 0.05 with a p-value of 1.0E0:
Clustered terms are mitochondrial lumen (19 genes), mitochondrial matrix (19 genes), mitochondrion (86 genes), mitochondrial part (43 genes), mitochondrion (59 genes)
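DAVID's help pages describe a cluster's Enrichment Score as the geometric mean, on a -log10 scale, of its member terms' p-values, which is the same as the average of -log10(p) over those terms. So a cluster whose terms all have p close to 1 scores close to 0, however many genes it contains. A minimal sketch of that definition follows; the function name is hypothetical, and the p-values in the usage line are made up for illustration rather than taken from the clusters above. All p-values are assumed to be strictly positive.

```python
import math


def cluster_enrichment_score(p_values):
    """Mean of -log10(p) over a cluster's member terms, i.e. -log10 of the
    geometric mean of their p-values. Assumes every p is > 0."""
    if not p_values:
        raise ValueError("cluster has no terms")
    return sum(-math.log10(p) for p in p_values) / len(p_values)


# Hypothetical member-term p-values for one cluster.
score = cluster_enrichment_score([0.01, 0.0001])
print(score)  # about 3.0, since the geometric mean of the p-values is 0.001
```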
So a larger number of genes shared among the GO terms doesn't necessarily mean a higher enrichment score and a lower p-value? I am still confused about how the first annotation cluster above, with only 7 genes overlapping among its terms, can have a lower p-value than the second cluster, where at least 19 genes overlap among the GO terms.
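In a hypergeometric test the raw gene count is not what drives the p-value; what matters is how many list genes carry a term compared with how many would be expected by chance, given how common that term is in the background. The sketch below restates an EASE-style tail probability so it runs on its own, and plugs in purely hypothetical numbers: a 1,200-gene list drawn from a 20,000-gene background, a small ETS-like term with 30 background genes, and a large mitochondrion-like term with 1,500 background genes. These counts are invented for illustration and are not the ones behind the clusters above.

```python
from math import comb


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """EASE-style p-value: P(X >= list_hits - 1) under a hypergeometric null.
    Illustrative sketch, not DAVID's own code."""
    k = max(list_hits - 1, 0)
    hi = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(k, hi + 1)
    )
    return favourable / comb(pop_total, list_total)


LIST_TOTAL, POP_TOTAL = 1200, 20000  # hypothetical list and background sizes

for name, hits, pop_hits in [("ETS-like", 7, 30), ("mitochondrion-like", 86, 1500)]:
    expected = LIST_TOTAL * pop_hits / POP_TOTAL  # hits expected by chance
    p = ease_score(hits, LIST_TOTAL, pop_hits, POP_TOTAL)
    print(f"{name}: observed {hits}, expected {expected:.1f}, EASE p = {p:.3g}")
```

With these invented counts, 7 observed hits against about 1.8 expected is a strong over-representation, while 86 observed against 90 expected is no enrichment at all, so the smaller term gets the far smaller p-value.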