Question: DAVID Functional Annotation Clustering Analysis
4.3 years ago, parksuhong wrote:

I am using DAVID's Functional Annotation Clustering tool, and I wonder how DAVID's algorithm tests the null hypothesis that the enrichment of an annotation is purely due to chance. Could anyone explain it to me in a simple way?

I am a little confused because, for example:

One annotation cluster has only 7 genes, yet its enrichment score is 1.23 with p-values of 2.8E-2:

Clustered terms are DNA-binding region:ETS (7 genes), Ets (7 genes), Domain:PNT (4 genes), ETS (7 genes), SAM PNT (4 genes)

But another annotation cluster has 86 genes, yet its enrichment score is only 0.05 with p-values of 1.0E0:

Clustered terms are mitochondrial lumen (19 genes), mitochondrial matrix (19 genes), mitochondrion (86 genes), mitochondrial part (43 genes), mitochondrion (59 genes)

So a higher number of overlapping genes between GO terms doesn't necessarily mean a higher enrichment score and a lower p-value? I am still confused about how the first cluster above, whose GO terms share only 7 genes, can have a higher enrichment score and a lower p-value than the second cluster, whose GO terms share at least 19 genes.


Thank you!
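To make the null hypothesis concrete: for each term, DAVID asks whether the term's genes appear in your list more often than expected if the list had been drawn at random from the background (population) genes. DAVID's documentation calls its term p-value the EASE score, a modified one-tailed Fisher exact test that subtracts one gene from the list's hit count before testing, which penalizes terms supported by very few genes. The sketch below approximates this with a hypergeometric upper tail (the one-tailed Fisher exact test for this setup) plus the "minus one" penalty. It is an illustration, not DAVID's code: `ease_like_p` and `fold_enrichment` are hypothetical helpers, and the background sizes (20,000 genes, a 2,000-gene list, term sizes 20 and 1,000) are assumptions because the question does not give them.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-tailed P(X >= k) for X ~ Hypergeometric(N background genes,
    K of them annotated to the term, n genes in the user's list)."""
    lo, hi = max(k, 0), min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper,
    # so impossible table configurations contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return tail / comb(N, n)


def ease_like_p(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    # Assumed simplification of the EASE penalty: test with one fewer list hit.
    return hypergeom_sf(list_hits - 1, pop_total, pop_hits, list_total)


def fold_enrichment(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    # Observed fraction of the list in the term vs. fraction of the background in the term.
    return (list_hits / list_total) / (pop_hits / pop_total)


# Hypothetical background: 20,000 annotated genes, 2,000 genes in the list.
POP, LIST = 20_000, 2_000
ets_p = ease_like_p(7, LIST, 20, POP)        # small term: about 2 hits expected, 7 observed
mito_p = ease_like_p(86, LIST, 1_000, POP)   # big term: about 100 hits expected, 86 observed

print(f"ETS-like term:  fold={fold_enrichment(7, LIST, 20, POP):.2f}, p={ets_p:.3g}")
print(f"mito-like term: fold={fold_enrichment(86, LIST, 1_000, POP):.2f}, p={mito_p:.3g}")
```

Under these assumed numbers the small term is significant and the large one is not: what matters is the hit count relative to what the term's background size predicts, not the raw number of genes.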




These values make sense to me: the higher the enrichment score, the better, and higher enrichment scores come with lower p-values, because the p-value gives the probability of obtaining the corresponding enrichment score by chance.

The enrichment score depends on the fold-change (or intensity values) of your genes and not on the overlap. Thus it makes sense that you can obtain a higher enrichment score with few genes. But it is hard to tell without knowing your data...

(Comment by Manuel Landesfeind, 4.3 years ago)
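The link between the cluster score and the p-values can be made explicit. DAVID's documentation describes a cluster's Group Enrichment Score as the geometric mean, in -log scale, of the p-values (EASE scores) of its member terms, which equals the arithmetic mean of their -log10(p). The sketch below assumes that definition; `group_enrichment_score` and `geometric_mean_p` are hypothetical helper names, and the three member p-values in the example are made up.

```python
import math


def group_enrichment_score(term_p_values):
    """-log10 of the geometric mean of the member terms' p-values,
    i.e. the arithmetic mean of their -log10(p)."""
    logs = [-math.log10(p) for p in term_p_values]
    return sum(logs) / len(logs)


def geometric_mean_p(score):
    """Invert a score back to the geometric-mean p-value it stands for."""
    return 10 ** -score


# Hypothetical member-term p-values for one cluster.
print(group_enrichment_score([1e-2, 1e-1, 1e-3]))  # about 2.0

# The two cluster scores quoted in the question:
print(geometric_mean_p(1.23))  # about 0.059
print(geometric_mean_p(0.05))  # about 0.89
```

Read this way, a score of 1.23 means the member terms are, on average, moderately significant, while 0.05 means their typical p-value is close to 1; the number of genes in the cluster does not enter the score directly.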
4.3 years ago, Alternative wrote:

This most likely has to do with your sample size (effect size). One should be very careful when a small absolute number of genes is used in such an analysis. Roughly speaking, going from 1 to 2 is a doubling achieved by adding only 1, while going from 10 to 20 is also a doubling but requires adding 10. These are not the same, even though both are doublings. Maybe this will help:


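The doubling argument can be checked numerically. This sketch uses a plain one-tailed hypergeometric test (not DAVID's exact EASE calculation) with a hypothetical 20,000-gene background and a 1,000-gene list, and compares two terms that are both 2-fold over-represented: one goes from 1 expected hit to 2 observed, the other from 10 expected to 20 observed. The background sizes and term sizes are assumptions chosen only to produce those expectations.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-tailed P(X >= k) for X ~ Hypergeometric(N background genes,
    K annotated to the term, n genes in the list)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical setup: 20,000 background genes, 1,000 genes in the list (5%).
N, n = 20_000, 1_000

# Term A: 20 genes in the background -> 1 list hit expected; 2 observed (2-fold).
p_small = hypergeom_sf(2, N, 20, n)

# Term B: 200 genes in the background -> 10 list hits expected; 20 observed (2-fold).
p_large = hypergeom_sf(20, N, 200, n)

print(f"1 -> 2 hits:   p = {p_small:.3g}")
print(f"10 -> 20 hits: p = {p_large:.3g}")
```

Both terms have the same fold enrichment, but only the second yields a small p-value: one extra gene over expectation is easily produced by chance, ten extra genes are not.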


Powered by Biostar version 2.3.0