Question: Gene Ontology Tool: David
5.0 years ago
neumannmartin198710 wrote:

Hello friends,

I have a question concerning the gene ontology tool DAVID. In DAVID it is possible to create a sublist after a first round of functional annotation. The sublist is created by the user, i.e. if I tick the checkbox for the GO term Extracellular Matrix, I can create a sublist from that term (or from several terms selected that way). My term has p < 0.05 but a corrected p-value (Benjamini) of 0.1 (not significant). If I now select my sublist and perform a second round of analysis (the background stays the same), I get significant values (let's say p < 0.000005 and Benjamini p < 0.0001). In a third round of analysis, creating a third sublist with the same (!) terms, the values do not change! Do you think this is correct? Can one perform the analysis this way? Is there a justification/explanation for this?

Thank you very much,

kind regards!


gene ontology • 5.2k views
5.0 years ago
David720 wrote:

If I understand you correctly, you have a gene list A on which you run an over-representation analysis (ORA) using DAVID, then you select the genes that fall in an arbitrary category and run this sub-list against DAVID...

In my opinion this is wrong. You are bound to obtain significant over-representation in a category if you select only the genes in this category. For your gene list to be unbiased, you should not filter it using knowledge from the categories you are trying to explore.

Put simply, this is what you did: imagine you have black and white balls in an urn. You draw a sample without replacement from the urn. You run an ORA on this sample and find no significant over-representation. You now keep only the black balls from the sample and run the ORA again with only the black balls. Surprise! Now the black balls are significantly over-represented...
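To make the urn picture concrete, here is a minimal sketch using a plain hypergeometric upper-tail test. All the numbers (background size, term size, list size, hits) are made up for illustration, and DAVID itself reports a modified Fisher exact statistic (the EASE score), so its values will differ; only the pattern matters.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n balls without replacement from an urn
    of N balls, K of which are black (i.e. annotated with the term)."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / total


# Hypothetical numbers, not taken from the original question.
N = 20000  # genes in the background
K = 300    # background genes annotated with the term
n = 500    # genes in the original list
k = 14     # genes in the original list annotated with the term

# Round 1: the full gene list against the background.
p_full = hypergeom_upper_tail(k, N, K, n)

# Round 2: keep only the k annotated genes, same background.
# Every gene in the sublist is "black" by construction.
p_sub = hypergeom_upper_tail(k, N, K, k)

# Round 3: selecting the same term again returns the same k genes,
# so the test inputs (and the p-value) are identical to round 2.
p_third = hypergeom_upper_tail(k, N, K, k)

print(f"full list: p = {p_full:.3g}")
print(f"sublist:   p = {p_sub:.3g}")
print(f"3rd round: p = {p_third:.3g}")
```

The sublist p-value is tiny only because the genes were selected for belonging to the term, which is also why a third round with the same terms changes nothing.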

5.0 years ago by David720
5.0 years ago
neumannmartin198710 wrote:

Hey David,

thank you for your answer. Yes, I think you're right! In the meantime I talked to a bioinformatician about my question, and he told me essentially the same thing.

But now I am wondering what the function of a sublist is??? It is not an important question, but it is interesting, isn't it?

I have one additional question: the Benjamini value (q-value, or corrected p-value) has to be < 0.05 to be significant (like a "normal" p-value), right?

neumannmartin198710

The hypergeometric test can definitely be tricky.

What is the function of sublists... The sublists represent features (e.g. genes) sharing a particular concept. The basic idea is that they help you better understand your list of significantly regulated genes.

One can also do a focused analysis of the genes belonging to a particular concept, for example by looking specifically at those genes across your different conditions using a heatmap coupled with hierarchical clustering, etc.

A 5% FDR is the usual convention for the significance level. But p-values are not the only statistical measure of interest, and they are highly sensitive to sample size. To conclude, this is only one step in your analysis: it helps you trust your results, but you will have to consider the context and think about what biological validation you should perform to make your point.
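For reference, a rough sketch of how a Benjamini-Hochberg adjusted p-value is obtained and compared to the 5% threshold. I am assuming DAVID's "Benjamini" column is this standard procedure; the raw p-values below are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)

# A term is called significant at 5% FDR if its adjusted value is <= 0.05.
significant = [q <= 0.05 for q in adj]
print(adj)
print(significant)
```

The adjusted value depends on how many terms were tested, so a term with a raw p < 0.05 can easily end up with a Benjamini value above 0.05, as in the original question.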

David720
