Question: Gene Ontology Tool: David
5.7 years ago by
neumannmartin198710 wrote:

Hello friends,

I have a question concerning the gene ontology tool DAVID. In DAVID it is possible to create a sublist after a first round of functional annotation. The sublist is created by the user, i.e. if I tick the checkbox for the GO term Extracellular Matrix, I can create a sublist with that term (or with several terms selected that way). My term has p < 0.05 but a corrected p-value (Benjamini) of 0.1 (not significant). If I now choose my sublist and perform a second round of analysis (the background stays the same), I get significant values (let's say p < 0.000005 and Benjamini p < 0.0001). In a third round of analysis, creating a third sublist with the same (!) terms, the values do not change! Do you think this is correct? Can one perform the analysis this way? Is there a justification/explanation for this?

Thank you very much,

kind regards!


gene ontology • 5.7k views
5.7 years ago by
David720 wrote:

If I understand you correctly, you have a gene list A on which you run an over-representation analysis (ORA) using DAVID, then you select the genes that fall in an arbitrary category and run this sub-list through DAVID again...

In my opinion this is wrong. You are bound to obtain significant over-representation in a category if you select only the genes in this category. For your gene list to be unbiased, you should not filter it using knowledge from the categories you are trying to explore.

Put simply, this is what you did: Imagine you have black and white balls in an urn. You draw a sample without replacement from the urn. You run an ORA on this sample and find no significant over-representation. You now keep only the black balls from the sample and run the ORA again with only the black balls. Surprise! Now the black balls are significantly over-represented...
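To put rough numbers on the urn picture, here is a minimal sketch using a plain one-sided hypergeometric test. All counts (background size, term size, list size, hits) are made up for illustration. DAVID's own statistic (the EASE score) is a variant of this test, so do not expect these values to match DAVID's output.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from
    `population` items, of which `successes` belong to the category."""
    total = comb(population, draws)
    top = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return favourable / total

# Hypothetical numbers, NOT taken from the question:
BACKGROUND = 20000   # genes in the background
TERM_SIZE = 300      # background genes annotated with the term
LIST_SIZE = 500      # genes in the original list A
HITS = 12            # genes of list A annotated with the term

# Round 1: the full, unfiltered gene list.
p_full = hypergeom_upper_tail(HITS, BACKGROUND, TERM_SIZE, LIST_SIZE)

# Round 2: the "sublist" holds only the 12 term genes, so every gene is a hit.
p_sub = hypergeom_upper_tail(HITS, BACKGROUND, TERM_SIZE, HITS)

# Round 3: selecting the same term again returns the same 12 genes,
# so the test input, and therefore the p-value, cannot change.
p_third = hypergeom_upper_tail(HITS, BACKGROUND, TERM_SIZE, HITS)

print(f"full list: p = {p_full:.3g}")
print(f"sublist:   p = {p_sub:.3g}")
print(f"3rd round: p = {p_third:.3g}")
```

The tiny p-value for the sublist only says that a list hand-picked to contain term genes contains term genes. It tells you nothing new about list A.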

5.7 years ago by
neumannmartin198710 wrote:

Hey David,

Thank you for your answer. Yes, I think you're right! In the meantime I talked to a bioinformatician about my question, and he told me essentially the same.

But now I am wondering what the function of a sublist is. It is not an important question, but it is interesting, isn't it?

I have one additional question: the Benjamini value (q-value, or corrected p-value) has to be < 0.05 to be significant (like a "normal" p-value), right?


The hypergeometric test can definitely be tricky.

What is the function of sublists? A sublist represents features (e.g. genes) sharing a particular concept. The basic idea is that sublists help you better understand your list of significantly regulated genes.

One can also do a focused analysis of the genes belonging to a particular concept, for example by looking specifically at those genes across your different conditions using a heatmap coupled with hierarchical clustering, etc.

A 5% FDR is the conventional significance level. But p-values are not the only statistical measure of interest, and they are highly sensitive to sample size. To conclude: this is only one step in your analysis. It helps you believe in your results, but you will have to consider the context and think about what biological validation you should perform to make your point.
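For the Benjamini question, a small sketch of the Benjamini-Hochberg adjustment may help to see why a term can have a raw p < 0.05 and still miss the 5% FDR cut-off. The p-values below are invented for illustration, and the function is a plain textbook version, not DAVID's code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for eight GO terms tested on one gene list.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adj = benjamini_hochberg(raw)

for p, q in zip(raw, adj):
    flag = "significant" if q < 0.05 else "not significant"
    print(f"raw p = {p:<6} adjusted = {q:.4f}  {flag} at 5% FDR")

# Raw p-values of the terms that survive the 5% FDR threshold.
significant = [p for p, q in zip(raw, adj) if q < 0.05]
```

In this made-up example five terms have a raw p < 0.05, but only the two smallest stay below 0.05 after adjustment. That is the same situation as the Extracellular Matrix term in the question.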

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by David720