Question: GO gene enrichment test in non-model organism (bacteria)
3.0 years ago by
United Kingdom
rwn400 wrote:


I have a small set of genes (~100 or so) that I have identified as being interesting in a bacterial species, and I'd like to do a simple gene enrichment analysis to see which functional categories may be overrepresented in the set. Tools for the job include Blast2GO, with which you can run a Fisher's exact test, and GOEAST, which uses other (perhaps better?) statistical tests such as the hypergeometric test. (If there are others I'd love to know about them too.)

However, these tools require a user-defined 'background' or 'reference' set of genes to be uploaded, so my question is: how do you define the background gene set? Is it better to use the whole genome of the organism in question, such that the test becomes: given a genome full of genes with associated functions, which GO categories are overrepresented in the subset of genes I have highlighted? Or is it perhaps better to provide a randomly selected reference set of equivalent size, i.e. randomly select ~100 genes to use as a background representation of 'random' functional diversity? Tips and common protocols for this type of analysis would be much appreciated!

Thanks in advance!

PS, I am aware there are a number of questions with similar titles already on Biostars, but I don't think any directly answer this question - apologies if I've missed something.

PPS, perhaps I should say that my genes of interest are NOT identified based on differential expression experiments or suchlike, but I do have access to whole-genome data for my organism.

3.0 years ago by
Istvan Albert ♦♦ 73k
University Park, USA
Istvan Albert ♦♦ 73k wrote:

The underlying assumption in most enrichment analyses is that most genes do NOT change. Thus it would not even matter whether you used all genes or only the genes that are not in your selection. I would say that there is no advantage in providing a smaller subset, as it only decreases the power of the test (although that probably depends on the type of test).
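
A minimal sketch of the comparison described above, assuming made-up counts and a one-sided hypergeometric test (which gives the same p-value as a one-sided Fisher's exact test on the corresponding 2x2 table). The function name `enrichment_p` and every number below are hypothetical, chosen only to show how a small random reference set can weaken the test relative to a whole-genome background.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the study set, k: study-set genes annotated with the term.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts, for illustration only.
# Whole genome as background: 4000 genes, 200 carry the GO term;
# the study set of 100 genes (a subset of the genome) contains 15 of them.
p_genome = enrichment_p(k=15, n=100, K=200, N=4000)

# Small reference instead: 100 randomly chosen genes, 5 carry the term
# (the same 5% rate as the genome). Fisher's exact test on the 2x2 table
# [[15, 85], [5, 95]] is this same tail with study and reference pooled.
p_small = enrichment_p(k=15, n=100, K=15 + 5, N=100 + 100)

print(f"whole-genome background: p = {p_genome:.2e}")
print(f"100-gene reference:      p = {p_small:.2e}")
```

With these assumed counts the whole-genome p-value should come out markedly smaller than the one from the 100-gene reference, even though the term frequency in both backgrounds is the same 5%. Real tools also correct for testing many GO terms at once, which this sketch leaves out.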


Thanks Istvan

written 3.0 years ago by rwn400