Question: Expected Gene Ontology term frequency in a genome?
3.2 years ago by kbrevik

Hello! I hope this isn't too obvious a question.

I'm looking for rough, ballpark estimates of GO term frequencies in "average" genomes, or benchmarks done with current assemblies. For example, GO:0000049 occurs 12 times in assembly x, GO:0013232 occurs 2000 times, etc. Of course, these numbers will vary a lot depending on the annotation methods and data, but rough estimates are what I'm looking for.

I'm working with some resequenced genomes, and I just want to confirm that my estimates fall in a plausible range, to rule out bugs in my code. Thanks!

This is an interesting question, but I'm not sure such figures are available. It's usually the other way around: the GO term frequencies in a genome are used as a background to decide whether a GO category is significantly enriched in an experiment. In short, you may have to write your own program to get the numbers you want. Your post reminded me of this blog post, so I hope it is a good lead for you.
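If you do write your own counter, a minimal sketch in Python could look like the following. It assumes your annotations are in GAF 2.x format (tab-separated, lines starting with `!` are headers, column 2 is the gene/product ID, column 4 the qualifier, column 5 the GO ID). The function name `go_term_frequencies` and the toy lines are hypothetical. It counts distinct genes per term, so duplicate annotations of the same gene (e.g. from different evidence codes) are only counted once, and `NOT` annotations are skipped.

```python
from collections import Counter


def go_term_frequencies(gaf_lines):
    """Count distinct genes annotated to each GO term (hypothetical helper).

    Assumes GAF 2.x tab-separated lines: index 1 = gene/product ID,
    index 3 = qualifier, index 4 = GO ID.
    """
    genes_per_term = {}
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed lines
        gene, qualifier, go_id = cols[1], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue  # negated annotation: the gene is explicitly NOT in this term
        genes_per_term.setdefault(go_id, set()).add(gene)
    return Counter({term: len(genes) for term, genes in genes_per_term.items()})


# Toy input (hypothetical gene IDs), not real annotation data.
toy_gaf = [
    "!gaf-version: 2.2",
    "DB\tG1\tgeneA\tenables\tGO:0000049\tREF\tIEA\t\tF",
    "DB\tG2\tgeneB\tenables\tGO:0000049\tREF\tIEA\t\tF",
    "DB\tG2\tgeneB\tenables\tGO:0000049\tREF2\tISS\t\tF",  # same gene, second evidence
    "DB\tG3\tgeneC\tNOT|enables\tGO:0000049\tREF\tIEA\t\tF",  # negated, ignored
    "DB\tG3\tgeneC\tinvolved_in\tGO:0006412\tREF\tIEA\t\tP",
]
freqs = go_term_frequencies(toy_gaf)
print(freqs.most_common())
```

Counting distinct genes rather than annotation lines is a design choice: raw line counts would inflate terms that are annotated repeatedly with different evidence codes, which makes genomes annotated by different pipelines harder to compare.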

I can see two potential issues with this approach:
- First, accounting for the hierarchical nature of the ontology and the way genes are annotated: the same gene in two different genomes may have been annotated with different but related terms, e.g. the parent term in one case and a child term in the other.
- Second, accounting for different versions of the ontology: while obsolete terms can be mapped to new ones, new terms obviously cannot appear in older annotations.
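Both issues can be reduced (not eliminated) before comparing counts. Here is a rough sketch under stated assumptions: every annotation is propagated up to all ancestor terms (so a parent-level and a child-level annotation of the same gene both end up counted at the parent), and obsolete IDs are first remapped through a single `replaced_by` hop. The `parents` and `replaced_by` dicts are assumed to have been built beforehand, e.g. from the `is_a` (and optionally `part_of`) edges and `replaced_by` tags in go-basic.obo. The term names and helper functions below are hypothetical placeholders, not real GO IDs.

```python
from collections import Counter


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagated_frequencies(gene_terms, parents, replaced_by):
    """Count distinct genes per term after remapping and upward propagation."""
    genes_per_term = {}
    for gene, terms in gene_terms.items():
        closure = set()
        for term in terms:
            term = replaced_by.get(term, term)  # one replaced_by hop; unmapped IDs kept
            closure |= ancestors(term, parents)
        for t in closure:
            genes_per_term.setdefault(t, set()).add(gene)
    return Counter({t: len(g) for t, g in genes_per_term.items()})


# Hypothetical toy ontology: child_term -> parent_term -> root_term,
# and new_term -> parent_term; old_term is obsolete, replaced by new_term.
parents = {
    "child_term": ["parent_term"],
    "parent_term": ["root_term"],
    "new_term": ["parent_term"],
}
replaced_by = {"old_term": "new_term"}

# The same gene annotated at different depths in two genomes.
freq_a = propagated_frequencies({"geneX": ["child_term"]}, parents, replaced_by)
freq_b = propagated_frequencies({"geneX": ["parent_term"]}, parents, replaced_by)
# An older annotation that still uses the obsolete ID.
freq_old = propagated_frequencies({"geneY": ["old_term"]}, parents, replaced_by)
print(freq_a, freq_b, freq_old, sep="\n")
```

After propagation, both genomes agree at `parent_term` and `root_term`, and the old annotation is counted under `new_term` instead of the obsolete ID. This does not solve the second issue fully: terms added after an annotation was made still cannot appear in it, so comparisons are safest at coarser, long-established levels of the hierarchy.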

