Question: Can someone give insight into the differences between current gene-function databases?
I am currently doing a bioinformatics internship, and I'm fairly new to all of this. Right now, I need to learn about the different databases and how to query them, using gene sets as the input.

I've noticed that the ones I'm looking at, GO and KEGG, give different kinds of results: GO has three categories (biological process, molecular function, cellular component), while KEGG appears to give results based on pathways. It makes me somewhat nervous to get many results back from GO and then only one significant result from KEGG.

I know GO and KEGG are not the only databases out there, so I'm interested:

Can someone give a brief summary of the nature of the "output" of some of the major databases?

GO stands for Gene Ontology and, as the name suggests, it annotates genes using an ontology: a structured vocabulary of terms linked by relationships such as "is a" and "part of". KEGG, PANTHER and other "pathway" databases group genes into "pathways", which are basically lists of genes participating in the same biological process. Ontology annotations give more flexibility and, in my view, better capture the complexity of functional relationships between genes. On the other hand, gene lists are easier to understand and manipulate, but what goes into a given list can be subjective, and two databases will often not agree on the level of granularity used to describe a given process. For example, one database might give a single large list of genes involved in DNA replication, while another might break it up into origin licensing, DNA polymerisation ...
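To make the ontology-vs-list distinction concrete, here is a toy sketch. All term names, the term hierarchy and the gene names are hypothetical, and this is not how real GO or KEGG files are parsed. Genes are annotated directly to the most specific term, and each annotation is propagated up to every ancestor term (the "true path rule" GO follows), so the same annotations can be read at any level of detail. The two flat "pathway" dictionaries stand in for two databases that describe the same process at different granularity.

```python
from collections import defaultdict

# Hypothetical toy ontology: term -> list of parent terms (is_a edges).
# Names are illustrative only, not real GO term IDs or GO structure.
PARENTS = {
    "DNA replication": [],
    "origin licensing": ["DNA replication"],
    "DNA polymerisation": ["DNA replication"],
    "lagging strand elongation": ["DNA polymerisation"],
}

# Hypothetical direct annotations: gene -> most specific terms it is annotated to.
DIRECT = {
    "geneA": ["origin licensing"],
    "geneB": ["lagging strand elongation"],
    "geneC": ["DNA polymerisation"],
}

# Two hypothetical "pathway" databases: flat gene lists for the same process,
# one coarse-grained and one fine-grained.
COARSE_DB = {"DNA replication": {"geneA", "geneB", "geneC"}}
FINE_DB = {
    "origin licensing": {"geneA"},
    "DNA polymerisation": {"geneB", "geneC"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def propagate(direct):
    """Build term -> genes, crediting each gene to every ancestor of its terms."""
    term_genes = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in ancestors(term):
                term_genes[t].add(gene)
    return dict(term_genes)


if __name__ == "__main__":
    ontology = propagate(DIRECT)
    for term in sorted(ontology):
        print(term, sorted(ontology[term]))
```

In this toy example the propagated ontology reproduces both the coarse list and the fine lists, which is the flexibility the answer refers to: with flat lists, the granularity is fixed by whoever curated them.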


