Hi, I would like to learn more about Gene Ontologies, and especially about the programmatic manipulation of ontologies. I have used GO for classifying genes in the past, but I have never really understood the structure of the ontologies. Any suggestions would be appreciated. Thanks, Ethan
I haven't worked with computational ontologies for years, but I might be able to give you some pointers to help you out. An ontology is just someone's specification of a particular domain: their world view. What you can say about the world, or how you describe it, is limited by the computational languages available and the constructs within them for representing information. In other words, your view of the world isn't a true world view; you are limited by the expressivity of the language. This used to be called the interaction problem in the old days, but I'm sure there is a new name for it now.
So that said, people write biological ontologies in the different languages available, and these languages have different constructs for modelling information. If you are a programmer, a database design written in EER could be seen as an ontology, and so could a set of UML diagrams. They are a specification of a conceptualization after all! But common ontology languages have notions like is-a and has-a constructs, e.g. a GPCR receptor IS-A protein which HAS-A transmembrane domain. There are also things like description logics and OWL, and probably tonnes of other 'markup' languages for representing the structure of information.
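To make the is-a / has-a idea concrete, here is a minimal Python sketch that stores such statements as (subject, relation, object) triples and indexes them by relation type. The term names come from the GPCR example above plus one made-up extra ("kinase"); the function name index_by_relation is my own, not part of any GO tool.

```python
from collections import defaultdict

# (subject, relation, object) triples; illustrative names only,
# not real GO identifiers.
TRIPLES = [
    ("GPCR", "is_a", "protein"),
    ("GPCR", "has_a", "transmembrane domain"),
    ("kinase", "is_a", "protein"),
]


def index_by_relation(triples):
    """Group triples as index[relation][subject] -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[relation][subject].add(obj)
    return index


index = index_by_relation(TRIPLES)
print(sorted(index["is_a"]["GPCR"]))   # what a GPCR is
print(sorted(index["has_a"]["GPCR"]))  # what a GPCR has
```

Keeping the relation type on every edge is the point: an ontology language differs from a plain tree mostly in which relation types it lets you state.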
Programmatic manipulation of an ontology depends entirely on the specific ontology and the language it is written in. In other words, the programmatic constructs for accessing the ontology follow from its inherent structure. Most of them are ultimately big hierarchies, like a traditional OO inheritance hierarchy.
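As a rough illustration of what "walking the hierarchy" means in code, the sketch below collects every ancestor of a term by following parent links. Note that in GO a term can have more than one parent, so the structure is a directed acyclic graph rather than a strict tree; the toy PARENTS mapping and the ancestors function are assumptions made up for this example, not GO data or a GO API.

```python
# Toy parent map: term -> list of direct is-a parents.
# Names are invented for illustration; "GPCR" has two parents on purpose.
PARENTS = {
    "GPCR": ["receptor", "membrane protein"],
    "receptor": ["protein"],
    "membrane protein": ["protein"],
    "protein": [],
}


def ancestors(term, parents):
    """Return the set of all terms reachable by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already reached through another parent
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("GPCR", PARENTS)))
```

The seen set is what keeps shared ancestors (here "protein") from being visited twice, which matters once a term has several parents.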
So that really is an answer to your question about the structure of ontologies. I can't be more specific about programmatic access without a specific ontology to talk about. If you look on the GO website, there is a dizzying array of ontology tools. You might be better off asking some smaller, specific questions about the particular ontology you don't understand.
In simple terms, understanding the structure of the Gene Ontology is all about comprehending the schema of the ontology database. The schema gives a clear view of how the information is stored: the tables, their fields and the relationships between them.
In the figure above you can see part of the database, with its tables, fields, attributes and relationships, along with the primary IDs. Programming with the Gene Ontology is then a matter of knowing how the information is stored and how to query it in the ontology database.
MySQL connection parameters for the GO database mirror at the EBI:

Parameter   Value
host        mysql.ebi.ac.uk
user        go_select
password    amigo
database    go_latest
port        4085
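Here is a hedged Python sketch of the kind of query this enables. Since I can't promise the mirror is still online, the query is run against an in-memory SQLite stand-in rather than the real server. The table and column names (term with id/acc/name, term2term with term1_id as parent and term2_id as child) are my assumption about the GO schema and should be checked against the schema diagram; the TOY: accessions and the helper names are invented for the example.

```python
import sqlite3

# Connection parameters quoted above, kept for reference only.
# The mirror may no longer be reachable, so nothing here connects to it.
GO_MIRROR = {
    "host": "mysql.ebi.ac.uk",
    "user": "go_select",
    "password": "amigo",
    "database": "go_latest",
    "port": 4085,
}

# Assumed layout: term2term.term1_id is the parent, term2_id the child.
CHILDREN_SQL = """
SELECT child.acc, child.name
FROM term AS parent
JOIN term2term AS rel ON rel.term1_id = parent.id
JOIN term AS child ON child.id = rel.term2_id
WHERE parent.acc = ?
ORDER BY child.acc
"""


def build_toy_db():
    """Create an in-memory stand-in with made-up terms (not real GO data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
        CREATE TABLE term2term (term1_id INTEGER, term2_id INTEGER);
        INSERT INTO term VALUES (1, 'TOY:0001', 'protein');
        INSERT INTO term VALUES (2, 'TOY:0002', 'receptor');
        INSERT INTO term VALUES (3, 'TOY:0003', 'GPCR');
        INSERT INTO term VALUES (4, 'TOY:0004', 'kinase');
        INSERT INTO term2term VALUES (1, 2);
        INSERT INTO term2term VALUES (2, 3);
        INSERT INTO term2term VALUES (1, 4);
    """)
    return conn


def child_terms(conn, acc):
    """Return (acc, name) rows for the direct children of one term."""
    return conn.execute(CHILDREN_SQL, (acc,)).fetchall()


conn = build_toy_db()
print(child_terms(conn, "TOY:0001"))
```

Against a real MySQL server you would open the connection with a MySQL driver using the parameters in GO_MIRROR, and the placeholder in the SQL would typically be %s instead of ?.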
Similarly, you can manipulate the ontology with Perl, Python, R, Ruby or whatever language you know, if you go through the documentation.
I strongly believe R is the best choice for programmatic manipulation of GO; there are several Bioconductor packages for it, such as topGO and GOstats. There are also excellent tutorials at Blue Collar Bioinformatics and in the R & Bioconductor Manual, as well as GO-related libraries and more teaching resources.
Hope it helps
Edit - Previous Schema image updated!
Please have a look here: