My goal is to generate a GO-term network in Cytoscape containing all "biological processes" relevant for human cells. I am aware of the Cytoscape gene ontology plugins such as BiNGO, PiNGO and others, but I don't think any of them fit my requirements.
To get the network into Cytoscape, I need to get the GO data from somewhere. My approach is to download a table of gene ontology terms (for Homo sapiens, if that is possible :-S ) with their names and gene ontology accessions (GO:[0-9]+). Additionally, I would like to download a table showing the relations between the GO terms.
Though there might be a better/easier way to do this, I was trying to query the "go" database at UCSC. This database includes the tables "term" and "term2term". I believe these tables contain the information I am looking for:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D go -e 'select * from term limit 5'
+----+-----------------------+--------------+----------------+-------------+---------+
| id | name                  | term_type    | acc            | is_obsolete | is_root |
+----+-----------------------+--------------+----------------+-------------+---------+
|  1 | all                   | universal    | all            |           0 |       1 |
|  2 | is_a                  | relationship | is_a           |           0 |       0 |
|  3 | Candida GO slim       | subset       | goslim_candida |           0 |       0 |
|  4 | Generic GO slim       | subset       | goslim_generic |           0 |       0 |
|  5 | GOA and proteome slim | subset       | goslim_goa     |           0 |       0 |
+----+-----------------------+--------------+----------------+-------------+---------+
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D go -e 'select * from term2term limit 5'
+----+----------------------+----------+----------+----------+
| id | relationship_type_id | term1_id | term2_id | complete |
+----+----------------------+----------+----------+----------+
|  1 |                    2 |       17 |       16 |        0 |
|  2 |                    2 |       17 |       20 |        0 |
|  3 |                    2 |       34 |       33 |        0 |
|  4 |                    2 |       17 |       50 |        0 |
|  5 |                    2 |       17 |       57 |        0 |
+----+----------------------+----------+----------+----------+
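As far as I can tell, relationship_type_id, term1_id and term2_id all refer to term.id (the value 2 matches the is_a row in the term table). Here is a minimal sketch of how the numeric rows could be turned into a readable edge list. It runs against a toy in-memory SQLite copy, not the UCSC server. The ids 100 and 101 are made up, and reading term1_id as the parent and term2_id as the child is an assumption I have not verified on the UCSC mirror:

```python
import sqlite3

# Toy stand-in for the "go" database: only columns seen in the output above.
# The ids 100/101 are invented for illustration; id 2 = is_a as in the post.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT, term_type TEXT, acc TEXT);
CREATE TABLE term2term (id INTEGER PRIMARY KEY, relationship_type_id INTEGER,
                        term1_id INTEGER, term2_id INTEGER);
INSERT INTO term VALUES
  (2,   'is_a',               'relationship',       'is_a'),
  (100, 'biological_process', 'biological_process', 'GO:0008150'),
  (101, 'metabolic process',  'biological_process', 'GO:0008152');
INSERT INTO term2term VALUES (1, 2, 100, 101);
""")

# Join term three times: once per term column and once for the relation type.
# Assumption: term1_id is the parent and term2_id the child.
edges = con.execute("""
SELECT parent.acc, rel.acc, child.acc
FROM term2term AS tt
JOIN term AS parent ON parent.id = tt.term1_id
JOIN term AS child  ON child.id  = tt.term2_id
JOIN term AS rel    ON rel.id    = tt.relationship_type_id
ORDER BY parent.acc, child.acc
""").fetchall()

for parent_acc, relation, child_acc in edges:
    # Tab-separated rows like this can be imported into Cytoscape as a table.
    print(f"{child_acc}\t{relation}\t{parent_acc}")
```

The same SELECT should work unchanged through the mysql command shown above, since it only uses plain joins.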
Besides these two tables, there is also a species table:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D go -e 'select * from species limit 5'
+----+--------------+-------------+----------------+----------------+----------------------+-----------+------------+-------------+----------------+
| id | ncbi_taxa_id | common_name | lineage_string | genus          | species              | parent_id | left_value | right_value | taxonomic_rank |
+----+--------------+-------------+----------------+----------------+----------------------+-----------+------------+-------------+----------------+
|  1 |       127219 | NULL        | NULL           | Tellervini     |                      |      NULL |       NULL |        NULL | NULL           |
|  2 |        58017 | NULL        | NULL           | Hirondellea    |                      |      NULL |       NULL |        NULL | NULL           |
|  3 |        44919 | NULL        | NULL           | unidentified   | soil organism R6-143 |      NULL |       NULL |        NULL | NULL           |
|  4 |       342596 | NULL        | NULL           | unclassified   | Cytomegalovirus      |      NULL |       NULL |        NULL | NULL           |
|  5 |        55548 | NULL        | NULL           | Dermatemydidae |                      |      NULL |       NULL |        NULL | NULL           |
+----+--------------+-------------+----------------+----------------+----------------------+-----------+------------+-------------+----------------+
However, I don't see how the species table can be linked to the GO-terms.
My question is: how can I download GO terms that
- have term_type=biological_process
- are derived from / relate to human
And how can I subset the term2term table to the GO terms found by that query?
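To make the question concrete, here is a rough sketch of the kind of query I have in mind, again on a toy in-memory SQLite database rather than the real server. It assumes that terms are linked to species through two tables that I have not shown above, association (term_id, gene_product_id) and gene_product (species_id), as in the general GO database schema. Whether the UCSC copy has these tables with these columns is an assumption, and all rows and ids below are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT, term_type TEXT,
                   acc TEXT, is_obsolete INTEGER);
CREATE TABLE term2term (id INTEGER PRIMARY KEY, relationship_type_id INTEGER,
                        term1_id INTEGER, term2_id INTEGER);
CREATE TABLE species (id INTEGER PRIMARY KEY, ncbi_taxa_id INTEGER);
-- Assumed link tables (not shown in the post), modelled on the GO schema.
CREATE TABLE gene_product (id INTEGER PRIMARY KEY, species_id INTEGER);
CREATE TABLE association (id INTEGER PRIMARY KEY, term_id INTEGER,
                          gene_product_id INTEGER);

INSERT INTO term VALUES
  (2,   'is_a',               'relationship',       'is_a',       0),
  (100, 'biological_process', 'biological_process', 'GO:0008150', 0),
  (101, 'metabolic process',  'biological_process', 'GO:0008152', 0),
  (102, 'photosynthesis',     'biological_process', 'GO:0015979', 0),
  (103, 'molecular_function', 'molecular_function', 'GO:0003674', 0);
INSERT INTO term2term VALUES (1, 2, 100, 101), (2, 2, 101, 102);
INSERT INTO species VALUES (1, 9606), (2, 3702);  -- human, Arabidopsis
INSERT INTO gene_product VALUES (1, 1), (2, 2);
INSERT INTO association VALUES (1, 100, 1), (2, 101, 1), (3, 102, 2), (4, 103, 1);
""")

# Ids of non-obsolete biological_process terms annotated to a human gene product.
HUMAN_BP_IDS = """
SELECT DISTINCT t.id
FROM term AS t
JOIN association  AS a  ON a.term_id = t.id
JOIN gene_product AS gp ON gp.id = a.gene_product_id
JOIN species      AS s  ON s.id = gp.species_id
WHERE t.term_type = 'biological_process'
  AND t.is_obsolete = 0
  AND s.ncbi_taxa_id = 9606
"""

# Node table: accession and name of every selected term.
nodes = con.execute(
    f"SELECT acc, name FROM term WHERE id IN ({HUMAN_BP_IDS}) ORDER BY acc"
).fetchall()

# Edge table: keep only term2term rows whose two ends are both selected.
edges = con.execute(f"""
SELECT t1.acc, t2.acc
FROM term2term AS tt
JOIN term AS t1 ON t1.id = tt.term1_id
JOIN term AS t2 ON t2.id = tt.term2_id
WHERE tt.term1_id IN ({HUMAN_BP_IDS})
  AND tt.term2_id IN ({HUMAN_BP_IDS})
ORDER BY t1.acc, t2.acc
""").fetchall()

print(nodes)
print(edges)
```

In this toy data the photosynthesis term is dropped because its only annotation belongs to the non-human species, and the molecular_function term is dropped by the term_type filter. Note that this only picks up terms with a direct human annotation; parent terms that have no annotation of their own would need extra handling.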
Did you see http://wiki.geneontology.org/index.php/Example_Queries and http://www.geneontology.org/GO.database.schema.shtml ?
Nope, thanks for the links!