(Sorry if this was bumped; I needed to rename go_db to godb since the underscore created downstream problems for Python's setuptools.)
godb is a Gene Ontology library for Python that contains a set of annotation maps describing most of the Gene Ontology.
It downloads, parses and exposes the Gene Ontology data in dataframes.
Note that the GitHub version might not be stable; install with pip install godb.
You'll get the annotation table with

    import godb

    anno = godb.get_annotations()
    anno.head(3)
    # GO id       Ontology  Term                              Synonym                              Definition
    # GO:0000001  BP        mitochondrion inheritance         mitochondrial inheritance            The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
    # GO:0000002  BP        mitochondrial genome maintenance                                       The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.
    # GO:0000003  BP        reproduction                      reproductive physiological process   The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.

    len(anno)  # 41688
If a term has multiple synonyms, these are separated with a semicolon (;). While this makes the data untidy, it avoids having to include an arbitrary number of columns, many of which would be empty for most rows.
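If you do need one synonym per row, the semicolon-separated column can be split with plain pandas. The sketch below does not call godb; it uses a small hand-made table with the same "GO id" and "Synonym" column names, and the two synonyms in the last row are made up to illustrate the separator.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by godb.get_annotations().
# The strings "synonym a;synonym b" are invented for illustration only.
anno = pd.DataFrame({
    "GO id": ["GO:0000001", "GO:0000002", "GO:0000003"],
    "Synonym": ["mitochondrial inheritance", "", "synonym a;synonym b"],
})

# Split each cell on ";" to get a list, then emit one row per list element.
tidy = (
    anno.assign(Synonym=anno["Synonym"].str.split(";"))
        .explode("Synonym")
        .reset_index(drop=True)
)

print(tidy)
```

Rows without a synonym are kept with an empty string, so no term is lost by the reshaping.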
Get maps of parents and children
With the functions get_children and get_offspring you get maps showing the direct parents of each child and all the ancestors of each child, respectively.
    cc_children = godb.get_children("CC")
    cc_children.head(3)
    #         Child      Parent Relation
    # 0  GO:0000015  GO:0044445     is_a
    # 1  GO:0000015  GO:1902494     is_a
    # 2  GO:0000109  GO:0044428     is_a
    len(cc_children)  # 5511

    cc_offspring = godb.get_offspring("CC")
    cc_offspring.head(3)
    #     Offspring      Parent
    # 0  GO:0000015  GO:0044445
    # 0  GO:0000110  GO:0044428
    # 1  GO:0000111  GO:0044428
    len(cc_offspring)  # 30658
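The difference between the two maps is that get_children lists only direct parents, while get_offspring lists every ancestor. A minimal sketch of that relationship, using made-up term ids (t1 to t4) instead of real GO ids, and not claiming to be how godb computes it internally:

```python
from collections import defaultdict

# Toy child -> parent edges (hypothetical ids, not real GO terms).
edges = [("t1", "t2"), ("t2", "t3"), ("t2", "t4")]

# Index the direct parents of every child.
parents = defaultdict(set)
for child, parent in edges:
    parents[child].add(parent)


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


print(sorted(parents["t1"]))            # direct parents only
print(sorted(ancestors("t1", parents)))  # parents, grandparents, and so on
```

Here t1 has a single direct parent (t2) but three ancestors (t2, t3, t4), which is why the offspring map is so much longer than the children map.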
Both get_offspring and get_children take the argument relations, which is ["is_a", "part_of", "has_part"] by default. If you want to ignore certain relations when computing children or offspring, change this argument. R's GO.db uses the relationships ["is_a", "part_of"] to compute ancestors, so use these to get identical behavior.
    godb.get_offspring("CC", ["is_a", "part_of"]).head(3)
    #     Offspring      Parent
    # 0  GO:0000015  GO:0044445
    # 1  GO:0000110  GO:0044428
    # 2  GO:0000111  GO:0044428
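To see which edges a given relations list would leave out, you can filter a children-style table by its Relation column. This is only a sketch on a hand-made table with the same Child, Parent and Relation columns; the ids are made up, and it does not show how godb applies the argument internally.

```python
import pandas as pd

# Hypothetical table shaped like the output of godb.get_children();
# the term ids are invented for illustration.
children = pd.DataFrame({
    "Child": ["t1", "t1", "t2"],
    "Parent": ["t2", "t3", "t4"],
    "Relation": ["is_a", "part_of", "has_part"],
})

# Keep only the relations that R's GO.db uses, according to the text above.
keep = ["is_a", "part_of"]
filtered = children[children["Relation"].isin(keep)]

print(filtered)
```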
Note that the first time a godb function is used, the Gene Ontology datafile will be downloaded, and this may take some time. If you want a message to be displayed when this happens, set the logging level to INFO:

    import logging
    logging.basicConfig(level=logging.INFO)
pip install godb
pandas, both of which are automatically installed when using pip to install godb.
- (Possibly) Expose a command line interface similar to that of biomartian. I do not use GO enough to warrant it yet, though.
Report bugs, ask questions or request features at the issues page.
How do I get the genes associated with a term?
    biomartian -d rnorvegicus_gene_ensembl -i external_gene_name -o go_id | shuf -n 10
    Lpcat1     GO:0005509
    Klb        GO:0005975
    LOC498555  GO:0003735
    Map3k12    GO:0046777
    Hoxb1      GO:0045944
    Cir1       GO:0006397
    Rhoc       GO:0005525
    Casr       GO:0060613
    Cib1       GO:1900026
    Onecut1    GO:0002064
See biomartian for more info.
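Once you have that two-column gene/GO id output, looking up the genes for a term is a matter of grouping by the second column. A minimal sketch in plain Python, assuming the output has been saved as text; the variable names are illustrative and the ten pairs are the ones from the sample output above.

```python
from collections import defaultdict

# Sample biomartian output (gene name, GO id), copied from the run above.
output = """\
Lpcat1 GO:0005509
Klb GO:0005975
LOC498555 GO:0003735
Map3k12 GO:0046777
Hoxb1 GO:0045944
Cir1 GO:0006397
Rhoc GO:0005525
Casr GO:0060613
Cib1 GO:1900026
Onecut1 GO:0002064
"""

# Build a GO id -> list of genes index.
genes_by_term = defaultdict(list)
for line in output.splitlines():
    if not line.strip():
        continue  # skip blank lines
    gene, go_id = line.split()
    genes_by_term[go_id].append(gene)

print(genes_by_term["GO:0005509"])
```

With the full, unsampled output (without shuf), the same index would hold every gene annotated to each term.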