How to download all Gene Ontology (GO) IDs with their associated vocabulary?
Asked 14 months ago by Jimmy

I'm looking to get a file formatted with two columns. One column should contain the actual GO IDs. The second should contain the vocabulary associated with the GO ID that is in the same row. Rows in the file should look something like this:

GO:0000002  mitochondrial genome maintenance
GO:0000003  reproduction
GO:0000009  alpha-1,6-mannosyltransferase activity
GO:0000012  single strand break repair

I want a file like this, but covering all GO IDs and their term names. How can I do this? I looked through the download options on the GO website and did not see one that provides this.

Tags: gene-ontology, GO

Answer by seidel · 14 months ago

There are several ways to do this, but one is to use R and the GO.db library from Bioconductor.

library(GO.db)

columns(GO.db)
# [1] "DEFINITION" "GOID"       "ONTOLOGY"   "TERM"

# get all GO IDs to use as keys for the selection
go <- keys(GO.db, keytype="GOID")

# look at a few
select(GO.db, columns=c("GOID","TERM"), keys=go[1:3], keytype="GOID")
#         GOID                             TERM
# 1 GO:0000001        mitochondrion inheritance
# 2 GO:0000002 mitochondrial genome maintenance
# 3 GO:0000003                     reproduction

# get them all
df <- select(GO.db, columns=c("GOID","TERM"), keys=go, keytype="GOID")

dim(df)
# [1] 43705     2
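The question asks for a file rather than a data frame, so the last step is to write `df` out as tab-separated text. A minimal sketch follows; to keep it runnable without Bioconductor, a small stand-in data frame built from the example rows in the question replaces the GO.db result, and the output file name is hypothetical.

```r
# Stand-in for the data frame returned by select(GO.db, ...):
# two columns, GOID and TERM (rows copied from the question's example).
df <- data.frame(
  GOID = c("GO:0000002", "GO:0000003", "GO:0000009", "GO:0000012"),
  TERM = c("mitochondrial genome maintenance", "reproduction",
           "alpha-1,6-mannosyltransferase activity", "single strand break repair"),
  stringsAsFactors = FALSE
)

# Write one "GOID<TAB>TERM" line per row, with no header, row names or quotes.
out_file <- file.path(tempdir(), "go_terms.tsv")  # hypothetical file name
write.table(df, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the raw lines back to check the layout.
lines <- readLines(out_file)
print(lines[1])
```

If you later load such a file back into R, `read.delim(path, header = FALSE, quote = "")` is safer than the `read.table()` defaults, because some GO term names contain apostrophes (for example "3'-5' exonuclease activity"), which `read.table()` treats as quote characters.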

Thank you, this works for me. I have a follow-up question. I see some files with GO IDs separated into GO IDs for three categories: CC (cellular component), BP (biological process), and MF (molecular function). If I'm not mistaken, all GO terms fall within these three categories. Is there a way to create three separate dataframes, one containing GO IDs belonging to each of the three categories?

EDIT: I looked into it, and if I'm not mistaken, these categories correspond to the values in the ONTOLOGY column of GO.db, so I can just filter on that.


Yes. You can simply modify the select statement:

# select all IDs and terms associated with Biological Process
df <- select(GO.db, columns=c("GOID","TERM"), keys="BP", keytype="ONTOLOGY")
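An alternative to running one query per ontology is to fetch the ONTOLOGY column alongside GOID and TERM in a single query and then split the result with base R's `split()`. The sketch below uses a toy data frame in place of the GO.db result (the corresponding real query is shown in the comment), so it runs without GO.db installed.

```r
# Toy stand-in for the result of
#   AnnotationDbi::select(GO.db, keys = keys(GO.db, keytype = "GOID"),
#                         columns = c("GOID", "TERM", "ONTOLOGY"), keytype = "GOID")
all_terms <- data.frame(
  GOID     = c("GO:0000002", "GO:0000003", "GO:0000009", "GO:0000012"),
  TERM     = c("mitochondrial genome maintenance", "reproduction",
               "alpha-1,6-mannosyltransferase activity", "single strand break repair"),
  ONTOLOGY = c("BP", "BP", "MF", "BP"),
  stringsAsFactors = FALSE
)

# One data frame per ontology code, named by the code.
by_ontology <- split(all_terms[, c("GOID", "TERM")], all_terms$ONTOLOGY)

names(by_ontology)    # only the codes present in the toy data: "BP" "MF"
nrow(by_ontology$BP)  # 3
```

With the full GO.db result, `by_ontology` would hold one element per ontology code, so `by_ontology$BP`, `by_ontology$CC` and `by_ontology$MF` replace the three separate queries.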

I thought I would come back and post the complete script that builds a table for each ontology and writes them all to file, in case anyone wants to copy and paste it.

suppressMessages(library(GO.db))

go <- keys(GO.db, keytype="GOID")

df <- select(GO.db, columns=c("GOID","TERM"), keys=go, keytype="GOID")

df_bp <- select(GO.db, columns=c("GOID","TERM"), keys="BP", keytype="ONTOLOGY")

df_cc <- select(GO.db, columns=c("GOID","TERM"), keys="CC", keytype="ONTOLOGY")

df_mf <- select(GO.db, columns=c("GOID","TERM"), keys="MF", keytype="ONTOLOGY")

write.table(df, "df_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)

# keep only GOID and TERM (the ONTOLOGY keytype column is also returned)
write.table(df_bp[, c("GOID","TERM")], "df_bp_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)

write.table(df_cc[, c("GOID","TERM")], "df_cc_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)

write.table(df_mf[, c("GOID","TERM")], "df_mf_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)
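Writing each ontology out with three near-identical lines makes it easy to pair the wrong key with the wrong file name. A loop that derives the file name from the ontology code avoids that. This is a sketch: `write_by_ontology()` is a hypothetical helper, and the toy `all_terms` data frame stands in for a GO.db query that also returns the ONTOLOGY column.

```r
# Toy stand-in for a GO.db query that includes the ONTOLOGY column, e.g.
#   AnnotationDbi::select(GO.db, keys = go, keytype = "GOID",
#                         columns = c("GOID", "TERM", "ONTOLOGY"))
all_terms <- data.frame(
  GOID     = c("GO:0000002", "GO:0000003", "GO:0000009", "GO:0000012", "GO:0005739"),
  TERM     = c("mitochondrial genome maintenance", "reproduction",
               "alpha-1,6-mannosyltransferase activity", "single strand break repair",
               "mitochondrion"),
  ONTOLOGY = c("BP", "BP", "MF", "BP", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one tab-separated GOID/TERM file per ontology code.
# The file name is built from the same code used to filter the rows,
# so a file's label cannot disagree with its contents.
write_by_ontology <- function(terms, out_dir, ontologies = c("BP", "CC", "MF")) {
  paths <- character(0)
  for (ont in ontologies) {
    subset_ont <- terms[terms$ONTOLOGY == ont, c("GOID", "TERM")]
    path <- file.path(out_dir, paste0("df_", tolower(ont), "_goterms.txt"))
    write.table(subset_ont, path, sep = "\t", row.names = FALSE,
                quote = FALSE, col.names = FALSE)
    paths[ont] <- path  # named vector: ontology code -> file path
  }
  paths
}

files <- write_by_ontology(all_terms, tempdir())
readLines(files[["CC"]])
```

The file names match the ones used in the script above (df_bp_goterms.txt and so on), so the loop can replace the last three `write.table()` calls.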
Answer · 14 months ago

If you'd prefer a less technical method, you can use an InterMine instance and get this list in seconds. For example, SGD's YeastMine shows the GO version it uses at the top of its homepage (at the time of writing: Data Updated on: Jan-31-2023; GO-Release: 2023-01-01). Because GO terms are not species-specific, this method works for any organism.

Shortcut: Start on Step 3.

  1. Start at YeastMine, https://yeastmine.yeastgenome.org/yeastmine/begin.do
  2. Under Templates in the purple tool bar (or click https://yeastmine.yeastgenome.org/yeastmine/templates.do), select "GO Term name --> GO Term Identifier"
  3. From GO Term name --> GO Term Identifier, change the search value to the wildcard * and click Show Results
  4. Verify that you have a complete and current list: the GO version and GO term count should match the release and counts listed on http://geneontology.org/stats.html

You can export this list as is, or filter it further, e.g. by aspect.
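The check in step 4 can also be done in R once the list is exported. The sketch below assumes, hypothetically, a headerless tab-separated export with three columns (GO ID, term name, namespace); the actual YeastMine export layout may differ, so adjust `header` and `col.names` to match your file. The three-row file written here is toy data standing in for the real export.

```r
# Toy stand-in for an exported list (hypothetical layout: ID, name, namespace).
export_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "GO:0000002\tmitochondrial genome maintenance\tbiological_process",
  "GO:0000009\talpha-1,6-mannosyltransferase activity\tmolecular_function",
  "GO:0005739\tmitochondrion\tcellular_component"
), export_file)

# quote = "" and comment.char = "" keep apostrophes and '#' in names intact.
go_export <- read.delim(export_file, header = FALSE, quote = "", comment.char = "",
                        col.names = c("GOID", "TERM", "NAMESPACE"),
                        stringsAsFactors = FALSE)

# Distinct GO IDs overall, and terms per aspect, to compare with the GO stats page.
n_terms <- length(unique(go_export$GOID))
per_aspect <- table(go_export$NAMESPACE)
n_terms
per_aspect
```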

Answer · 4 months ago

The mapping from GO IDs to term names is defined in the ontology file "go-basic.obo". More details are on the "Download the ontology" page of the geneontology.org site.

For R, the ontologyIndex package works well for parsing this file, as an alternative to the GO.db annotation package.

For Python, the obonet package works well for parsing OBO files.
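If you'd rather not depend on GO.db or an extra package, the [Term] stanzas of go-basic.obo are simple enough to read with base R. The function below is a minimal, hypothetical sketch and not the ontologyIndex or obonet API: it keeps only the `id`, `name` and `namespace` tags and drops terms flagged `is_obsolete: true`. The small inline file stands in for go-basic.obo, which can be downloaded from http://purl.obolibrary.org/obo/go/go-basic.obo.

```r
# Hypothetical helper: read [Term] stanzas from an OBO file into a data frame.
read_obo_terms <- function(path) {
  lines <- readLines(path, warn = FALSE)
  starts <- grep("^\\[", lines)               # every stanza header: [Term], [Typedef], ...
  ends <- c(starts[-1] - 1L, length(lines))   # each stanza runs until the next header
  is_term <- lines[starts] == "[Term]"
  starts <- starts[is_term]
  ends <- ends[is_term]

  # First value of a "tag: value" line inside one stanza, or NA if absent.
  first_value <- function(block, tag) {
    prefix <- paste0(tag, ": ")
    hit <- block[startsWith(block, prefix)]
    if (length(hit) == 0) NA_character_ else substring(hit[1], nchar(prefix) + 1)
  }

  n <- length(starts)
  id <- term <- ns <- character(n)
  obsolete <- logical(n)
  for (i in seq_len(n)) {
    block <- lines[starts[i]:ends[i]]
    id[i] <- first_value(block, "id")
    term[i] <- first_value(block, "name")
    ns[i] <- first_value(block, "namespace")
    obsolete[i] <- identical(first_value(block, "is_obsolete"), "true")
  }
  out <- data.frame(GOID = id, TERM = term, NAMESPACE = ns,
                    stringsAsFactors = FALSE)[!obsolete, ]
  rownames(out) <- NULL
  out
}

# Toy OBO file (GO:9999999 is a made-up obsolete entry for illustration).
toy_obo <- tempfile(fileext = ".obo")
writeLines(c(
  "format-version: 1.2", "",
  "[Term]", "id: GO:0000002", "name: mitochondrial genome maintenance",
  "namespace: biological_process", "",
  "[Term]", "id: GO:9999999", "name: toy obsolete term",
  "namespace: molecular_function", "is_obsolete: true", "",
  "[Typedef]", "id: part_of", "name: part of"
), toy_obo)

terms <- read_obo_terms(toy_obo)
terms
```

On the real file, `split(terms[, c("GOID", "TERM")], terms$NAMESPACE)` gives one table per aspect; note that OBO namespaces are spelled out (biological_process, molecular_function, cellular_component) rather than BP/MF/CC.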
