How to download all Gene Ontology (GO) IDs with their associated vocabulary?
8 weeks ago
Jimmy • 0

I'm looking to get a file formatted with two columns. One column should contain the actual GO IDs. The second should contain the vocabulary associated with the GO ID that is in the same row. Rows in the file should look something like this:

GO:0000002  mitochondrial genome maintenance
GO:0000003  reproduction
GO:0000009  alpha-1,6-mannosyltransferase activity
GO:0000012  single strand break repair

So I want something like this, but covering all GO IDs and their associated terms. How can I do this? I looked at the download options on the GO website and did not see a way to get a file like this.

gene-ontology GO • 634 views
8 weeks ago
seidel 11k

There are several ways to do this, but one is to use R and the GO.db library from Bioconductor.



# load the annotation package (provides GO.db, keys() and select())
library(GO.db)

# get keys for selection
go <- keys(GO.db, keytype="GOID")

# look at a few
select(GO.db, columns=c("GOID","TERM"), keys=go[1:3], keytype="GOID")

GOID                             TERM
1 GO:0000001        mitochondrion inheritance
2 GO:0000002 mitochondrial genome maintenance
3 GO:0000003                     reproduction

# get them all
df <- select(GO.db, columns=c("GOID","TERM"), keys=go, keytype="GOID")

# check the dimensions (rows, columns)
dim(df)
[1] 43705     2

Thank you, this works for me. I have a follow-up question. I see some files with GO IDs separated into GO IDs for three categories: CC (cellular component), BP (biological process), and MF (molecular function). If I'm not mistaken, all GO terms fall within these three categories. Is there a way to create three separate dataframes, one containing GO IDs belonging to each of the three categories?

EDIT: I looked into it and if I'm not mistaken, what I'm asking for should conform to the categories in the Ontology column of the GO.db dataframe. So I guess I can just use that.


Yes. You can simply modify the select statement:

# select all IDs and Terms associated to Biological Process
df <- select(GO.db, columns=c("GOID","TERM"), keys="BP", keytype="ONTOLOGY")
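
As an alternative to one select call per ontology, the ONTOLOGY column can be requested alongside GOID and TERM and the result split in one step. The sketch below uses a small hand-made data frame as a stand-in so it runs without GO.db; the commented-out select call is what I would assume the real input looks like.

```r
# Toy stand-in for the GO.db result. With GO.db loaded, the assumed equivalent is:
# df <- select(GO.db, keys=keys(GO.db, keytype="GOID"),
#              columns=c("GOID","TERM","ONTOLOGY"), keytype="GOID")
df <- data.frame(
  GOID = c("GO:0000002", "GO:0000003", "GO:0000009", "GO:0005739"),
  TERM = c("mitochondrial genome maintenance", "reproduction",
           "alpha-1,6-mannosyltransferase activity", "mitochondrion"),
  ONTOLOGY = c("BP", "BP", "MF", "CC"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per ontology code
by_ontology <- split(df[, c("GOID", "TERM")], df$ONTOLOGY)

# list the ontology codes found, then pull out one table
names(by_ontology)
by_ontology$BP
```

Each element of by_ontology can then be used like df_bp, df_cc and df_mf.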

Just thought I would come back and add the exact script that generates a table for each ontology and saves them all to file, in case anyone wants to copy and paste it.


library(GO.db)

# all GO IDs with their terms
go <- keys(GO.db, keytype="GOID")
df <- select(GO.db, columns=c("GOID","TERM"), keys=go, keytype="GOID")

# one table per ontology (BP, CC, MF)
df_bp <- select(GO.db, columns=c("GOID","TERM"), keys="BP", keytype="ONTOLOGY")
df_cc <- select(GO.db, columns=c("GOID","TERM"), keys="CC", keytype="ONTOLOGY")
df_mf <- select(GO.db, columns=c("GOID","TERM"), keys="MF", keytype="ONTOLOGY")

# write tab-separated files without header; [,2:3] keeps GOID and TERM
# and drops the leading ONTOLOGY key column
write.table(df, "df_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)
write.table(df_bp[,2:3], "df_bp_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)
write.table(df_cc[,2:3], "df_cc_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)
write.table(df_mf[,2:3], "df_mf_goterms.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)
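
If it helps, here is a rough sketch of the same write step done in a loop, followed by reading one file back to check it. The tables list, the toy rows and the use of tempdir() are placeholders of mine so the snippet runs on its own; in practice the list would hold the GOID and TERM columns of df_bp, df_cc and df_mf. Because the files are written with quote = FALSE, I read them back with quote = "" so that terms containing apostrophes are not misparsed.

```r
# Hypothetical stand-ins for the per-ontology tables (GOID and TERM columns only)
tables <- list(
  bp = data.frame(GOID = c("GO:0000002", "GO:0000003"),
                  TERM = c("mitochondrial genome maintenance", "reproduction"),
                  stringsAsFactors = FALSE),
  cc = data.frame(GOID = "GO:0005739", TERM = "mitochondrion",
                  stringsAsFactors = FALSE),
  mf = data.frame(GOID = "GO:0000009",
                  TERM = "alpha-1,6-mannosyltransferase activity",
                  stringsAsFactors = FALSE)
)

# Write to a temporary directory here; swap in your own output folder
out_dir <- tempdir()

for (name in names(tables)) {
  path <- file.path(out_dir, paste0("df_", name, "_goterms.txt"))
  write.table(tables[[name]], path, sep = "\t",
              row.names = FALSE, quote = FALSE, col.names = FALSE)
}

# Read one file back; quote = "" matches the unquoted output above
check <- read.delim(file.path(out_dir, "df_bp_goterms.txt"), header = FALSE,
                    quote = "", stringsAsFactors = FALSE,
                    col.names = c("GOID", "TERM"))
check
```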
7 weeks ago

If anyone would like a less technical method, you can use an instance of InterMine and obtain this list in seconds. For example, use SGD's YeastMine, which lists the version of GO at the top of the YeastMine homepage (today, it says Data Updated on: Jan-31-2023; GO-Release: 2023-01-01). As the GO is not species-dependent, anyone can use this method for any organism.

Shortcut: Start on Step 3.

  1. Start at YeastMine.
  2. Under Templates in the purple tool bar (or click, select "GO Term name --> GO Term Identifier"
  3. In the "GO Term name --> GO Term Identifier" template, change the search value to the wildcard * and select Show Results.
  4. Verify you have a complete and current list; the GO version number and GO term count should match version and counts on

You can export this list as is, or further filter for aspect, etc.

