How To Extract GO Terms From A Given KEGG ID
10.2 years ago
Rm 8.2k

How can I use BioMart to link KEGG pathway IDs to GO terms?

biomart kegg go • 6.8k views
10.2 years ago
Neilfws 49k

I don't think this is possible using most web-based implementations of BioMart, since the underlying database does not contain KEGG identifiers.

The closest I can find to what you want is this file, mapping KEGG reaction IDs to GO terms.

Thanks @Neilfws: Maybe I need to link via genes: gene -> GO and gene -> KEGG, then extrapolate from KEGG to GO.
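
The gene-based route suggested in this comment could be sketched in Ruby as a join of two lookup tables. This is only an illustration: the gene names and their annotations below are made-up placeholders, and the helper name pathway_to_go is hypothetical. In practice both tables would have to be filled from whatever gene annotation sources are at hand.

```ruby
require 'set'

# Hypothetical toy tables; gene names and annotations are placeholders,
# not real annotation data.
GENE_TO_GO = {
  'geneA' => ['GO:0019722', 'GO:0006915'],
  'geneB' => ['GO:0019722'],
  'geneC' => ['GO:0006915']
}

GENE_TO_PATHWAY = {
  'geneA' => ['hsa04020'],
  'geneB' => ['hsa04020'],
  'geneC' => ['hsa04210']
}

# For every gene, attach each of its GO terms to each of its pathways.
def pathway_to_go(gene_to_pathway, gene_to_go)
  result = Hash.new { |hash, key| hash[key] = Set.new }
  gene_to_pathway.each do |gene, pathways|
    terms = gene_to_go.fetch(gene, [])
    pathways.each { |pathway| result[pathway].merge(terms) }
  end
  result
end

mapping = pathway_to_go(GENE_TO_PATHWAY, GENE_TO_GO)
mapping.each do |pathway, terms|
  puts "#{pathway}\t#{terms.to_a.sort.join(',')}"
end
```

Note that with this approach a pathway inherits every GO term of every member gene, so the result is likely to be much broader than a curated pathway-to-GO link.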

10.2 years ago
Joachim ★ 2.9k

You can get GO terms that are linked to KEGG pathways via the KEGG API.

This Ruby script, go.rb, uses BioRuby to extract GO term(s):

require 'bio'

# Read in pathway ID from the command line:
pathway_id = ARGV[0]

# Connect to the public KEGG API server:
server = Bio::KEGG::API.new

# Retrieve a single pathway:
pathway_sheet = server.get_entries(["PATHWAY:#{pathway_id}"])

# Turn the textual representation into a Ruby object:
pathway = Bio::KEGG::PATHWAY::new(pathway_sheet)

# Check if there is a DB link to GO:
if pathway.dblinks['GO']
  # Print each GO term on a separate line:
  pathway.dblinks['GO'].each { |term|
    puts "GO:#{term}"
  }
end


You can use this script on the command line as follows:

$ ruby go.rb hsa04020
GO:0019722
$ ruby go.rb hsa04210
GO:0006915
...


This will give you the GO term(s) that are linked to the given pathway (hsa04020 and hsa04210 in the examples above).

Hope that helps.

UPDATE:

An R solution using the KEGGSOAP package from Bioconductor:

# For installing Bioconductor and the KEGGSOAP package, run:
# source("http://bioconductor.org/biocLite.R")
# biocLite("KEGGSOAP")

library(KEGGSOAP)

# Get the textual representation of the pathway:
# (For now, there is no function like get.genes.by.pathway for getting dblinks.)
pathway <- bget("PATHWAY:hsa04020")

# Split the very long textual description into individual lines:
pathway.lines <- unlist(strsplit(pathway, '\n'))

# Create an empty vector for storing GO terms of the pathway:
pathway.go.terms <- c()

# Create a variable that is set to TRUE when we are processing the DBLINKS section:
in.dblinks <- FALSE

# Go through the pathway description line-by-line:
for (n in 1:length(pathway.lines)) {
    # If we are in the DBLINKS section, figure out when we leave it again:
    if (in.dblinks == TRUE && !(substring(pathway.lines[n], 1, 1) == " "))
        in.dblinks <- FALSE

    # When we see the beginning of the DBLINKS section, jot this down:
    if (substring(pathway.lines[n], 1, 7) == "DBLINKS")
        in.dblinks <- TRUE

    # If we are in the DBLINKS section, then look out for GO terms and save them:
    if (in.dblinks == TRUE && substring(substring(pathway.lines[n], 13), 1, 3) == "GO:")
        pathway.go.terms <- append(pathway.go.terms, substring(pathway.lines[n], 13))
}

# The GO terms of the pathway are now accumulated in the vector pathway.go.terms.
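
For comparison, the same line-by-line scan of the DBLINKS section could look roughly like this in plain Ruby, without BioRuby. This is a sketch under assumptions: the sample entry is an abbreviated, hand-written excerpt in the flat-file layout (a real entry has many more sections), and the helper name go_terms_from_entry is hypothetical.

```ruby
# Abbreviated, hand-written excerpt for illustration only; a real entry
# returned by KEGG contains many more sections.
SAMPLE_ENTRY = <<~TEXT
  ENTRY       hsa04020                    Pathway
  NAME        Calcium signaling pathway - Homo sapiens (human)
  DBLINKS     GO: 0019722
  ORGANISM    Homo sapiens (human) [GN:hsa]
TEXT

# Collect GO terms from the DBLINKS section of a flat-file entry.
def go_terms_from_entry(text)
  terms = []
  in_dblinks = false
  text.each_line do |line|
    # A line that does not start with a space opens a new section;
    # indented lines continue the current one.
    in_dblinks = line.start_with?('DBLINKS') unless line.start_with?(' ')
    next unless in_dblinks

    # Drop the section keyword (if any) and the padding after it.
    value = line.sub(/\A\S*\s+/, '').strip
    next unless value.start_with?('GO:')

    # "GO: 0019722 0006915" becomes ["GO:0019722", "GO:0006915"].
    value.sub('GO:', '').split.each { |id| terms << "GO:#{id}" }
  end
  terms
end

puts go_terms_from_entry(SAMPLE_ENTRY)
```

Unlike the R version, this sketch normalises each hit to the compact "GO:nnnnnnn" form and also accepts several IDs on one line.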

Thanks @Joachim: Any R alternative?

Well, there is always: go <- system2("ruby", "go.rb hsa04020", stdout=TRUE)

I tried something similar, as described here: http://www.r-bloggers.com/calling-ruby-perl-or-python-from-r/. On Windows, though, I would need to install Ruby and everything that goes with it.

I updated my answer with an R solution. Big thanks to Neil for pointing out KEGGSOAP. Too bad that a get.dblinks.by.pathway function has not been implemented yet though.


@Joachim: +1; thanks for the R update too. I also appreciate your commenting the code step by step.

R/Bioconductor has multiple KEGG-related packages: http://bioconductor.org/help/search/index.html?q=kegg. KEGGSOAP may do what you want.

Thanks @Neilfws: I will give it a try...

4.9 years ago
daveshire ▴ 10

I know this is a dead thread, but I wanted to do roughly the same thing as the first poster and found that KEGG's linkDB system works pretty well. It was easy to pull up a list of all KO : GO term matches and it looks like there are various other mappings that it can be used for but I haven't tried them all.
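
Once such a list of KO : GO matches has been saved, turning it into a lookup table is straightforward. The Ruby sketch below assumes a two-column, tab-separated layout with one pair per line; that layout, the helper names parse_links and invert, and the sample pairs (placeholders, not real KEGG data) are all assumptions for illustration.

```ruby
# Placeholder pairs in an assumed two-column, tab-separated layout.
LINK_TABLE = "ko:K99991\tgo:0000001\n" \
             "ko:K99991\tgo:0000002\n" \
             "ko:K99992\tgo:0000001\n"

# Build a one-to-many mapping from the first column to the second.
def parse_links(text)
  links = Hash.new { |hash, key| hash[key] = [] }
  text.each_line do |line|
    source, target = line.chomp.split("\t")
    next if source.nil? || target.nil? # skip blank or malformed lines
    links[source] << target
  end
  links
end

# Swap the direction of a one-to-many mapping.
def invert(links)
  inverted = Hash.new { |hash, key| hash[key] = [] }
  links.each do |source, targets|
    targets.each { |target| inverted[target] << source }
  end
  inverted
end

ko_to_go = parse_links(LINK_TABLE)
go_to_ko = invert(ko_to_go)

puts ko_to_go['ko:K99991'].join(', ')
puts go_to_ko['go:0000001'].join(', ')
```

Keeping both directions around makes it easy to chain this table with other mappings later on.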

3.6 years ago

Via transitivity: GO <-> Orthology (KO terms), Orthology <-> PubMed ID, PubMed ID <-> Pathway. The KEGG API / LinkDB lets you build a many-to-many linkage map between GO and Pathway terms that isn't directly available (although it is marked 'routed' on the official page). This has to be done explicitly.

P.S. That said, I do question the validity of this metric. A GO ID is indicative of a gene, while a KEGG ID is indicative of a pathway. By doing the above, we throw away quite a lot of background information by representing a pathway merely by a gene.