Question

syntax questions about getting sequences from KO numbers using KEGGREST

5.7 years ago

jon.sy.tarn ▴ 10

I suppose my question is in the same vein as previous posts such as this one:

download KEGG genes sequence in fasta format

Basically, what I want to do is feed a list of KO numbers from KEGG into a program and get the amino acid or nucleotide FASTA sequences for each of these KO numbers.

Based on what I've already read, I need to be using KEGGREST.

However, I'm having some trouble deciphering the syntax.

This is the usage given for keggGet in the package manual:

keggGet(dbentries, option = c("aaseq", "ntseq", "mol", "kcf", "image", "kgml"))

and this is an example they show:

res <- keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino
## acid sequences of a human gene and an
## E.coli O157 gene
str(res)

My question is: how do I decipher this? Am I to assume that I can enter a KO number in place of the ece or hsa numbers shown above?

Sorry for the potentially basic question.

KEGG • 1.0k views

updated 5.7 years ago by Mark ★ 1.5k • written 5.7 years ago by jon.sy.tarn ▴ 10

score 0 · Answer 1 · 2018-08-13

Yes, that's correct. It might be helpful to explicitly name each argument to illustrate how the function operates:

res <- keggGet(option = "aaseq", dbentries = c("hsa:10458", "ece:Z5100"))

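option selects the kind of data to return (here "aaseq", i.e. amino acid sequences) and dbentries holds the IDs of the entries you want to retrieve. Called without an option, keggGet returns a list, which you can subset using the $ notation. With a sequence option such as "aaseq" it returns a Biostrings sequence set (an AAStringSet), so you will then use the Biostrings package to manipulate the sequences.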
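As far as I know, KO entries are ortholog groups and do not carry a sequence of their own, so if passing a K number straight to keggGet gives you nothing useful, one possible route is to map each KO to its member genes first and then fetch those. Here is a minimal sketch of that idea. The function name ko_to_sequences and its link_fun, get_fun and batch_size arguments are my own inventions for illustration, and I am assuming keggLink("genes", ko) returns a vector whose values are gene IDs such as "hsa:10458".

```r
# Sketch: map KO numbers to gene IDs, then fetch sequences in small batches.
# link_fun and get_fun default to the KEGGREST functions; they are arguments
# only so the logic can be tried with stand-ins and without network access.
ko_to_sequences <- function(ko_ids,
                            option = "aaseq",
                            link_fun = KEGGREST::keggLink,
                            get_fun = KEGGREST::keggGet,
                            batch_size = 10) {
  lapply(stats::setNames(ko_ids, ko_ids), function(ko) {
    # Assumption: the values of the returned vector are gene IDs
    genes <- unname(link_fun("genes", ko))
    if (length(genes) == 0) return(NULL)
    # Contiguous batches of at most batch_size IDs per request
    batches <- split(genes, ceiling(seq_along(genes) / batch_size))
    parts <- lapply(unname(batches), get_fun, option = option)
    # Combine the per-batch results into one object per KO
    do.call(c, parts)
  })
}
```

You would call it as ko_to_sequences(c("K00001", "K00002")) and get one element per KO back. Be aware that a single KO can map to a very large number of genes across all organisms, so you may want to filter the gene IDs by organism prefix (e.g. "hsa:") before fetching.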

If you have a list of, say, 100 IDs you want to query, you can automate the process like this:

my_list <- c("hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", 
             "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100",
             "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100",
             "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100")

split_my_list <- split(my_list, 1:4)
results <- lapply(X = split_my_list, FUN = keggGet, option = "aaseq")

I've copied the same entries over and over again just for illustration purposes. Using the function split I've divided the vector into 4 chunks (change 1:4 to 1:10 or whatever you want). Then I use lapply to apply keggGet to each chunk. Note that keggGet accepts at most 10 IDs per call, so keep each chunk at or below that size.
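To make the chunking step a bit more concrete, here is a small base R sketch of what split does with labels like 1:4, next to an alternative that cuts the vector into consecutive blocks of at most 10 IDs. The IDs are made up for illustration and nothing here contacts KEGG.

```r
# Made-up IDs, for illustration only
my_ids <- sprintf("hsa:%d", 1:23)

# split(x, 1:4) recycles the labels 1, 2, 3, 4 along the vector,
# so the entries are dealt out round-robin into 4 groups
round_robin <- split(my_ids[1:16], 1:4)
round_robin[["1"]]   # entries 1, 5, 9 and 13

# Alternative: consecutive chunks of at most chunk_size IDs,
# whatever the total length of the vector
chunk_size <- 10
chunks <- split(my_ids, ceiling(seq_along(my_ids) / chunk_size))
lengths(chunks)      # chunk sizes: 10, 10 and 3
```

The second form has the advantage that you never have to work out how many groups you need: the chunk size is fixed and the number of chunks follows from the length of the vector.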

It will return a list with one result per chunk, named "1" to "4" after the chunks, so subset it like results$`1` (or results[["1"]]).
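Because the chunk names are numbers, they need backticks (or double brackets) when you pull a chunk out of the result. A tiny sketch with a stand-in list, since the real one needs a live query; the sequences and names below are invented:

```r
# Stand-in for the lapply() result: one element per chunk, named "1", "2", ...
results <- list(`1` = c(geneA = "MKT", geneB = "MAL"),
                `2` = c(geneC = "MSS"))

# Numeric-looking names must be quoted with backticks after $
first_chunk <- results$`1`
same_chunk  <- results[["1"]]

# Stitch all chunks back into a single object
combined <- do.call(c, unname(results))
names(combined)
```

The do.call(c, ...) step should also work when the elements are Biostrings sequence sets rather than plain character vectors, since those can be concatenated with c() as well.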