Hello all,

I am trying to use R to build a table that links every KEGG orthology (KO) ID to all of its related Entrez genes. In theory this can be done with the KEGGREST package from Bioconductor.

I have a list of all the KEGG orthology IDs, which I want to convert to Entrez IDs using the function keggConv. First I tried lapply, but this fails because the URL of the query is too long:

Error in .getUrl: (414) Request-URI Too Long

So that won't work with a query as large as mine. Next I tried expanding the list and querying one ID at a time using:

     output = apply(expand.grid(ko_nums),1,
                  function(x,y) keggLink("genes",x[1]))

But if you run this on a toy example where

     ko_nums = c("ko:K00001","ko:K00002","ko:K00003")
     output = apply(expand.grid(ko_nums),1,
                  function(x,y) keggLink("genes",x[1]))

you see that the output is a list with one element per orthology ID, each holding many genes. I want to keep each gene paired with its orthology ID in a data table, but wrapping the result in unlist() removes all the ko identifiers, and I can't build a data frame from the list as it is because each row would have a different number of elements.

Is there a way to make a data frame from this list in which each row is a single orthology/gene pair? Like this:

     ko:K00001     gene_1
     ko:K00001     gene_2
     ko:K00001     gene_3
     ko:K00002     gene_4
     ko:K00002     gene_5
     ko:K00002     gene_6
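
For reference, one way a layout like this could be produced is to repeat each KO ID once per gene returned for it and flatten the genes in the same order. This is only a rough sketch, not a tested KEGGREST recipe: `pair_ko_genes` is a hypothetical helper name, the gene IDs are made-up stand-ins so that it runs without a network connection, and the commented-out `keggLink` call assumes one query per KO ID as in the toy example above.

```r
# Hypothetical helper: pair each KO ID with every gene returned for it.
# `results` is assumed to be a list holding one character vector of gene IDs
# per KO ID, in the same order as `ko_ids`.
pair_ko_genes <- function(ko_ids, results) {
  stopifnot(length(ko_ids) == length(results))
  data.frame(
    ko   = rep(ko_ids, lengths(results)),       # repeat each KO once per gene
    gene = unlist(results, use.names = FALSE),  # flatten genes in the same order
    stringsAsFactors = FALSE
  )
}

ko_nums <- c("ko:K00001", "ko:K00002", "ko:K00003")

# With KEGGREST loaded, the real query might look something like this:
# results <- lapply(ko_nums, function(k) keggLink("genes", k))

# Made-up stand-in data so the sketch runs offline:
results <- list(
  c("gene_1", "gene_2", "gene_3"),
  c("gene_4", "gene_5", "gene_6"),
  character(0)                                  # a KO with no genes adds no rows
)

ko_gene_table <- pair_ko_genes(ko_nums, results)
print(ko_gene_table)
```

Because the KO column is rebuilt from `ko_nums` and `lengths(results)`, this does not rely on whatever names the query result carries, so it does not matter that `unlist` drops them.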




