Hello all,
I am trying to use R to create a table that links all KEGG orthology IDs to their related Entrez genes. In theory this can be done with the KEGGREST package from Bioconductor.
I have a list of all the KEGG orthology IDs, which I want to convert to Entrez IDs using the function keggConv. First I tried lapply, but this fails because the URL query is too long:
lapply(ko_nums,keggLink("genes",ko_nums))
Error in .getUrl: (414) Request-URI Too Long
So that won't work for a query as large as mine. I then tried expanding the list and querying one ID at a time:
output = apply(expand.grid(ko_nums),1,
function(x,y) keggLink("genes",x[1]))
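The 414 error in the lapply attempt comes from keggLink("genes", ko_nums) being evaluated with the entire vector in a single URL (lapply receives the result of that call as its second argument, not a function). A middle ground between one huge request and one request per ID is to send the IDs in small batches. This is a minimal sketch, assuming keggLink returns a character vector of KEGG gene IDs named by the query KO ID; the helper link_in_chunks, its link_fun argument, and the batch size of 10 are hypothetical choices, not documented KEGG limits.

```r
# Hypothetical helper: link KO IDs to KEGG genes in small batches so that
# each request URL stays short. chunk_size = 10 is an assumption.
link_in_chunks <- function(ids, chunk_size = 10,
                           link_fun = function(x) KEGGREST::keggLink("genes", x)) {
  # Split the IDs into consecutive groups of at most chunk_size elements
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  # One request per group; unname() stops c() from prefixing group numbers
  results <- lapply(unname(chunks), link_fun)
  links <- do.call(c, results)
  # Assumes each result is a character vector named by the query KO ID,
  # so the names give the KO column and the values give the gene column
  data.frame(ko = names(links), gene = unname(links),
             stringsAsFactors = FALSE)
}
```

Usage would be ko_gene <- link_in_chunks(ko_nums). The gene column would hold KEGG gene IDs rather than Entrez IDs; mapping those to Entrez would presumably be a separate keggConv step (e.g. with the "ncbi-geneid" database), not shown here.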
But if you run the one-at-a-time version on a toy example, where
ko_nums = c("ko:K00001","ko:K00002","ko:K00003")
output = apply(expand.grid(ko_nums),1,
function(x,y) keggLink("genes",x[1]))
output
you can see that the output is a list with one element per orthology ID, each containing many genes. I want to keep each gene paired with its orthology ID in a data frame, BUT wrapping this in unlist() removes all the ko identifiers, and I can't make a data frame from the list as it is because each element holds a different number of genes.
Is there a way to make a data frame from this list in which each row is a single orthology/gene pair? Like this:
ko:K00001 gene_1
ko:K00001 gene_2
ko:K00001 gene_3
ko:K00002 gene_4
ko:K00002 gene_5
ko:K00002 gene_6
etc.
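If output is a list with one vector of genes per KO ID, in the same order as ko_nums, the pairs can be built in base R by repeating each KO ID as many times as it has genes. The sketch below uses a toy list with the placeholder gene names from the table above instead of real query results. With real data, lapply(ko_nums, function(k) keggLink("genes", k)) is a safer way to build output than apply(), because apply() simplifies its result to a matrix whenever every KO happens to return the same number of genes.

```r
# Toy stand-in for the query results: one character vector of gene IDs
# per KO ID, in the same order as ko_nums (placeholder values only).
ko_nums <- c("ko:K00001", "ko:K00002")
output <- list(c("gene_1", "gene_2", "gene_3"),
               c("gene_4", "gene_5", "gene_6"))

# Repeat each KO ID once per gene it links to, then flatten the genes
# into a single vector; both columns end up the same length.
ko_gene <- data.frame(
  ko   = rep(ko_nums, lengths(output)),
  gene = unlist(output, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(ko_gene)
```

Base R's stack(setNames(output, ko_nums)) gives the same pairs, with the columns named values and ind (ind is a factor).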
Thanks,
Maureen