Hello all,
This is my first post. I've learned a lot from browsing through these forums.
I have been trying to make a pathway file for the path-scan module of genome music. Ideally I'd like to build it from the KEGG database, and I have found a few posts about how to generate those files, and also about how the KEGG FTP site is now restricted. I have tried to use the KEGGREST package in R to retrieve the components I need, but it seems to be limited to 10 items at a time. I also grabbed the files available from GSEA, which seem to have the data I need to create my file. My question is: why is this such a convoluted process, if presumably almost everyone running genome music is running it against a human genome and would like to know the KEGG pathways? I have spent a lot of time learning R, and I don't know Perl, Java or Python. I'm trying to splice together multiple files into lines like this:
hsa00061 Fatty acid biosynthesis Lipid Metabolism 31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH
Does anyone have suggestions on a better way to go about this?
Thank you, DD
Do you have scripts available to do the conversion of the other PID pathways? Reactome, Biocarta? I've checked out your preprocessed files in Github. Are they the latest releases? Thanks for your help.
Sorry, there are no scripts. Except for the KEGG data, many of the preprocessed files on GitHub are from 2008! But you can use a bit of Perl to convert GMT (gene-set format) files into the pathway format. For example, to download and convert the Reactome GMT file:
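For anyone who would rather not use Perl, here is a minimal sketch of the same kind of conversion in Python. It assumes a standard GMT layout (set name, description, then one gene per tab-separated column) and assumes the pathway file wants four tab-separated columns: ID, name, class, and a "|"-joined gene list, as in the hsa00061 example earlier in the thread. The function name `gmt_to_pathway_rows`, the `pathway_class` label and the sample line are made up for illustration. Gene identifiers are passed through unchanged, so this does not produce the EntrezID:symbol pairs shown in that example.

```python
def gmt_to_pathway_rows(gmt_lines, pathway_class="Reactome"):
    """Turn GMT lines into tab-separated pathway rows (hypothetical helper).

    Assumed input:  name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Assumed output: name <TAB> description <TAB> class <TAB> gene1|gene2|...
    """
    rows = []
    for line in gmt_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets with no genes
        name, description = fields[0], fields[1]
        genes = [g for g in fields[2:] if g]  # drop empty trailing columns
        rows.append("\t".join([name, description, pathway_class, "|".join(genes)]))
    return rows


# Made-up sample line, not real Reactome data.
sample = ["SET_A\tExample gene set\tGENE1\tGENE2\tGENE3\n"]
for row in gmt_to_pathway_rows(sample):
    print(row)
```

To run it on a downloaded file, pass the open file handle instead of `sample`, since iterating over a text file yields its lines.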
Great, the GMT file should be way easier to parse. Thanks so much for the quick reply!
best, w