Pathway File Generation For Path-Scan Under Genome Music Using Keggrest Under R
10.8 years ago
DoubleD ▴ 130

Hello all,

First post here; I've learned a lot from browsing these forums.

I have been trying to make a pathway file for the path-scan module of genome music. Ideally I'd like to build it from KEGG database information. I have found a few posts about how to generate those files, and about how the KEGG FTP site is now restricted. I tried to use the KEGGREST package in R to retrieve the components I need, but it seems to be limited to 10 items at a time. I also grabbed the files available from GSEA, which seem to have the data I need to create my file. My question is: why is this such a convoluted process, if presumably almost everyone running genome music is running it against a human genome and would like to know the KEGG pathways? I have spent a lot of time learning R, and I don't know Perl, Java, or Python. I'm trying to splice together multiple files into:

hsa00061 Fatty acid biosynthesis Lipid Metabolism 31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH
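
For illustration, here is a minimal Python sketch of how one such line could be assembled from its parts. It assumes the four columns (pathway ID, name, class, gene list) are tab-separated, which matches the Perl one-liner further down the thread; the function name `format_pathway_line` is made up for this example.

```python
def format_pathway_line(pathway_id, name, pathway_class, genes):
    """Build one tab-delimited pathway-file line.

    `genes` is an iterable of (entrez_id, symbol) pairs.
    Assumption: columns are tab-separated and genes are joined with "|".
    """
    gene_field = "|".join(f"{entrez_id}:{symbol}" for entrez_id, symbol in genes)
    return "\t".join([pathway_id, name, pathway_class, gene_field])


# The example entry from the post above.
line = format_pathway_line(
    "hsa00061",
    "Fatty acid biosynthesis",
    "Lipid Metabolism",
    [(31, "ACACA"), (32, "ACACB"), (27349, "MCAT"),
     (2194, "FASN"), (54995, "OXSM"), (55301, "OLAH")],
)
print(line)
```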

Does anyone have suggestions on a better way to go about this?
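
One possible way around the 10-item limit mentioned above is to split the list of IDs into batches of at most 10 and make one request per batch. The sketch below only shows the batching step in Python and makes no network calls; the helper name `batches` and the ID list are hypothetical. In R, each batch would presumably be passed to KEGGREST's `keggGet` in turn.

```python
def batches(ids, size=10):
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical pathway IDs, used only to show the batching.
pathway_ids = [f"hsa{n:05d}" for n in range(10, 35)]

for batch in batches(pathway_ids):
    # KEGG's REST "get" operation joins multiple entry IDs with "+".
    query = "+".join(batch)
    print(len(batch), query)
```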

Thank you, DD

genome music kegg • 3.7k views
10.7 years ago

Hi DD. Sorry that it's so convoluted. PathScan was originally developed for use with 9 different databases listing pathways, gene families, or protein interactions, so we needed some sort of common file format, which is what you're trying to generate.

Before KEGG restricted FTP access, we grabbed their giant tab-delimited file containing all pathway entries and parsed it using this Perl script. In that GitHub repository, you can find various pathway files formatted for use with PathScan. Here is a KEGG pathway file that you can use. Hope it helps.


Do you have scripts available to convert the other PID pathways (Reactome, BioCarta)? I've checked out your preprocessed files on GitHub. Are they the latest releases? Thanks for your help.


Sorry, there are no scripts. Except for the KEGG data, many of the preprocessed files on GitHub are from 2008! But you can use a bit of Perl to convert GMT (gene-set format) files into the PathScan format. For example, to download and convert the Reactome GMT file:

wget http://www.broadinstitute.org/gsea/resources/msigdb/4.0/c2.cp.reactome.v4.0.symbols.gmt
perl -a -F'\t' -ne '$gs=join("|",map{s/^/0:/;$_}@F[2..$#F]); print join("\t",$F[0],".",".",$gs)' c2.cp.reactome.v4.0.symbols.gmt > reactome_pathway_file
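
For anyone who would rather not use Perl, the same conversion can be sketched in Python. This is a rough equivalent of the one-liner above, not a tested replacement: it keeps the gene-set name, writes "." for the two middle columns, and prefixes each symbol with "0:" in place of an Entrez ID. The function name `gmt_to_pathway_lines` and the two-record input are made up for illustration.

```python
import io


def gmt_to_pathway_lines(gmt_handle):
    """Yield one pathway-file line per GMT record.

    GMT columns: set name, description, then one gene symbol per column.
    """
    for raw in gmt_handle:
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without genes
        name, symbols = fields[0], fields[2:]
        # "0:" stands in for the missing Entrez ID, as in the Perl version.
        genes = "|".join(f"0:{symbol}" for symbol in symbols if symbol)
        yield "\t".join([name, ".", ".", genes])


# Made-up two-record GMT content, for illustration only.
example = io.StringIO(
    "SET_A\thttp://example.org/a\tTP53\tBRCA1\n"
    "SET_B\thttp://example.org/b\tEGFR\n"
)
converted = list(gmt_to_pathway_lines(example))
print("\n".join(converted))
```

To run it on the downloaded file, open `c2.cp.reactome.v4.0.symbols.gmt` in place of the `io.StringIO` example and write the yielded lines to `reactome_pathway_file`.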

Great, the GMT file should be way easier to parse. Thanks so much for the quick reply!

Best, w
