Question

Pathway File Generation For Path-Scan Under Genome Music Using KEGGREST Under R

2

10.8 years ago

DoubleD ▴ 130

Hello all,

First post; I've learned a lot from browsing through these forums.

I have been trying to make a pathway file for the path-scan module of genome music. Ideally I'd like to build it from the KEGG database, and I have found a few posts about how to generate those files, and also about how the KEGG FTP site is now restricted. I tried using the KEGGREST package in R to retrieve the components I need, but it seems to be limited to 10 items at a time. I also grabbed the files available from GSEA, which seem to have the data I need to create my file. My question is: why is this such a convoluted process, if presumably almost everyone running genome music is running it against a human genome and would like to know the KEGG pathways? I have spent a lot of time learning R, and I don't know Perl, Java or Python. I'm trying to splice together multiple files into:

Does anyone have suggestions on a better way to go about this?

Thank you, DD

genome music kegg • 3.7k views

ADD COMMENT • link updated 10.2 years ago by Biostar 20 • written 10.8 years ago by DoubleD ▴ 130

score 3 · Answer 1 · 2013-08-05

3

10.7 years ago

Cyriac Kandoth 6.0k

Hi DD. Sorry that it's so convoluted. PathScan was originally developed for use with 9 different databases listing pathways, gene families, or protein interactions, so we needed some sort of common file format, which is what you're trying to generate.

Before KEGG restricted FTP access, we grabbed their giant tab-delimited file containing all pathway entries and parsed it using this Perl script. In that GitHub repository, you can find various pathway files formatted for use with PathScan. Here is a KEGG pathway file that you can use. Hope it helps.

ADD COMMENT • link 10.6 years ago by Cyriac Kandoth 6.0k

0

Do you have scripts available to do the conversion of the other PID pathways? Reactome, BioCarta? I've checked out your preprocessed files on GitHub. Are they the latest releases? Thanks for your help.

ADD REPLY • link 10.5 years ago by wliao • 0

0

Sorry, there are no scripts. Except for the KEGG data, many of the preprocessed files on GitHub are from 2008! But you can use a bit of Perl to convert GMT (gene-set format) files, which you can download here. For example, to download and convert the Reactome GMT file:

wget http://www.broadinstitute.org/gsea/resources/msigdb/4.0/c2.cp.reactome.v4.0.symbols.gmt
# Each GMT line (name, description, genes...) becomes: name, ".", ".", 0:GENE1|0:GENE2|...
perl -F'\t' -lane '$gs=join("|",map{s/^/0:/;$_}@F[2..$#F]); print join("\t",$F[0],".",".",$gs)' c2.cp.reactome.v4.0.symbols.gmt > reactome_pathway_file

ADD REPLY • link 10.5 years ago by Cyriac Kandoth 6.0k

0

Great, the GMT file should be way easier to parse. Thanks so much for the quick reply!

best, w

ADD REPLY • link 10.5 years ago by wliao • 0