Question

R: download all KEGG pathways including KO and Compounds

0

3.8 years ago

dago ★ 2.8k

I saw this question has been asked here and there before. However, I could not find a tool that does the job for me.

I want to download all pathways from KEGG including KO and compounds using R. I would imagine creating an R object like:

$Path_1
...KO
...Compounds
$Path_2
...KO
...Compounds
$Path_3
...KO
...Compounds

Any idea how to download the data?

Thank you

R KEGG System_biology • 4.1k views

ADD COMMENT • link updated 3.8 years ago by ATpoint 81k • written 3.8 years ago by dago ★ 2.8k

1

"all pathways from KEGG including KO and compounds using R."

That would violate their AUP (acceptable use policy) if you don't have a license.

ADD REPLY • link 3.8 years ago by GenoMax 141k

0

I did not think about this. I guess I am getting used to having open-source tools and databases. Thanks

ADD REPLY • link 3.8 years ago by dago ★ 2.8k

2

3.8 years ago

ATpoint 81k

MSigDB contains the KEGG pathways: https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp. Download the GMT file and then load it into R, e.g. with:

kegg <- fgsea::gmtPathways("c2.cp.kegg.v7.1.symbols.gmt")

> head(kegg)
$KEGG_GLYCOLYSIS_GLUCONEOGENESIS
 [1] "ACSS2"   "GCK"     "PGK2"    "PGK1"    "PDHB"    "PDHA1"   "PDHA2"   "PGM2"   
 [9] "TPI1"    "ACSS1"   "FBP1"    "ADH1B"   "HK2"     "ADH1C"   "HK1"     "HK3"    
[17] "ADH4"    "PGAM2"   "ADH5"    "PGAM1"   "ADH1A"   "ALDOC"   "ALDH7A1" "LDHAL6B"
[25] "PKLR"    "LDHAL6A" "ENO1"    "PKM"     "PFKP"    "BPGM"    "PCK2"    "PCK1"   
[33] "ALDH1B1" "ALDH2"   "ALDH3A1" "AKR1A1"  "FBP2"    "PFKM"    "PFKL"    "LDHC"   
[41] "GAPDH"   "ENO3"    "ENO2"    "PGAM4"   "ADH7"    "ADH6"    "LDHB"    "ALDH1A3"
[49] "ALDH3B1" "ALDH3B2" "ALDH9A1" "ALDH3A2" "GALM"    "ALDOA"   "DLD"     "DLAT"   
[57] "ALDOB"   "G6PC2"   "LDHA"    "G6PC"    "PGM1"    "GPI"    

$KEGG_CITRATE_CYCLE_TCA_CYCLE
 [1] "IDH3B"    "DLST"     "PCK2"     "CS"       "PDHB"     "PCK1"     "PDHA1"   
 [8] "PDHA2"    "SUCLG2P2" "FH"       "SDHD"     "OGDH"     "SDHB"     "IDH3A"   
[15] "SDHC"     "IDH2"     "IDH1"     "ACO1"     "ACLY"     "MDH2"     "DLD"     
[22] "MDH1"     "DLAT"     "OGDHL"    "PC"       "SDHA"     "SUCLG1"   "SUCLA2"  
[29] "SUCLG2"   "IDH3G"    "ACO2"    

$KEGG_PENTOSE_PHOSPHATE_PATHWAY
 [1] "RPE"     "RPIA"    "PGM2"    "PGLS"    "PRPS2"   "FBP2"    "PFKM"    "PFKL"   
 [9] "TALDO1"  "TKT"     "FBP1"    "TKTL2"   "PGD"     "RBKS"    "ALDOA"   "ALDOC"  
[17] "ALDOB"   "H6PD"    "RPEL1"   "PRPS1L1" "PRPS1"   "DERA"    "G6PD"    "PGM1"   
[25] "TKTL1"   "PFKP"    "GPI"
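
For reference, a GMT file is plain tab-separated text: each line holds a set name, a description, and then the member genes. The following is a minimal base-R sketch of a reader for that layout, in case fgsea is not installed. The function name `read_gmt` and the toy file contents are made up for illustration; this is not the fgsea implementation.

```r
# Read a GMT file into a named list of character vectors.
# Assumed layout per line: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
read_gmt <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                 # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(x) x[-(1:2)]) # genes start at column 3
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Toy file with hypothetical set names (not an MSigDB download)
tmp <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdescription_a\tGCK\tHK1\tHK2",
             "SET_B\tdescription_b\tCS\tFH"), tmp)
toy <- read_gmt(tmp)
lengths(toy)  # number of genes per set
```

The result has the same shape as the `kegg` object above: one list element per pathway, containing gene symbols only.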

ADD COMMENT • link 3.8 years ago by ATpoint 81k

0

That is actually great! But I don't think there are compounds here, just gene names. No?

ADD REPLY • link 3.8 years ago by dago ★ 2.8k

0

Not the best solution, because they are built for single organisms, but genome-scale metabolic models (http://bigg.ucsd.edu/data_access) have all the information you need regarding the gene-protein-reaction associations. Once you have the gene ID, getting the KO with eggNOG should not be a problem.

ADD REPLY • link 3.8 years ago by andres.firrincieli 3.6k

score 2 · Accepted Answer · 2020-06-19

2

3.8 years ago

5heikki 11k

You can use their API. However, it is not meant for downloading the entire database. For that there is the FTP site, which requires a license.
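
As a rough sketch of what a small, single-pathway request could look like: the KEGG REST "link" operation returns two-column, tab-separated text, which can be reshaped into the nested list from the question. The helper names `parse_kegg_link` and `build_pathway_list` are hypothetical, the sample lines only illustrate the assumed response format, and the URLs in the trailing comments are assumptions to check against the KEGG API documentation and terms of use before running anything.

```r
# Parse "link" output (assumed format: "path:map00010<TAB>ko:K00001")
# into a named list with one element per pathway.
parse_kegg_link <- function(lines) {
  lines <- lines[nzchar(lines)]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  source_id <- sub("^path:", "", vapply(fields, `[`, character(1), 1))
  target_id <- sub("^[a-z]+:", "", vapply(fields, `[`, character(1), 2))
  split(target_id, source_id)
}

# Combine KO and compound links into
# list(<pathway> = list(KO = ..., Compounds = ...)).
build_pathway_list <- function(ko_lines, cpd_lines) {
  ko  <- parse_kegg_link(ko_lines)
  cpd <- parse_kegg_link(cpd_lines)
  ids <- union(names(ko), names(cpd))
  setNames(lapply(ids, function(id) {
    list(KO = ko[[id]], Compounds = cpd[[id]])
  }), ids)
}

# Offline demonstration with illustrative lines in the assumed format
ko_lines  <- c("path:map00010\tko:K00001", "path:map00010\tko:K00002")
cpd_lines <- c("path:map00010\tcpd:C00022", "path:map00010\tcpd:C00031")
pathways <- build_pathway_list(ko_lines, cpd_lines)
str(pathways)

# Live request for one pathway (needs network access; URLs are assumptions):
# ko_lines  <- readLines("https://rest.kegg.jp/link/ko/map00010")
# cpd_lines <- readLines("https://rest.kegg.jp/link/compound/map00010")
```

Keeping the parsing separate from the download makes it easy to test offline and to limit requests to the few pathways actually needed, rather than looping over the whole database.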