Download Pathways associated with species

10.0 years ago

Phil S. ▴ 700

Hi guys,

I have a metagenome sample in which I identified several species/families. Now I'd like to automatically gather the pathways (just their names) for each of those species, preferably in R. Any ideas for a database or API from which I can download pathway names given a species name?

Thanks,
Phil

To clarify: basically what I need is to type 'Gardnerella vaginalis' into some KEGG (or other) API and retrieve its list of pathways. Here is just a snapshot:

gvg00010             Glycolysis / Gluconeogenesis - Gardnerella vaginalis ATCC 14019 
gvg00030             Pentose phosphate pathway - Gardnerella vaginalis ATCC 14019 
gvg00040             Pentose and glucuronate interconversions - Gardnerella vaginalis ATCC 14019 
gvg00051             Fructose and mannose metabolism - Gardnerella vaginalis ATCC 14019 
gvg00052             Galactose metabolism - Gardnerella vaginalis ATCC 14019 
gvg00061             Fatty acid biosynthesis - Gardnerella vaginalis ATCC 14019 
gvg00071             Fatty acid degradation - Gardnerella vaginalis ATCC 14019 
gvg00072             Synthesis and degradation of ketone bodies - Gardnerella vaginalis ATCC 14019 
gvg00121             Secondary bile acid biosynthesis - Gardnerella vaginalis ATCC 14019

pathways metagenomes crawling

updated 2.6 years ago by Ram 43k • written 10.0 years ago by Phil S. ▴ 700
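
Phil only wants the pathway names. As a minimal sketch, assuming the KEGG REST list/pathway/<org> output is tab-separated as path:gvg00010<TAB>Glycolysis / Gluconeogenesis - Gardnerella vaginalis ATCC 14019 (the snapshot above shows the same entries padded with spaces), the hypothetical shell function pathway_names drops the "path:" prefix and the trailing " - <organism>" part. It cuts at the last " - ", so it assumes every name in the list carries an organism suffix; a name without one that itself contains " - " would lose its own last segment.

```shell
#!/bin/sh
# pathway_names: read KEGG list/pathway/<org> output on stdin and print
# "<pathway ID><TAB><pathway name>" without the organism suffix.
# ASSUMED input: tab-separated lines such as
#   path:gvg00010<TAB>Glycolysis / Gluconeogenesis - Gardnerella vaginalis ATCC 14019
# Everything from the LAST " - " onwards is treated as the organism suffix.
pathway_names() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
    NF >= 2 {
        id = $1
        sub(/^path:/, "", id)          # drop an optional "path:" prefix
        n = split($2, parts, " - ")    # split the name on " - "
        name = parts[1]
        for (i = 2; i < n; i++)        # re-join all but the last piece
            name = name " - " parts[i]
        print id, name
    }'
}
```

Usage, for a single organism: curl -s https://rest.kegg.jp/list/pathway/gvg | pathway_names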

10.0 years ago

Pierre Lindenbaum 161k

The NCBI BioSystems database provides a file mapping each BioSystem ID (BSID) to a taxonomy ID:

ftp://ftp.ncbi.nih.gov/pub/biosystems/CURRENT/biosystems_taxonomy.gz

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Pierre Lindenbaum 161k
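
This file yields BSIDs rather than pathway names (as Phil points out in his reply), but it can at least be filtered per organism. A minimal sketch, assuming biosystems_taxonomy is tab-separated with the BSID in column 1 and the NCBI taxonomy ID in column 2 (check the documentation in that FTP directory before relying on this); bsids_for_taxid is a hypothetical helper name. Turning the BSIDs into pathway names would need a further NCBI file and is not shown.

```shell
#!/bin/sh
# bsids_for_taxid TAXID FILE
# Prints the BioSystem IDs (BSIDs) that FILE links to one NCBI taxonomy ID.
# ASSUMED layout of biosystems_taxonomy: tab-separated, BSID in column 1,
# taxonomy ID in column 2. FILE may be gzip-compressed (.gz) or plain text.
bsids_for_taxid() {
    case $2 in
        *.gz) gzip -dc "$2" ;;
        *)    cat "$2" ;;
    esac | awk -F'\t' -v t="$1" '$2 == t { print $1 }' | sort -u
}
```

Usage, after downloading the file: bsids_for_taxid <taxid> biosystems_taxonomy.gz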

This does not give me the pathways, does it?

10.0 years ago by Phil S. ▴ 700

10.0 years ago

Prakki Rama ★ 2.7k

Using the KEGG REST API, I wrote the following Perl script. Is this exactly what you wanted?

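use strict;
use warnings;
# pathwayIds.txt: list of organisms, one KEGG organism code per line
open(my $fh, '<', 'pathwayIds.txt') or die "Cannot open pathwayIds.txt: $!";
while (my $org = <$fh>) {
    chomp $org;              # drop the newline so it is not passed to wget
    next unless length $org; # skip blank lines
    system('wget', "http://rest.kegg.jp/list/pathway/$org");
}
close($fh);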

INPUT: pathwayIds.txt

gva
gvg
gvh

If you have just one organism, type the following in a terminal:

wget http://rest.kegg.jp/list/pathway/gvg

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Prakki Rama ★ 2.7k
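
Both KEGG approaches in this thread start from KEGG identifiers rather than a species name: organism codes such as gvg for pathwayIds.txt above, and T numbers for the IDs.txt used further down. A minimal sketch for looking them up by name, assuming the KEGG REST list/organism output is tab-separated as T number, organism code, species name, lineage; kegg_org_codes is a hypothetical helper name.

```shell
#!/bin/sh
# kegg_org_codes NAME
# Reads KEGG list/organism output on stdin and prints
# "<T number><TAB><organism code><TAB><species name>" for every entry whose
# species name contains NAME (plain text, case-insensitive).
# ASSUMED input: tab-separated lines of T number, organism code,
# species name, lineage.
kegg_org_codes() {
    awk -F'\t' -v q="$1" 'BEGIN { OFS = "\t"; q = tolower(q) }
        index(tolower($3), q) > 0 { print $1, $2, $3 }'
}
```

Usage: curl -s https://rest.kegg.jp/list/organism | kegg_org_codes 'Gardnerella vaginalis' | cut -f2 > pathwayIds.txt (use cut -f1 instead to build IDs.txt). The match is a plain substring, so it returns every strain KEGG lists under that species name.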

Thank you for adding the answers!

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Phil S. ▴ 700

Ram · Accepted Answer · 2014-04-25

10.0 years ago

umer.zeeshan.ijaz ★ 1.8k

updated 4.3 years ago by Ram 43k • written 10.0 years ago by umer.zeeshan.ijaz ★ 1.8k

Thanks, will have a look at it!

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Phil S. ▴ 700

This definitely looks like a great solution, will try it. Thanks!

10.0 years ago by CuteDriving663 • 0

Ok, after looking into it, this really seems like the way to go ;) Thanks for that! The problem is that I don't have any contigs or anything else; I just have the species names. Any ideas how to handle that?

10.0 years ago by Phil S. ▴ 700

Well, you can always download the annotated GBK (GenBank) files from NCBI for the different species. Put them all in a folder, run cat *.gbk > test.gbk, and then, keeping your fingers crossed that the enzymes are annotated, follow the one-liners from there onwards.

Read this post of mine.

updated 4.3 years ago by Ram 43k • written 10.0 years ago by umer.zeeshan.ijaz ★ 1.8k
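
As a minimal sketch, separate from the one-liners in umer's post, the enzyme annotations in the concatenated test.gbk can be pulled out of the GenBank /EC_number qualifiers. ec_numbers_from_gbk is a hypothetical helper name, and it assumes each qualifier sits on a single line as /EC_number="x.x.x.x".

```shell
#!/bin/sh
# ec_numbers_from_gbk [FILE...]
# Prints the unique EC numbers found in GenBank /EC_number="..." qualifiers
# (reads stdin when no FILE is given). Assumes one qualifier per line.
ec_numbers_from_gbk() {
    sed -n 's/.*\/EC_number="\([^"]*\)".*/\1/p' "$@" | sort -u
}
```

Usage: ec_numbers_from_gbk test.gbk > ec_numbers.txt. Partial EC numbers such as 3.1.-.- are kept exactly as annotated.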

I guess this will be my workaround if everything else fails. What I am considering at the moment is the KEGGREST R package from Bioconductor. The only problem is that I'm not sure how to get the initial list of available pathways at all. Once I have it, I can just 'grep' for the T genome numbers, download those, and do whatever I want with them. This might reduce traffic and therefore time...

I will keep you posted on what works best anyway! If you have any other ideas, let me know. Thanks for your help!

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Phil S. ▴ 700

Okay, I spent the last 20 minutes thinking this over, and here is a solution (I have to go somewhere now and did it in a hurry, so excuse the long one-liner, but I think it is right):

Step 1: Go to the KEGG website, find the organisms/species you are interested in, get their T numbers (the KEGG genome IDs), and store them in IDs.txt:

$ cat IDs.txt
T01329
T02919
T01060
T02994

Step 2: Now use the following to extract the pathways for your species/organism:

$ for i in $(cat IDs.txt); do echo $(curl -s http://rest.kegg.jp/link/pathway/genome:$i | grep -Po '(?<=path:).*') | awk '{gsub("[a-zA-Z]+","",$0);}1'| xargs -n 1 | xargs -I {} curl -s http://rest.kegg.jp/find/pathway/{} | awk -v k=$i '{print k"\t"$0}'  ; done
T01329    path:map00010    Glycolysis / Gluconeogenesis
T01329    path:map00020    Citrate cycle (TCA cycle)
T01329    path:map00030    Pentose phosphate pathway
T01329    path:map00040    Pentose and glucuronate interconversions
T01329    path:map00051    Fructose and mannose metabolism
T01329    path:map00052    Galactose metabolism
T01329    path:map00053    Ascorbate and aldarate metabolism
T01329    path:map00061    Fatty acid biosynthesis
T01329    path:map00062    Fatty acid elongation
T01329    path:map00071    Fatty acid degradation
T01329    path:map00072    Synthesis and degradation of ketone bodies
T01329    path:map00100    Steroid biosynthesis
T01329    path:map00120    Primary bile acid biosynthesis
T01329    path:map00130    Ubiquinone and other terpenoid-quinone biosynthesis
T01329    path:map00140    Steroid hormone biosynthesis
T01329    path:map00190    Oxidative phosphorylation
T01329    path:map00230    Purine metabolism
T01329    path:map00232    Caffeine metabolism
T01329    path:map00240    Pyrimidine metabolism
T01329    path:map00250    Alanine, aspartate and glutamate metabolism
T01329    path:map00260    Glycine, serine and threonine metabolism
T01329    path:map00270    Cysteine and methionine metabolism
T01329    path:map00280    Valine, leucine and isoleucine degradation
T01329    path:map00290    Valine, leucine and isoleucine biosynthesis
T01329    path:map00300    Lysine biosynthesis
T01329    path:map00310    Lysine degradation

updated 4.3 years ago by Ram 43k • written 10.0 years ago by umer.zeeshan.ijaz ★ 1.8k
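
The one-liner above sends one find request per pathway, and find is a keyword search that could in principle return more than one hit. A sketch of the same idea that downloads the reference pathway names once and joins them locally, assuming (as the one-liner does) that link/pathway/genome:<T number> returns tab-separated lines with a pathway ID in column 2, and that list/pathway returns the pathway ID in column 1 and its name in column 2. Both function names are hypothetical. Pathways are matched on their trailing number, mirroring the one-liner's letter stripping.

```shell
#!/bin/sh
# kegg_pathway_table GENOME_ID NAMES_FILE LINKS_FILE
# Prints "<genome ID><TAB>map<number><TAB><pathway name>" for every pathway in
# LINKS_FILE, looking the name up in NAMES_FILE.
# ASSUMED input formats (tab-separated KEGG REST output):
#   NAMES_FILE: list/pathway          -> pathway ID in column 1, name in column 2
#   LINKS_FILE: link/pathway/genome:T -> pathway ID in column 2
# IDs are matched on their trailing digits, so path:map00010, map00010 and
# path:gvg00010 all count as pathway 00010; each number is printed once.
kegg_pathway_table() {
    awk -F'\t' -v gid="$1" '
        # First file (NAMES_FILE): remember the name for each pathway number.
        NR == FNR {
            if (match($1, /[0-9]+$/)) name[substr($1, RSTART, RLENGTH)] = $2
            next
        }
        # Second file (LINKS_FILE): print each linked pathway number once.
        match($2, /[0-9]+$/) {
            num = substr($2, RSTART, RLENGTH)
            if (!(num in seen)) {
                seen[num] = 1
                print gid "\tmap" num "\t" name[num]
            }
        }' "$2" "$3"
}

# kegg_pathways_for_ids IDS_FILE
# Network part (hypothetical wrapper): fetch the reference pathway names once,
# then send one link request per T number listed in IDS_FILE.
kegg_pathways_for_ids() {
    tmp=$(mktemp -d) || return 1
    curl -s https://rest.kegg.jp/list/pathway > "$tmp/names.tsv"
    while read -r t; do
        [ -n "$t" ] || continue
        curl -s "https://rest.kegg.jp/link/pathway/genome:$t" > "$tmp/links.tsv"
        kegg_pathway_table "$t" "$tmp/names.tsv" "$tmp/links.tsv"
    done < "$1"
    rm -r "$tmp"
}
```

Usage: kegg_pathways_for_ids IDs.txt > pathways.tsv. Splitting the offline join from the downloads keeps the join testable on saved files.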

Thank you so much for investing so much time in this! I really appreciate it!

10.0 years ago by Phil S. ▴ 700

Hey,

Sorry to disturb you again, but why am I not able to redirect the output into a file instead of having it on the terminal? Is there any special way to do that?

I was trying it with:

...| awk -v k=$i '{print k"\t"$0 > "./foo.txt"}'  ; done

or

...| awk -v k=$i '{print k"\t"$0}' ; done > ./foo.txt

but somehow this just creates the file and leaves it empty (both times).

edit:

solved it. THANK YOU SO MUCH!!

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Phil S. ▴ 700
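
Phil doesn't say what the fix was, but the two redirect forms behave differently in general. In the first, every loop iteration starts a new awk, and awk truncates the file the first time each run opens it, so only the output of the last iteration that printed anything survives; the second opens the file once for the whole loop. A small offline sketch, with printf standing in for the KEGG requests (the demo function and file names are made up):

```shell
#!/bin/sh
# Hypothetical demo: printf stands in for the curl | awk pipeline.

# Form 1: redirect inside awk. Each iteration runs a new awk process, which
# truncates out_inside.txt when it first opens it, so only the last
# iteration's line is left in the file.
demo_redirect_inside() {
    for i in A B C; do
        printf 'line\n' | awk -v k="$i" '{ print k "\t" $0 > "out_inside.txt" }'
    done
}

# Form 2: redirect the whole loop. out_loop.txt is opened once before the
# loop starts, so all three iterations end up in it.
demo_redirect_loop() {
    for i in A B C; do
        printf 'line\n' | awk -v k="$i" '{ print k "\t" $0 }'
    done > out_loop.txt
}
```

Applied to the thread's loop, the done > ./foo.txt form collects everything the loop writes to standard output; if that file stays empty, the commands inside the loop printed nothing.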