Question: Download Pathways associated with species
Phil S. (Stuttgart, Germany) wrote, 5.7 years ago:

Hi guys,

I have a metagenome sample in which I identified several species/families. Now I would like to automatically gather the pathways (just the names) for each of those species, preferably in R. Any ideas for a database or API from which I can download pathway names given the species name?

Thanks,
Phil

To clarify: basically, I want to type 'Gardnerella vaginalis' into some KEGG (or any other) API and retrieve the list of pathways. Here is just a snapshot:

gvg00010             Glycolysis / Gluconeogenesis - Gardnerella vaginalis ATCC 14019 
gvg00030             Pentose phosphate pathway - Gardnerella vaginalis ATCC 14019 
gvg00040             Pentose and glucuronate interconversions - Gardnerella vaginalis ATCC 14019 
gvg00051             Fructose and mannose metabolism - Gardnerella vaginalis ATCC 14019 
gvg00052             Galactose metabolism - Gardnerella vaginalis ATCC 14019 
gvg00061             Fatty acid biosynthesis - Gardnerella vaginalis ATCC 14019 
gvg00071             Fatty acid degradation - Gardnerella vaginalis ATCC 14019 
gvg00072             Synthesis and degradation of ketone bodies - Gardnerella vaginalis ATCC 14019 
gvg00121             Secondary bile acid biosynthesis - Gardnerella vaginalis ATCC 14019 
Tags: crawling, pathways, metagenomes
umer.zeeshan.ijaz (Glasgow, UK) wrote, 5.7 years ago:


Thanks, will have a look at it!

written 5.7 years ago by Phil S.

This definitely looks like a great solution, will try it. Thanks!

written 5.7 years ago by CuteDriving6630

OK, so after looking into it, it really seems like the way to go... ;) Thanks for that. The problem is that I don't have any contigs or anything else; I just have the species names, that's all. Any ideas how to handle that?

written 5.7 years ago by Phil S.

Well, you can always download the annotated GBK files from NCBI for the different species. Put them all in one folder, run cat *.gbk > test.gbk and then, keeping your fingers crossed that the enzymes are annotated, follow the one-liners from there onwards.

Read this post of mine.
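
Whether the enzymes are annotated can be checked quickly by pulling the EC numbers out of the GenBank files. The following is only a minimal sketch, not the one-liners referred to above: it assumes the annotation uses the standard /EC_number="..." feature qualifier, and ec_numbers is a hypothetical helper name.

```shell
# Hypothetical helper: read GenBank flat-file text on stdin and print
# the unique EC numbers found in /EC_number="..." qualifiers.
ec_numbers() {
    grep -o 'EC_number="[^"]*"' | cut -d '"' -f 2 | sort -u
}

# Example usage (file names are placeholders):
#   cat ./*.gbk | ec_numbers > ec_numbers.txt
```

An empty result suggests the records carry no enzyme annotation, in which case an approach based on EC numbers would have nothing to work with.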

written 5.7 years ago by umer.zeeshan.ijaz

I guess this will be my workaround if everything else fails. What I am considering at the moment is the KEGGREST R package from Bioconductor. The only problem is that I'm not sure how to get the initial list of all available pathways. If I had that, I could just 'grep' for the genome T numbers, download those entries and do whatever I want with them. This might reduce traffic and therefore time.

I will keep you posted on what worked best anyway! If you have any other ideas, let me know! And thanks for your help!
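
Getting from a species name to a T number could be done by filtering the KEGG organism list. The sketch below is only an illustration: it assumes the list is tab-separated with the T number, the organism code and the organism name in the first three columns, and find_organism is a hypothetical helper name, not part of KEGG or KEGGREST.

```shell
# Hypothetical helper: read the KEGG organism list on stdin and print
# T number, organism code and name for every row whose name contains
# the query string (case-insensitive).
# Assumed columns: T number <TAB> code <TAB> name <TAB> lineage
find_organism() {
    awk -F '\t' -v q="$1" '
        BEGIN { q = tolower(q) }
        index(tolower($3), q) { print $1 "\t" $2 "\t" $3 }
    '
}

# Example usage (needs network access):
#   curl -s http://rest.kegg.jp/list/organism | find_organism "Gardnerella vaginalis"
```

A species with several sequenced strains will usually give several rows, so the matches still need to be checked by eye before the T numbers are used.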

written 5.7 years ago by Phil S.

Okay, I spent the last 20 minutes thinking this over, and here is a solution (I have to go somewhere now and did it in a hurry, so excuse the long one-liner, but I think I have done it right):

Step 1: Go to the website, see which organisms/species you are interested in, get their T numbers (the genome IDs), and store them in IDs.txt

$ cat IDs.txt
T01329
T02919
T01060
T02994

Step 2: Now use the following to extract the pathways for your species/organism:

$ for i in $(cat IDs.txt); do echo $(curl -s http://rest.kegg.jp/link/pathway/genome:$i | grep -Po '(?<=path:).*') | awk '{gsub("[a-zA-Z]+","",$0);}1'| xargs -n 1 | xargs -I {} curl -s http://rest.kegg.jp/find/pathway/{} | awk -v k=$i '{print k"\t"$0}'  ; done
T01329    path:map00010    Glycolysis / Gluconeogenesis
T01329    path:map00020    Citrate cycle (TCA cycle)
T01329    path:map00030    Pentose phosphate pathway
T01329    path:map00040    Pentose and glucuronate interconversions
T01329    path:map00051    Fructose and mannose metabolism
T01329    path:map00052    Galactose metabolism
T01329    path:map00053    Ascorbate and aldarate metabolism
T01329    path:map00061    Fatty acid biosynthesis
T01329    path:map00062    Fatty acid elongation
T01329    path:map00071    Fatty acid degradation
T01329    path:map00072    Synthesis and degradation of ketone bodies
T01329    path:map00100    Steroid biosynthesis
T01329    path:map00120    Primary bile acid biosynthesis
T01329    path:map00130    Ubiquinone and other terpenoid-quinone biosynthesis
T01329    path:map00140    Steroid hormone biosynthesis
T01329    path:map00190    Oxidative phosphorylation
T01329    path:map00230    Purine metabolism
T01329    path:map00232    Caffeine metabolism
T01329    path:map00240    Pyrimidine metabolism
T01329    path:map00250    Alanine, aspartate and glutamate metabolism
T01329    path:map00260    Glycine, serine and threonine metabolism
T01329    path:map00270    Cysteine and methionine metabolism
T01329    path:map00280    Valine, leucine and isoleucine degradation
T01329    path:map00290    Valine, leucine and isoleucine biosynthesis
T01329    path:map00300    Lysine biosynthesis
T01329    path:map00310    Lysine degradation
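
For readability, the same steps can be split into small functions. This is only a sketch of the one-liner above, not a tested replacement: it assumes the link output contains entries of the form path: followed by an organism prefix and five digits (as the one-liner implies), and the function names are hypothetical.

```shell
# Hypothetical helper: read KEGG "link" output on stdin and print the
# unique five-digit pathway numbers that follow "path:<prefix>".
pathway_numbers() {
    grep -o 'path:[a-z]*[0-9]\{5\}' | grep -o '[0-9]\{5\}$' | sort -u
}

# Hypothetical helper: prefix every stdin line with "$1" and a tab.
tag_lines() {
    awk -v k="$1" '{ print k "\t" $0 }'
}

# For each T number in the file "$1", look up its pathway numbers and
# then the pathway names (needs network access).
pathways_for_ids() {
    local id num
    while read -r id; do
        [ -n "$id" ] || continue   # skip blank lines
        curl -s "http://rest.kegg.jp/link/pathway/genome:${id}" | pathway_numbers |
        while read -r num; do
            curl -s "http://rest.kegg.jp/find/pathway/${num}" | tag_lines "$id"
        done
    done < "$1"
}

# Example usage:
#   pathways_for_ids IDs.txt > pathways.tsv
```

Like the one-liner, this sends one request per pathway, so it is slow for long ID lists.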
written 5.7 years ago by umer.zeeshan.ijaz

Thank you so much for investing that much time in it! I really appreciate it!

written 5.7 years ago by Phil S.

Hey,

sorry to disturb you again, but why am I not able to redirect the output into a file instead of having it printed on the terminal? Is there a special way to do that?

I tried:

...| awk -v k=$i '{print k"\t"$0 > "./foo.txt"}'  ; done

or

...| awk -v k=$i '{print k"\t"$0}' ; done > ./foo.txt

but both times this just creates the file and leaves it empty...

edit:

solved it. THANK YOU SO MUCH!!

written 5.7 years ago by Phil S.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 5.7 years ago:

The NCBI BioSystems database contains a resource mapping the BSID to the taxon ID:

ftp://ftp.ncbi.nih.gov/pub/biosystems/CURRENT/biosystems_taxonomy.gz


This does not give me the pathways, does it?

written 5.7 years ago by Phil S.

Prakki Rama (Singapore) wrote, 5.7 years ago:

Using the KEGG REST API, I wrote the following Perl script. Is this what you wanted?

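use strict;
use warnings;

# pathwayIds.txt lists one KEGG organism code per line
open(my $fh, '<', 'pathwayIds.txt') or die "Cannot open pathwayIds.txt: $!";
while (my $org = <$fh>) {
    chomp $org;                # drop the trailing newline before building the URL
    next unless length $org;   # skip blank lines
    system('wget', "http://rest.kegg.jp/list/pathway/$org");
}
close($fh);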

INPUT: pathwayIds.txt

gva
gvg
gvh

If you have just one organism, type the following in the terminal:

wget http://rest.kegg.jp/list/pathway/gvg
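
Since only the pathway names are wanted, the organism suffix can be stripped and all organisms collected in one table. This is just a sketch in shell rather than Perl: it assumes every line of the list output ends in " - " followed by the organism name (as in the snapshot in the question), and strip_organism and list_pathways are hypothetical names.

```shell
# Hypothetical helper: read "list/pathway/<org>" output on stdin and
# remove everything from the last " - " onwards (the organism name).
# Lines without " - " are passed through unchanged.
strip_organism() {
    sed 's/\(.*\) - .*/\1/'
}

# For each organism code in the file "$1", print:
#   code <TAB> pathway ID <TAB> pathway name
# (needs network access)
list_pathways() {
    local org
    while read -r org; do
        [ -n "$org" ] || continue   # skip blank lines
        curl -s "http://rest.kegg.jp/list/pathway/${org}" |
            strip_organism |
            awk -v k="$org" '{ print k "\t" $0 }'
    done < "$1"
}

# Example usage:
#   list_pathways pathwayIds.txt > pathways.tsv
```

Cutting at the last " - " rather than the first keeps pathway names that contain " - " themselves intact.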

Thank you for adding the answers!

written 5.7 years ago by Phil S.