Question: Programmatically retrieving taxon classification
0
12 days ago by
schlogl70
Brazil-Florianopolis
schlogl70 wrote:

Hi there, I hope everyone is healthy and safe. Do you know of an API or some other way to retrieve the taxon classification from https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=28048&lvl=3&lin=f&keep=1&srchmode=1&unlock?

I have a list of names like this:

Acidiphilium
Acidipropionibacterium
Acidithiobacillus
Acidobacterium
Acidovorax
Acinetobacter
Actinoalloteichus
Actinobacillus
Actinomadura
Actinomyces

And I would like to get something like this in return:

Bacteria; Terrabacteria group; Actinobacteria; Actinobacteria; Acidothermales; Acidothermaceae

Thanks for your time.

Paulo

PS: I don't want to do it manually.

sequence • 104 views
modified 12 days ago by hugo.avila180 • written 12 days ago by schlogl70
2
12 days ago by
colindaven2.6k
Hannover Medical School
colindaven2.6k wrote:

Have a look at https://github.com/shenwei356/taxonkit. I think it's exactly what you need.

2
12 days ago by
GenoMax95k
United States
GenoMax95k wrote:

Using EntrezDirect:

Put your list in a file named id, one name per line.

$ for i in $(cat id); do printf '%s\n' "${i}"; esearch -db taxonomy -query "${i}" | efetch -format native -mode xml | grep ScientificName | awk -F ">|<" 'BEGIN{ORS=", ";}{print $3;}'; printf "\n"; done
Acidiphilium
Acidiphilium, cellular organisms, Bacteria, Proteobacteria, Alphaproteobacteria, Rhodospirillales, Acetobacteraceae,
Acidipropionibacterium
Acidipropionibacterium, cellular organisms, Bacteria, Terrabacteria group, Actinobacteria, Actinobacteria, Propionibacteriales, Propionibacteriaceae,
Acidithiobacillus
Acidithiobacillus, cellular organisms, Bacteria, Proteobacteria, Acidithiobacillia, Acidithiobacillales, Acidithiobacillaceae,
Acidobacterium
Acidobacterium, cellular organisms, Bacteria, Acidobacteria, Acidobacteriia, Acidobacteriales, Acidobacteriaceae,
Acidovorax
Acidovorax, cellular organisms, Bacteria, Proteobacteria, Betaproteobacteria, Burkholderiales, Comamonadaceae,
Acinetobacter
Acinetobacter, cellular organisms, Bacteria, Proteobacteria, Gammaproteobacteria, Pseudomonadales, Moraxellaceae,
Actinoalloteichus
Actinoalloteichus, cellular organisms, Bacteria, Terrabacteria group, Actinobacteria, Actinobacteria, Pseudonocardiales, Pseudonocardiaceae,
Actinobacillus
Actinobacillus, cellular organisms, Bacteria, Proteobacteria, Gammaproteobacteria, Pasteurellales, Pasteurellaceae,
Actinomadura
Actinomadura, cellular organisms, Bacteria, Terrabacteria group, Actinobacteria, Actinobacteria, Streptosporangiales, Thermomonosporaceae,
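
The loop above prints each lineage comma-separated, with the queried name first and "cellular organisms" still included. A minimal sketch of a filter that reshapes the same XML into the semicolon-separated form asked for in the question follows. The function name format_lineage is hypothetical. It assumes, as the output above suggests, that the first ScientificName element in the efetch XML is the queried taxon, that the remaining ones are its lineage in root-to-leaf order, and that each element sits on its own line.

```shell
# Hypothetical helper: read efetch taxonomy XML on stdin and print the
# lineage as "A; B; C".
# Assumption: the first ScientificName is the queried taxon itself and the
# rest are its ancestors, one element per line.
format_lineage() {
    grep ScientificName |
        awk -F '>|<' '
            NR == 1 { next }                      # skip the queried taxon
            $3 == "cellular organisms" { next }   # skip the root placeholder
            { printf "%s%s", sep, $3; sep = "; " }
            END { print "" }                      # terminate the line
        '
}

# Intended use (needs EDirect and network access, so it is left commented out):
# esearch -db taxonomy -query "Acidiphilium" |
#     efetch -format native -mode xml | format_lineage
```

The filter can replace the grep and awk stages of the loop; the rest of the command stays as it is.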
modified 12 days ago • written 12 days ago by GenoMax95k

Thanks Genomax. Awesome.

Reply written 12 days ago by schlogl70
2
12 days ago by
hugo.avila180
hugo.avila180 wrote:

This should do the trick.

I used this other answer, then added a loop and a little string formatting. The file your_list.txt contains your list of names.

Here it goes:

cat your_list.txt | 
    xargs -I {} sh -c "esearch -db taxonomy -query '{}' | efetch -db taxonomy -format docsum | xtract -pattern DocumentSummary  -element TaxId | head -1" | 
    xargs -I {} sh -c "esearch -db taxonomy  -query \"{}[TaxId]\" | 
        efetch -format native -mode xml | 
        grep ScientificName | grep -Po '(?<=\>).+(?=\<)' | tr '\n' ';' | sed -r 's/cellular organisms;//;s/;$/\n/'"
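
In the pipeline above, a name with no match produces no output line at all, which is easy to miss in a long list. A rough sketch of a variant that prints one row per input name, with NA for misses, is shown here. The function names fetch_taxon_xml and names_to_table are hypothetical. The sketch skips the TaxId lookup step, so it assumes each name matches a single taxon, and it assumes the first ScientificName in the XML is the queried taxon itself.

```shell
# Hypothetical helper: fetch taxonomy XML for one name.
# Requires EDirect and network access. stdin is redirected so that esearch
# cannot consume the name list being read by the calling loop.
fetch_taxon_xml() {
    esearch -db taxonomy -query "$1" < /dev/null |
        efetch -format native -mode xml
}

# Hypothetical helper: read names on stdin, print "name<TAB>lineage" per
# name, or "name<TAB>NA" when nothing comes back.
names_to_table() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue    # ignore blank lines
        lineage=$(fetch_taxon_xml "$name" |
            grep ScientificName |
            sed -e 1d -e 's/^[^>]*>//' -e 's/<.*$//' |   # drop the taxon itself, strip tags
            grep -v -x -F 'cellular organisms' |
            paste -s -d ';' -)
        printf '%s\t%s\n' "$name" "${lineage:-NA}"
    done
}

# Intended use (left commented out because it needs network access):
# names_to_table < your_list.txt > lineages.tsv
```

Keeping the name in the first column makes it possible to join the result back to the original list and to filter on NA afterwards.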

And a shout-out to you ;)


Thanks, brother. Good to see some Brazilians around here. Send me your contact details. schlogl@hotmail.com Thanks

Reply written 12 days ago by schlogl70