Is there a way to make the NCBI Entrez or Datasets API respect binomial nomenclature within requests?
15 months ago
rijan_dhakal ▴ 10

TL;DR: The NCBI API does not respect the species name when given the whole species name.

I am trying to search the NCBI databases for genomic data on specific species. I am using NCBI Entrez Direct with the following one-liner:

esearch -db assembly -query "{insert species name here}" | esummary | xtract -pattern DocumentSummary -element SpeciesName

An example run with an actual species name:

esearch -db assembly -query "Aristolochia cretica" | esummary | xtract -pattern DocumentSummary -element SpeciesName

This produces the output:

Aristolochia contorta
Aristolochia fimbriata

This is an issue for me because I requested "Aristolochia cretica" and got "Aristolochia contorta" and "Aristolochia fimbriata".

Second example:

esearch -db assembly -query "Physaria geyeri" | esummary | xtract -pattern DocumentSummary -element SpeciesName

Output:

Physaria ovalifolia
Physaria fendleri
Physaria acutifolia

Third example:

esearch -db assembly -query "Vicia serinica" | esummary | xtract -pattern DocumentSummary -element SpeciesName

Output:

Vicia sativa
Vicia sativa
Vicia faba

I want the API to take the specific name into account as well. I am following this official manual and I might have missed something, but I cannot find a way to account for this behaviour.

Is writing something myself the only solution? Or is there a flag or setting I am missing?

15 months ago
GenoMax 142k

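"NCBI API does not respect species name when giving the whole species name."

Not correct.

"Or is there a flag or setting I am missing?"

Add the [ORGN] (Organism) field tag after the scientific name. Without a field tag, the search is not limited to the organism name.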

The organism you are looking for does not seem to have any data in the assembly database. So if you run

$ esearch -db assembly -query "Aristolochia cretica [ORGN]" | esummary | xtract -pattern DocumentSummary -element SpeciesName

you get nothing, but running

$ esearch -db assembly -query "Aristolochia contorta [ORGN]" | esummary | xtract -pattern DocumentSummary -element SpeciesName
Aristolochia contorta

gets the correct output.
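
If you need to check a whole list of species, the same tagged query can be built in a loop. This is only a rough sketch: the helper name orgn_query and the input file species.txt are made up for illustration, and the commented-out loop assumes EDirect is installed and on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of EDirect): append the [ORGN] field tag
# to a scientific name so the query is restricted to the Organism field.
orgn_query() {
    printf '%s [ORGN]' "$1"
}

# Example loop, left commented out because it needs EDirect and network access.
# species.txt is a hypothetical file with one binomial per line.
# esearch reads from stdin, so it is given /dev/null here to stop it from
# swallowing the rest of the species list.
#
# while IFS= read -r species; do
#     count=$(esearch -db assembly -query "$(orgn_query "$species")" < /dev/null |
#             xtract -pattern ENTREZ_DIRECT -element Count)
#     printf '%s\t%s\n' "$species" "$count"
# done < species.txt
```

A count of 0 would then flag species such as Aristolochia cretica that have no assembly records, instead of silently returning other members of the genus.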

The taxonomy database does not seem to know about Aristolochia cretica either:

$ esearch -db taxonomy -query "Aristolochia cretica [ORGN]"  | esummary 

but if you look for

$ esearch -db taxonomy -query "Aristolochia contorta [ORGN]"  | esummary | xtract -pattern DocumentSummary -element ScientificName
Aristolochia contorta

you get the correct output.

It looks like only these two species in the genus Aristolochia have genomes: https://www.ncbi.nlm.nih.gov/genome/?term=Aristolochia
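
As an extra safeguard on your side, you could also filter the xtract output so that only lines exactly matching the requested name survive. A minimal sketch, assuming one species name per line as in the outputs above; only_species is a made-up helper name, not an EDirect command.

```shell
#!/bin/sh
# Hypothetical helper: keep only the stdin lines that are exactly the
# requested species name. -F treats the name as a fixed string and -x
# requires the whole line to match. "|| true" keeps the exit status at 0
# when nothing matches.
only_species() {
    grep -Fx -- "$1" || true
}

# Offline demonstration using the names from the question.
printf 'Aristolochia contorta\nAristolochia fimbriata\n' |
    only_species "Aristolochia contorta"
```

In a real run you would append the filter to the pipeline, e.g. ... | xtract -pattern DocumentSummary -element SpeciesName | only_species "Aristolochia cretica", which prints nothing when the species has no records.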
