Question: blastdbcmd taxonomy id conflict
Chirag Parsania (University of Macau) wrote 14 months ago:

Hi, I am using the 'blastdbcmd' command to map taxonomy IDs and other details onto my local BLAST results. For the query protein “WP_071944094.1” I found a taxonomy ID conflict between what I get locally and the online record. The command and its output are given below for reference.

Local command:

blastdbcmd -db <path to nr> -outfmt '%a %g %T %t' -out "temp.txt" -entry WP_071944094.1

Output:

WP_071944094.1 1110718287 1480675 glyoxalase [Halolamina sediminis]
APE31202.1 1108539552 1480675 glyoxalase [Halolamina sediminis]

According to this, the protein belongs to the species Halolamina sediminis (tax ID: 1480675). However, the online record for WP_071944094.1 shows that the protein belongs to Halomonas aestuarii (tax ID: 1897729).

Can anyone please explain why the online record and the local output conflict?


Can we assume you're using the same database version locally as the one served online at NCBI (i.e. your local DBs are up to date)?

written 14 months ago by lieven.sterck
lieven.sterck (VIB, Ghent, Belgium) wrote 14 months ago:

This seems to be a non-redundant RefSeq protein entry, hence the difference. You can find some additional info here: RefSeq non-redundant proteins


Hi, I believe you are right. I found the paragraph below on the page you linked, which explains the cause of the discrepancy I asked about.

A non-redundant protein record that provides organism information at the level of a genus, family, or even super-kingdom does not mean that the protein is found in all RefSeq genomes below that taxonomic classification. It only indicates that the protein is found in more than one genome of different species for which the genus, family, or super-kingdom classification is the lowest common taxonomic node.

This would also mean that the taxonomy ID I found locally is more specific than the one in the online record. Correct me if I am wrong.

Anyway, thanks a lot.

written 14 months ago by Chirag Parsania

Yes, that could explain it. However, I'm starting to think there might be something else going on :-/ . Did you check the DB version (local vs. online)? And are you sure there is no parsing issue in your output? Perhaps these entries have been updated recently? Here is the blastdbcmd output from a DB version from Sep 2017 (I'm trying to trace the discrepancy):

$ blastdbcmd -db /blastdb/shared/prot -entry WP_071944094.1 -outfmt '%a %g %T %t'
WP_071944094.1 1110718287 1897729 glyoxalase [Halomonas aestuarii]
APE31202.1 1108539552 1897729 glyoxalase [Halomonas aestuarii]
$ blastdbcmd -db /blastdb/shared/prot -entry WP_053947656.1 -outfmt '%a %g %T %t'
WP_053947656.1 928922717 1480675 glyoxalase [Halolamina sediminis]

As you can see, it gives the same output as the online query.

written 14 months ago by lieven.sterck

Hi Sterck, you have changed the input ID from WP_071944094.1 to WP_053947656.1. There is no issue with WP_053947656.1.

I ran the same command with my database version (perhaps mine is a little older than yours; it was downloaded in mid-2017). It gave me the same output as yours.

WP_053947656.1 928922717 1480675 glyoxalase [Halolamina sediminis]
written 14 months ago by Chirag Parsania

To confirm (and perhaps close the issue), I have just downloaded the latest version of the nr DB from NCBI. I tried again, and the output confirms my previous reply:

$ blastdbcmd -db /blastdb/shared/prot -entry WP_071944094.1 -outfmt '%a %g %T %t'
WP_071944094.1 1110718287 1897729 glyoxalase [Halomonas aestuarii]
APE31202.1 1108539552 1897729 glyoxalase [Halomonas aestuarii]

I thus see no discrepancy when querying this ID locally.

My best guess is that your DB was outdated and that this particular ID might have been revised in the meantime.

written 14 months ago by lieven.sterck
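
One way to check whether a difference like this comes from the database version is to save the blastdbcmd output from each database and compare the taxid column. The following is a minimal sketch, assuming both outputs were produced with -outfmt '%a %g %T %t' as in this thread. The function name diff_taxids and the file names old.txt and new.txt are hypothetical, and the sample lines are copied from the outputs posted above.

```shell
#!/bin/sh
# Print accessions whose taxid (field 3 of '%a %g %T %t' output) differs
# between two saved blastdbcmd outputs. diff_taxids is a hypothetical helper.
diff_taxids() {
    awk 'NR == FNR { old[$1] = $3; next }
         ($1 in old) && old[$1] != $3 { print $1, old[$1], "->", $3 }' "$1" "$2"
}

# Sample data taken from the outputs quoted in this thread.
tmpdir=$(mktemp -d)
cat > "$tmpdir/old.txt" <<'EOF'
WP_071944094.1 1110718287 1480675 glyoxalase [Halolamina sediminis]
APE31202.1 1108539552 1480675 glyoxalase [Halolamina sediminis]
EOF
cat > "$tmpdir/new.txt" <<'EOF'
WP_071944094.1 1110718287 1897729 glyoxalase [Halomonas aestuarii]
APE31202.1 1108539552 1897729 glyoxalase [Halomonas aestuarii]
EOF

diff_taxids "$tmpdir/old.txt" "$tmpdir/new.txt"
# WP_071944094.1 1480675 -> 1897729
# APE31202.1 1480675 -> 1897729
```

With real data, the two input files would come from running the same blastdbcmd command against the older and the newer copy of the database.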

Hi lieven, thanks for this. By the way, I got the same reply from NCBI. See below:


**You may want to update the metadata file by redownload the taxdb.tar.gz file from our ftp site as well as update your nr or refseq_protein database.

Working with updated files, I have:

blastdbcmd -db nr -entry WP_071944094.1 -outfmt "%a %T %S"
WP_071944094.1 1897729 Halomonas sp. Hb3
APE31202.1 1897729 Halomonas sp. Hb3

This matches the web record's link to taxonomy: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1897729**

written 14 months ago by Chirag Parsania
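
The update that NCBI suggests in the reply quoted above could be scripted roughly as follows. This is only a sketch, assuming the update_blastdb.pl script shipped with BLAST+ is on the PATH. The function name refresh_blast_db, the DRY_RUN variable, and the directory /path/to/blastdb are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Refresh the taxonomy metadata (taxdb) and a protein database in one go.
# refresh_blast_db and DRY_RUN are hypothetical names used for illustration.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
refresh_blast_db() {
    db_dir="$1"
    shift
    for name in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "(cd $db_dir && update_blastdb.pl --decompress $name)"
        else
            (cd "$db_dir" && update_blastdb.pl --decompress "$name")
        fi
    done
}

# /path/to/blastdb is a placeholder for the directory holding the local DBs.
refresh_blast_db /path/to/blastdb taxdb nr
```

Setting DRY_RUN=0 would run the downloads for real. The nr download is large, so it is worth reviewing the printed commands first.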

Yes, I know. I just wanted to indicate that I get two different outputs. The main point is that I get the same result for WP_071944094.1 on my local DB as you get with the online search, i.e. there is no discrepancy in my trial between querying this ID locally and online.

written 14 months ago by lieven.sterck