blastdbcmd error with -target_only option
13 months ago
biomarco ▴ 50

Hi all,

I'm occasionally encountering the following error when retrieving sequences with blastdbcmd using the -target_only option:

(base) 16:34:24 marco@blast:~$ blastdbcmd -entry WBM69675.1 -target_only -db nr
Error: [blastdbcmd] Error: oid headers do not contain target gi/seq_id.

Without -target_only everything works fine:

(base) 16:34:28 marco@blast:~$ blastdbcmd -entry WBM69675.1 -db nr
>WP_183271186.1 MULTISPECIES: lysine decarboxylase CadA [unclassified Buttiauxella] >WBM69675.1 lysine decarboxylase CadA [Buttiauxella sp. WJP83] >GDX05976.1 lysine decarboxylase CadA [Buttiauxella sp. A111]
MNVIAIMNHMGVYFKEEPIRELHRALERLDFRIVYPNDREDLLKLIENNARLCGVIFDWDKYNLELCEEISKCNEYMPLY
AFANTYSTLDVSLNDLRLQVRFFEYALGAAEDIANKIKQNTDEYIDTILPPLTKALFKYVREGKYTFCTPGHMGGTAFQK
SPVGSIFYDFFGSNTMKSDISISVSELGSLLDHSGPHKEAEEYIARVFNAERSYMVTNGTSTANKIVGMYSAPAGSTVLI
DRNCHKSLTHLMMMSNITPIYFRPTRNAYGILGGIPQSEFQRATIAKRVKETPNATWPVHAVITNSTYDGLLYNTDFIKK
TLDVKSIHFDSAWVPYTNFSPIYAGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDINEETFNEAYMMHTTTS
PHYGVVASTETAAAMMKGNSGKRLIDGSIERSIKFRKEIKRLKGESEGWFFDVWQPEHIDGAECWPLRSDSAWHGFKNID
NEHMYLDPIKVTMLTPGMKKDGTMDEFGIPASIVSKYLDEHGIIVEKTGPYNLLFLFSIGIDKTKALSLLRALTDFKRSF
DLNLRVKNMLPSLYREDPEFYENMRIQELAQNIHKLIAHHNLPDLMFRAFEVLPSMMVTPFVAFQKELHGQTEEVYLDEM
VGRVNANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGFETDIHGAYRQADGRYTVKVLKEENNK

I recently updated my local nr database, and I'm wondering whether it's corrupted, since I'm pretty sure I've never seen this error before. The most annoying part is that when I use -entry_batch together with -target_only, the program terminates as soon as this error occurs: the problematic entry is not simply skipped, and the whole run dies.
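A possible workaround for the batch problem is to query the entries one at a time instead of passing them through -entry_batch, so that a failing accession is recorded and skipped. The sketch below is only an illustration: the function name fetch_target_only and the file names are made up, and it treats either a non-zero exit status or empty output from blastdbcmd as a failure, which is an assumption rather than documented behaviour for this particular error.

```shell
# Sketch: fetch entries one by one so a single bad accession does not abort the run.
# fetch_target_only and the default file name are hypothetical, not part of BLAST+.
fetch_target_only() {
    ids_file=$1                        # one accession per line
    db=${2:-nr}                        # BLAST database name
    failed_file=${3:-failed_ids.txt}   # accessions that could not be retrieved
    : > "$failed_file"                 # start with an empty failure list
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue      # skip blank lines
        # Assumption: a failed lookup gives a non-zero exit status or no output.
        if out=$(blastdbcmd -db "$db" -entry "$acc" -target_only 2>/dev/null </dev/null) \
            && [ -n "$out" ]; then
            printf '%s\n' "$out"
        else
            printf '%s\n' "$acc" >> "$failed_file"
        fi
    done < "$ids_file"
}
```

It could be called as `fetch_target_only ids.txt nr failed_ids.txt > sequences.fasta`. This is slower than a single -entry_batch call because the database is opened once per accession.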

Can someone try to reproduce the problem and let me know if I have a broken nr database or something? Many thanks!

UPDATE: This error occurs with GenBank entries whose accessions start with W, such as the following: WAH52037.1 WBL74272.1 WDB51475.1 WDB43112.1 WCZ02214.1 WCP79122.1 WAG26413.1 WAH53327.1

I reported this problem to NCBI and will keep this post updated.

8 months ago
biomarco ▴ 50

An update for anyone unlucky enough to run into the same problem in the future:

An email exchange with NCBI support did not solve the problem, and neither reinstalling blastp nor reinstalling the nr database helped.

A few months later, for unrelated reasons, I ran a full system upgrade of my Ubuntu 22.04 machine and reinstalled the Anaconda environment. This unexpectedly solved the problem once and for all. I can't tell what was causing it, but it works now. Blastp was reinstalled from the bioconda channel.
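Since the cause was never identified, it may help to record which blastdbcmd binary and which database build are in use before and after such a reinstall, so the two states can be compared. A minimal sketch follows. The function name blast_env_report is made up for illustration, while -version and -info are standard blastdbcmd options.

```shell
# Sketch: print which blastdbcmd is on the PATH, its version, and basic database info.
# blast_env_report is a hypothetical helper name.
blast_env_report() {
    db=${1:-nr}
    echo "binary: $(command -v blastdbcmd)"
    echo "version: $(blastdbcmd -version | head -n 1)"
    # The first lines of -info include the database title, size, and date.
    blastdbcmd -db "$db" -info | head -n 5
}
```

Saving the output of `blast_env_report nr` to a dated file makes it easy to diff the environment later.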

13 months ago
GenoMax 141k

let me know if I have a broken nr database or something?

Likely not. You may be aware that WP* entries are special and refer to multiple species. As a result, they will always have more than one identifier in the header (as you can see above, which even breaks the "normal" FASTA format). So it appears that the -target_only option will not work with those entries.


-target_only is meant to write just one of the many possible headers for the target sequence. For instance, in the example below I successfully use it for an entry that belongs to a WP* cluster. Note that in the first case, with -target_only, I obtain only one header.

(base) 17:06:07 marco@blast:~/Marco/decarboxylase/test$ blastdbcmd -db nr -entry EFH4178262.1 -target_only
>EFH4178262.1 lysine decarboxylase CadA [Escherichia coli]
MNVIAILNHMGVYFKEEPIRELHRALERLNFQIVYPNDRDDLLKLIENNARLCGVIFDWDKYNLELCEEISKMNENLPLY
AFANTYSTLDVSLNDLRLQISFFEYALGAAEDIANKIKQTTDEYINTILPPLTKALFKYVREGKYTFCTPGHMGGTAYQK
SPVGSLFYDFFGPNTMKSDISISVSELGSLLDHSGPHKEAEQYIARVFNADRSYMVTNGTSTANKIVGMYSAPAGSTILI
DRNCHKSLTHLMMMSDVTPIYFRPTRNAYGILGGIPQSEFQHATIAKRVKETPNATWPVHAVITNSTYDGLLYNTDFIKK
TLDVKSIHFDSAWVPYTNFSPIYEGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDVNEETFNEAYMMHTTTS
PHYGIVASTETAAAMMKGNAGKRLINGSIERAIKFRKEIKRLRTESDGWFFDVWQPDHIDTTECWPLRSDSTWHGFKNID
NEHMYLDPIKVTLLTPGMEKDGTMSDFGIPASIVAKYLDEHGIVVEKTGPYNLLFLFSIGIDKTKALSLLRALTDFKRAF
DLNLRVKNMLPSLYREDPEFYENMRIQELAQNIHKLIVHHNLPDLMYRAFEVLPTMVMTPYAAFQKELHGMTEEVYLDEM
VGRINANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGFETDIHGAYRQADGRYTVKVLKEESKK
(base) 17:07:11 marco@blast:~/Marco/decarboxylase/test$ blastdbcmd -db nr -entry EFH4178262.1
>WP_089570917.1 MULTISPECIES: lysine decarboxylase CadA [Escherichia] >EFH4178262.1 lysine decarboxylase CadA [Escherichia coli] >EHS7041063.1 lysine decarboxylase CadA [Escherichia coli] >EJJ8489595.1 lysine decarboxylase CadA [Escherichia coli] >MCN1087580.1 lysine decarboxylase CadA [Escherichia coli] >MQS19346.1 lysine decarboxylase CadA [Escherichia coli]
MNVIAILNHMGVYFKEEPIRELHRALERLNFQIVYPNDRDDLLKLIENNARLCGVIFDWDKYNLELCEEISKMNENLPLY
AFANTYSTLDVSLNDLRLQISFFEYALGAAEDIANKIKQTTDEYINTILPPLTKALFKYVREGKYTFCTPGHMGGTAYQK
SPVGSLFYDFFGPNTMKSDISISVSELGSLLDHSGPHKEAEQYIARVFNADRSYMVTNGTSTANKIVGMYSAPAGSTILI
DRNCHKSLTHLMMMSDVTPIYFRPTRNAYGILGGIPQSEFQHATIAKRVKETPNATWPVHAVITNSTYDGLLYNTDFIKK
TLDVKSIHFDSAWVPYTNFSPIYEGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDVNEETFNEAYMMHTTTS
PHYGIVASTETAAAMMKGNAGKRLINGSIERAIKFRKEIKRLRTESDGWFFDVWQPDHIDTTECWPLRSDSTWHGFKNID
NEHMYLDPIKVTLLTPGMEKDGTMSDFGIPASIVAKYLDEHGIVVEKTGPYNLLFLFSIGIDKTKALSLLRALTDFKRAF
DLNLRVKNMLPSLYREDPEFYENMRIQELAQNIHKLIVHHNLPDLMYRAFEVLPTMVMTPYAAFQKELHGMTEEVYLDEM
VGRINANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGFETDIHGAYRQADGRYTVKVLKEESKK
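
For accessions where -target_only fails, one possible fallback is to run blastdbcmd without it and trim the merged header afterwards. The sketch below assumes the individual deflines are joined by " >" as in the output above, and that no description itself contains " >". The function name keep_target_header is made up for illustration.

```shell
# Sketch: read FASTA on stdin and keep only the defline that starts with the
# requested accession. keep_target_header is a hypothetical helper name.
keep_target_header() {
    awk -v acc="$1" '
        /^>/ {
            # Drop the leading ">" and split the merged header into deflines.
            n = split(substr($0, 2), parts, " >")
            for (i = 1; i <= n; i++) {
                split(parts[i], fields, " ")
                if (fields[1] == acc) { print ">" parts[i]; next }
            }
        }
        # Sequence lines, and headers without a match, pass through unchanged.
        { print }
    '
}
```

Usage would look like `blastdbcmd -db nr -entry WBM69675.1 | keep_target_header WBM69675.1`.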

If this is not working for some entries, then it may be a question best sent to the NCBI help desk, with examples.
