Question: Restricting ncbi nr database: from accession numbers to database. Problem with blastdbcmd: strange fasta headers and incomplete output
3.0 years ago by
Janne.Swaegers0 wrote:

Hi everyone,

I want to make a BLAST database of insect proteins so that I can blast my transcriptome assembly locally. I downloaded all the accession numbers associated with insects from the NCBI website. Next, I used this command to retrieve the corresponding FASTA sequences from my locally installed NCBI nr database.

blastdbcmd -db /home/db/ncbi/nr -entry_batch protein_result.txt -out insects_seq.fa

This, however, gives me incomplete output. Many accession numbers were not found, e.g. Error: CAB42201.1: OID not found

Moreover, many entries in the output file have multiple headers on a single line, e.g.

>gi|1080121958|gb|AOW70003.1| arginine kinase, partial [Remella rita] >gi|1080122062|gb|AOW70055.1| arginine kinase, partial [Xenophanes tryxus]
EEKVSSTLSGLEGELKGTFYPLTGMSKQTQQQLIDDHFLFKEGDRFLQAANACRFWPTGRGIYHNENKTFLVWCNEEDHL
RLISMQMGGDLKTVYKRLVTAVNDIEKRIPFSHNDRLGFLTFCPTNLGTTVRASVHIKLPKLAADKAKLEEVASKYHLQV
RGTRGEHTEAEGGVYDISNKRRMGLTEYDAVKEMYDG

Is there a way to avoid both issues?

Thanks a lot in advance! Janne

ADD COMMENTlink modified 2.9 years ago by blanca10 • written 3.0 years ago by Janne.Swaegers0

I can reproduce the second example posted above (with blast+, v.2.5.0) and can recover the same sequence entry using either of those accession numbers independently with blastdbcmd.

Edit: Examining those two individual entries (at NCBI) confirms that their sequences are identical. So NCBI is perhaps saving space by storing both headers with a single copy of the sequence? That seems to be the only logical explanation.

Edit 2: Having two headers like that in a single entry is going to further mess up FASTA format.

You may want to confirm by emailing BLAST support.
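
If the merged headers get in the way downstream, one possible workaround is to keep only the first header of each entry. This is a minimal sketch, assuming the headers are joined by " >" exactly as in the example above; the file names are made up for illustration and the sequence is shortened. It may also be worth checking `blastdbcmd -help` for the `-target_only` option, which is meant to restrict the definition line to the requested entry.

```shell
#!/bin/sh
# Hypothetical sample mimicking the merged entry from the question (sequence shortened).
cat > merged_example.fa <<'EOF'
>gi|1080121958|gb|AOW70003.1| arginine kinase, partial [Remella rita] >gi|1080122062|gb|AOW70055.1| arginine kinase, partial [Xenophanes tryxus]
EEKVSSTLSGLEGELKGTFYPLTGMSKQTQQQLIDDHFLFKEGDRFLQAANACRFWPTGRGIYHNENKTFLVWCNEEDHL
EOF

# On header lines, drop everything from the first " >" onward;
# sequence lines are printed unchanged.
awk '/^>/ { sub(/ >.*/, "") } { print }' merged_example.fa > first_header_only.fa

cat first_header_only.fa
```

Note that this discards the species names of the other identical entries, so it only makes sense if a single representative header per sequence is enough.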

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by genomax73k

Hi Janne,

Have you solved this issue?

ADD REPLYlink written 3.0 years ago by blanca10
2.9 years ago by
blanca10
Spain
blanca10 wrote:

It seems to be solved in this other post: [solved] Retrieve fasta from balst db using blastdbcmd: Error: gi|742519789: OID not found
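
To see which accessions were actually left out, the requested list can be compared with the headers that were written. This is only a bookkeeping sketch and does not explain why an entry is missing. It assumes headers in the `gi|...|db|ACCESSION|` layout shown in the question (other BLAST+ versions may print different headers), and all file names here are hypothetical.

```shell
#!/bin/sh
# Hypothetical inputs: the requested accessions and the FASTA written by blastdbcmd.
printf '%s\n' AOW70003.1 AOW70055.1 CAB42201.1 > requested_example.txt
cat > retrieved_example.fa <<'EOF'
>gi|1080121958|gb|AOW70003.1| arginine kinase, partial [Remella rita] >gi|1080122062|gb|AOW70055.1| arginine kinase, partial [Xenophanes tryxus]
EEKVSSTLSGLEGELKGTFYPLTGMSKQTQQQLIDDHFLFKEGDRFLQAANACRFWPTGRGIYHNENKTFLVWCNEEDHL
EOF

# Collect every accession from the header lines, including merged headers:
# split headers on spaces, then take field 4 of each ">gi|...|db|ACCESSION|" token.
grep '^>' retrieved_example.fa | tr ' ' '\n' \
  | awk -F'|' '/^>gi\|/ { print $4 }' | sort -u > found_example.txt

# Accessions present only in the requested list were not retrieved.
sort -u requested_example.txt | comm -23 - found_example.txt > missing_example.txt

cat missing_example.txt
```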

ADD COMMENTlink written 2.9 years ago by blanca10