Question: Taxonomy information in nr database
2.8 years ago by
navela78 wrote:

How is taxonomy information injected into BLAST databases?

My application logic requires me to rebuild nr from the FASTA file, because I need to make some custom changes to the sequence headers.

In that file the headers do not seem to have any taxonomy information other than the name of the taxonomic rank in brackets, like this: [Bacillus]. That doesn't seem to be enough to perform extractions using blastdbcmd like this:

$ blastdbcmd -db nr -entry all -outfmt "%g %T" | \
   awk ' { if ($2 == 9606) { print $1 } } ' | \
   blastdbcmd -db nr -entry_batch - -out human_sequences.txt

There is an option called taxid_map in makeblastdb but where do I get the mapping file?

I guess a simpler way to ask my question is: what command does NCBI use to make their nr database from the nr FASTA file?

blast nr • 2.3k views
2.8 years ago by
Istvan Albert (University Park, USA) wrote:

You have to download the taxonomy database as well. You can use the update script for that: update_blastdb.pl taxdb --decompress

Then ensure that blast can automatically access that information as well:

export BLASTDB=$BLASTDB:~/location/of/taxdb
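A quick way to confirm the export took effect is to look for the taxonomy files in each directory listed in BLASTDB. This is only a sketch: it assumes the taxdb archive unpacks to files named taxdb.bti and taxdb.btd, and the function name find_taxdb is made up for illustration.

```shell
#!/usr/bin/env bash
# find_taxdb (hypothetical helper): print every directory on a
# colon-separated path that contains both taxdb.bti and taxdb.btd.
# Assumption: those are the file names the taxdb archive unpacks to.
find_taxdb() {
    local path_list="$1" dir
    local IFS=':'
    for dir in $path_list; do
        # Skip empty entries, e.g. when BLASTDB was unset before the export.
        [ -n "$dir" ] || continue
        if [ -f "$dir/taxdb.bti" ] && [ -f "$dir/taxdb.btd" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Check the current BLASTDB setting; prints nothing if no directory qualifies.
find_taxdb "${BLASTDB:-}"
```

If nothing is printed, the %T field in blastdbcmd output is unlikely to resolve to names, and the export line above is the first thing to recheck.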

Thanks for your suggestion. It works for the preformatted nr database downloaded from NCBI. However, I need to make the nr database from scratch from the FASTA file (because I need to add some information to the sequence headers in the nr FASTA file).

My question is similar to this one: "how to makeblastdb with taxon id's".

Any thoughts on this?

modified 2.8 years ago • written 2.8 years ago by navela78

I believe that as long as the accession numbers are the same you'd get the same behavior, hence you would not need to do anything in particular. My expectation is that the blast TaxDB is indexed by accession numbers.

If you also wanted to build your own custom taxonomy, then you'd have to build it with makeblastdb, as you suspect. The mapping files are at:
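As a rough sketch of the custom route: the -taxid_map option of makeblastdb takes a plain text file with two whitespace-separated columns, a sequence ID and a numeric taxid, and it requires -parse_seqids. The example assumes you already have a tab-separated table in the accession2taxid layout (a header line, then accession, accession.version, taxid, gi). The file names acc2taxid.tsv, custom_nr.fasta and taxid_map.txt and the helper make_taxid_map are placeholders, not anything NCBI ships.

```shell
#!/usr/bin/env bash
# make_taxid_map (hypothetical helper): reduce an accession2taxid-style
# table to the two columns makeblastdb expects: "<accession.version> <taxid>".
# Assumed input columns (tab-separated, one header line):
#   accession, accession.version, taxid, gi
make_taxid_map() {
    awk -F '\t' 'NR > 1 && $2 != "" && $3 != "" { print $2, $3 }' "$1"
}

# Only run the build when the placeholder input files are present.
if [ -f acc2taxid.tsv ] && [ -f custom_nr.fasta ]; then
    make_taxid_map acc2taxid.tsv > taxid_map.txt
    # -parse_seqids lets makeblastdb match header IDs against the map.
    makeblastdb -in custom_nr.fasta -dbtype prot -parse_seqids \
        -taxid_map taxid_map.txt -out custom_nr
fi
```

The IDs in the first column have to match the sequence IDs in the edited headers, so it is safest to keep the accession as the first token after ">" when customizing them.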

written 2.8 years ago by Istvan Albert

Also, there are a lot of things in here. What should I download to get the [-taxid TaxID] [-taxid_map TaxIDMapFile] needed for makeblastdb?

written 13 months ago by bioinfool20

Hello Istvan,

I would also like to run BLAST with taxon information, using the swissprot database, but I haven't found detailed instructions for doing it. What I did first was to download the taxonomy database as instructed, but I got an error.

perl taxdb --decompress
Connected to NCBI
taxdb not found, skipping.

Can you help me with what I should do first? It seems like you know how to do this. Thank you!

written 13 months ago by bioinfool20