Question: Taxonomy information in nr database
11 months ago, navela78 wrote:

How is taxonomy information injected into BLAST databases?

My application requires me to rebuild nr from the FASTA file (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/) because I need to make some custom changes to the sequence headers.

In that file, the headers do not seem to carry any taxonomy information other than the taxon name in brackets, like this: [Bacillus]. That doesn't seem to be enough to perform extractions using blastdbcmd like this:

$ blastdbcmd -db nr -entry all -outfmt "%g %T" | \
   awk ' { if ($2 == 9606) { print $1 } } ' | \
   blastdbcmd -db nr -entry_batch - -out human_sequences.txt

There is an option called taxid_map in makeblastdb but where do I get the mapping file?

I guess a simpler way to ask my question is: what command does NCBI use to build their nr database from the nr FASTA file?

11 months ago, Istvan Albert (University Park, USA) wrote:

You have to download the taxonomy database as well. You can use the update script for that:

update_blastdb.pl taxdb --decompress

Then ensure that blast can automatically access that information as well:

export BLASTDB=$BLASTDB:~/location/of/taxdb
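
A quick way to check that the export points at the right place is to look for the two files the taxdb archive unpacks to, taxdb.btd and taxdb.bti, somewhere on the BLASTDB path. The helper below is only a sketch; find_taxdb is a made-up name and not part of BLAST+.

```shell
# Print the first directory in a colon-separated path list that
# contains both taxdb.btd and taxdb.bti; return non-zero if none does.
# (find_taxdb is a hypothetical helper, not a BLAST+ command.)
find_taxdb() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        if [ -f "$dir/taxdb.btd" ] && [ -f "$dir/taxdb.bti" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Fall back to the current directory when BLASTDB is unset.
find_taxdb "${BLASTDB:-.}" || echo "taxdb.btd/taxdb.bti not found on BLASTDB" >&2
```

If nothing is printed on standard output, the taxonomy files are probably not where BLAST is looking, and the name columns in blastdbcmd output are unlikely to resolve.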

Thanks for your suggestion. It works for the preformatted nr database downloaded from NCBI. However, I need to build the nr database from scratch from the FASTA file (because I need to add some information to the sequence headers in that file).

My question is similar to this one: "how to makeblastdb with taxon id's".

Any thoughts on this?

Reply written 11 months ago by navela78

I believe that as long as the accession numbers are the same you'd get the same behavior, hence you would not need to do anything in particular. My expectation is that the blast TaxDB is indexed by accession numbers.
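
If the plan depends on the accessions staying the same, one cautious way to customize headers is to append text to the end of each defline and leave the leading sequence ID alone. This is a sketch with an invented record and an invented note=custom tag; real nr deflines can pack several entries separated by Ctrl-A characters, which it does not handle.

```shell
# One made-up FASTA record standing in for an nr entry.
printf '>SEQ00001.1 hypothetical protein [Bacillus]\nMKTAYIAK\n' > in.fasta

# Append a custom tag to header lines only; sequence lines pass through unchanged.
awk '/^>/ { print $0 " note=custom"; next } { print }' in.fasta > out.fasta

# The first whitespace-delimited token of each header is the sequence ID;
# confirm it is identical before and after the edit.
grep '^>' in.fasta  | awk '{ print $1 }' > ids_before.txt
grep '^>' out.fasta | awk '{ print $1 }' > ids_after.txt
cmp -s ids_before.txt ids_after.txt && echo "sequence IDs unchanged"
```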

If you also wanted to build your own custom taxonomy, then you'd have to build it with makeblastdb, as you suspect. The mapping files are at:

ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
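
As a rough sketch of how those files could feed makeblastdb: the accession2taxid tables under that directory (for proteins, prot.accession2taxid in the accession2taxid/ subdirectory) are tab-separated with a header line and the columns accession, accession.version, taxid and gi, while the -taxid_map option expects one "sequence-ID taxid" pair per line. The sample rows and file names below are invented for illustration; only the column layout and the taxids 9606 (human) and 1423 (Bacillus subtilis) are real.

```shell
# Tiny stand-in for prot.accession2taxid (the accessions are made up).
printf 'accession\taccession.version\ttaxid\tgi\n'  > sample.accession2taxid
printf 'SEQ00001\tSEQ00001.1\t9606\t0\n'           >> sample.accession2taxid
printf 'SEQ00002\tSEQ00002.1\t1423\t0\n'           >> sample.accession2taxid

# Skip the header and keep "accession.version taxid", space-separated.
awk -F '\t' 'NR > 1 { print $2, $3 }' sample.accession2taxid > taxid_map.txt

cat taxid_map.txt

# With BLAST+ installed, the map would then be passed along these lines
# (custom_nr.fasta is a hypothetical file name; -taxid_map requires -parse_seqids):
# makeblastdb -in custom_nr.fasta -dbtype prot -parse_seqids \
#     -taxid_map taxid_map.txt -out custom_nr
```

The real table is very large, so in practice it would likely be filtered down to the accessions present in the custom FASTA file first. The sequence IDs in the map also have to match what -parse_seqids extracts from the edited headers.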

Reply written 11 months ago by Istvan Albert