Taxid will not function with other databases (including custom)
2.9 years ago
emilyc ▴ 30

Hello/Bonjour

I cannot get taxid to work with nr or with my custom database (searches against the custom database do work); the output never includes results for staxids.

Error: Warning: [blastx] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz

The taxdb.btd and taxdb.bti are both in my BLASTDB dir

Code:

  blastx -query SPAdes/contigs.fasta -db ../../BLASTDB/nr -outfmt "6 qseqid sseqid pident qlen length mismatch gapopen evalue bitscore staxids sscinames" -num_threads 24 -out D446_S2_viral_fraction_nr_taxadb_test.blastx -max_target_seqs 20


Any help is appreciated!

blast blast+ linux • 1.8k views

Did you set the BLASTDB environment variable?

Scientific Names In Blast Output And Databases


It was set when I originally installed BLAST. Do I need to set it again, or in another way, now that I have added the two taxdb files to the same dir?


What is the result of:

echo $BLASTDB and: ls -lh $BLASTDB

echo $BLASTDB returns :/home/emily/blast/bin/

ls -lh $BLASTDB lists all the files in my home dir.


Does echo $BLASTDB really have a : at the beginning? The result of ls -lh $BLASTDB should be the contents of the folder where your BLAST databases are located, not your home folder.
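
If it helps, here is a rough sketch for checking this in one go. The function name check_taxdb is made up for illustration (it is not part of BLAST+), and it assumes BLASTDB is meant to point at a single directory that holds taxdb.btd and taxdb.bti next to your databases.

```shell
# check_taxdb DIR: report whether DIR exists and contains both taxdb files.
# Hypothetical helper for illustration only, not a BLAST+ command.
check_taxdb() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    for f in taxdb.btd taxdb.bti; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            return 1
        fi
    done
    echo "ok: taxdb files found in $dir"
}

# Usage: check_taxdb "$BLASTDB"
```

If this reports a missing file or a bad directory, fix the BLASTDB value (for example in your shell profile) before rerunning blastx.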


Fixed both of those, thank you. It still won't return taxids with the custom db, though.


By custom database, I suppose you mean a database made with the makeblastdb command. Did you use sequences from GenBank for that, or from another source? And did you add taxids when you made the database?


Yes, it was made with makeblastdb, and the contents are a fraction of nr, so yes, GenBank. I did not add taxids, but the taxdb is in the same dir as the custom db.

2.9 years ago
gb ★ 2.0k

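You need to add the taxonIDs when you make the database.

First download prot.accession2taxid from the NCBI taxonomy FTP site (the accession2taxid directory) and unpack it. After that you need to extract two columns, the accession.version and the taxid: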

sed '1d' prot.accession2taxid | awk '{print $2" "$3}' > accession_taxonid


Then you make the database like this:

sudo makeblastdb -in yourseqs.fa -dbtype prot -taxid_map accession_taxonid -parse_seqids


I have never done it with protein data, but I think it works the same as for nt.

EDIT: I think the process of adding the taxonIDs consumes a lot of memory. If it does not work, BLAST will not give an error, so keep that in mind. If memory is a problem, first reduce accession_taxonid to only the accessions that are actually in your FASTA file, then try again.
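
For the memory workaround, something like the sketch below might do. It assumes each FASTA header starts with the accession.version directly after the ">" (no prefixes such as gi| or ref|) and that accession_taxonid is the two-column file made above; filter_taxid_map is just a name I picked.

```shell
# filter_taxid_map FASTA MAP: print only the MAP lines whose accession
# is the first word of a header line in FASTA.
# Hypothetical helper for illustration only.
filter_taxid_map() {
    awk '
        NR == FNR {                       # first file: the FASTA
            if (substr($1, 1, 1) == ">")  # header line
                keep[substr($1, 2)] = 1   # store accession without ">"
            next
        }
        $1 in keep                        # second file: keep matching map lines
    ' "$1" "$2"
}

# Usage: filter_taxid_map yourseqs.fa accession_taxonid > accession_taxonid.small
```

Only the accessions from your FASTA file are held in memory, so the full map is streamed rather than loaded. Then pass the smaller file to -taxid_map instead of the full one.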


This makes sense, thank you!