Error using -taxidlist blast
2.0 years ago
john ▴ 130

I am trying to restrict a BLAST search taxonomically to a certain group while excluding all previously found species.

I do so by creating a taxid list with the get_species_taxids.sh script provided by the BLAST suite. But when I remove certain taxids from the list, I get this error:

BLAST Database error: Taxonomy ID(s) not found. This could be because the ID(s) provided are not at or below the species level. Please use get_species_taxids.sh to get taxids for nodes higher than species (see https://www.ncbi.nlm.nih.gov/books/NBK546209/).

Yet it works when I use the unmodified file created by get_species_taxids.sh.
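For illustration, the removal step could look roughly like the sketch below. It assumes the previously found taxids are stored one per line in a separate file; the helper name remove_taxids and the file name found_taxids are made up for this example and are not part of the BLAST suite.

```shell
# Hypothetical helper: print every taxid from list $1 that does not
# appear in exclude file $2. Both files hold one taxid per line and
# do not need to be sorted; the order of $1 is preserved.
remove_taxids() {
    # -v  invert the match (keep lines that are NOT in the exclude file)
    # -x  match whole lines only, so excluding 721 does not drop 7214
    # -F  treat patterns as fixed strings, not regular expressions
    # -f  read the patterns from the exclude file
    grep -vxFf "$2" "$1"
}

# Example call (file names are assumptions):
#   remove_taxids taxidlist_sort found_taxids > taxidlist_redu
```

Because only whole lines are compared, the output can never contain a taxid that was not already in the input list.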

This seems odd, as I did not add anything and only removed certain entries. Trying to get my head around the problem only made it more confusing. For instance, when I run the script with 7214 (Drosophilidae), the first entry in the list is 7214 itself, which is not at species level. Yet the resulting file can be used with blast without any issues.

Conversely, when I run blast with 40372 (Drosophila americana texana), a subspecies, I also get the above error, although this taxid is clearly below species level.

Interestingly, this problem only occurs when I use the ref_euk_rep_genomes DB. I do not seem to have the problem with the nt DB.

Here are the commands I use (everything was up to date at the time of writing).

$ get_species_taxids.sh -t 7214 > taxidlist
$ head -n 1 taxidlist
7214
$ sort taxidlist > taxidlist_sort
$ blastn -db ref_euk_rep_genomes -query input.fasta -word_size 30 -taxidlist taxidlist_sort > /dev/null
# Only to show that there are no entries in taxidlist_redu that are not in taxidlist_sort
$ comm -23 taxidlist_redu taxidlist_sort
$ blastn -db ref_euk_rep_genomes -query input.fasta -word_size 30 -taxidlist taxidlist_redu > /dev/null
BLAST Database error: Taxonomy ID(s) not found. This could be because the ID(s) provided are not at or below the species level. Please use get_species_taxids.sh to get taxids for nodes higher than species (see https://www.ncbi.nlm.nih.gov/books/NBK546209/).
$ blastn -db ref_euk_rep_genomes -query input.fasta -word_size 30 -taxids 40372 > /dev/null
BLAST Database error: Taxonomy ID(s) not found. This could be because the ID(s) provided are not at or below the species level. Please use get_species_taxids.sh to get taxids for nodes higher than species (see https://www.ncbi.nlm.nih.gov/books/NBK546209/).
23 months ago
john ▴ 130

A solution is to delete all taxids that do not exist in the BLAST database. A list of all taxids in the database can be retrieved with this command:

$ blastdbcmd -db ref_euk_rep_genomes -entry all -outfmt %T > ref_euk_rep_genomes.taxidlist

All taxids that are not present in this list must be removed from the taxid list.

$ sort ref_euk_rep_genomes.taxidlist > ref_euk_rep_genomes.taxidlist_sort
$ comm -12 taxidlist_redu ref_euk_rep_genomes.taxidlist_sort > taxidlist_redu_intersect
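The same filtering can be wrapped in a small function. This is only a sketch: the name intersect_taxids is made up, and it assumes both inputs are plain text files with one taxid per line. It sorts both inputs itself, because comm only works on sorted input, and it uses sort -u because the database dump lists one taxid per sequence and therefore contains many duplicates.

```shell
# Hypothetical helper: print the taxids from list $1 that also occur
# in the database taxid dump $2 (one taxid per line in both files).
intersect_taxids() {
    list_sorted=$(mktemp)
    db_sorted=$(mktemp)
    # comm needs both inputs sorted in the same order; -u removes duplicates
    sort -u "$1" > "$list_sorted"
    sort -u "$2" > "$db_sorted"
    # -12 suppresses lines unique to either file, leaving the intersection
    comm -12 "$list_sorted" "$db_sorted"
    rm -f "$list_sorted" "$db_sorted"
}

# Example call, using the file names from this post:
#   blastdbcmd -db ref_euk_rep_genomes -entry all -outfmt %T > ref_euk_rep_genomes.taxidlist
#   intersect_taxids taxidlist_redu ref_euk_rep_genomes.taxidlist > taxidlist_redu_intersect
```

The output is sorted lexically, not numerically, which is fine for a taxid list but worth knowing when comparing files by eye.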

It is still unclear why the error occurs after deleting entries, while the complete list is accepted without problems.
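To look into this further, it might help to list which taxids in a given file are absent from the database. The sketch below is one way to do that; missing_taxids is a made-up name, and the function only compares text files, so it says nothing about how blast itself validates the list.

```shell
# Hypothetical helper: print the taxids from list $1 that do NOT occur
# in the database taxid dump $2. The order of $1 is preserved and
# neither file needs to be sorted.
missing_taxids() {
    # While reading the first file argument (the database dump), remember
    # each taxid; for the second file, print lines whose taxid was never seen.
    awk 'FILENAME == ARGV[1] { db[$1] = 1; next } !($1 in db)' "$2" "$1"
}

# Example calls, using the file names from this post:
#   missing_taxids taxidlist_sort ref_euk_rep_genomes.taxidlist
#   missing_taxids taxidlist_redu ref_euk_rep_genomes.taxidlist
```

Running it on both the complete and the reduced list would show whether the two files differ in how many of their taxids are missing from the database.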
