Extract only Insecta sequences from the BLAST nr database
7 months ago

Hello, I'm trying to run BLAST on Ubuntu. I downloaded the BLAST nr database, but it is too big, so I plan to use only the Insecta portion that I need.

  1. blastdbcmd -db nr -entry_batch insecta.txt -outfmt "%f" -out nr_database_insecta.fasta -target_only -dbtype prot
  2. blastdbcmd -db nr -dbtype prot -taxid 50557 -out nr_insecta.fasta -entry all

I tried the two commands above, but neither works. Can anyone tell me what the problem is?

Thanks!


I also got this message: BLAST Database error: No alias or index file found for protein database [nr] in search path [/mnt/d/blast/ncbi-blast-2.14.1+/bin::]


Unless you are in the directory where the nr database is located (you seem to be in /mnt/d/blast/ncbi-blast-2.14.1+/bin), the blastdbcmd command will not find it. Either define a BLASTDB environment variable that points to the database directory, or give the full path to the nr database, such as /mnt/c/db/nr, using the actual location instead of my made-up example.
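A minimal sketch of the BLASTDB approach, assuming the nr files sit in /mnt/c/db (the same made-up location as above, so substitute the real directory). The `-info` call is just a quick way to confirm that blastdbcmd can now see the database.

```shell
# Hypothetical location: replace with the directory that actually holds the nr.* files.
export BLASTDB="/mnt/c/db"

# Only query the database if blastdbcmd is on PATH and the directory exists.
if command -v blastdbcmd >/dev/null 2>&1 && [ -d "$BLASTDB" ]; then
    # Prints a summary of the database if it can be found.
    blastdbcmd -db nr -dbtype prot -info
else
    echo "blastdbcmd or $BLASTDB not available; skipping the check" >&2
fi
```

Putting the `export` line in ~/.bashrc would make the setting persist across sessions.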

7 months ago
GenoMax 142k

This does not work because you need taxIDs at a lower level than Insecta: blastdbcmd only accepts IDs at or below the species level.

$ blastdbcmd -db nr -taxids 50557 -outfmt %f
Error: [blastdbcmd] Taxonomy ID(s) not found. This could be because the ID(s) provided are not at or below the species level. Please use get_species_taxids.sh to get taxids for nodes higher than species (see https://www.ncbi.nlm.nih.gov/books/NBK546209/).

You can do (you may need to install EntrezDirect):

$ get_species_taxids.sh -t 50557 >insect.id

This writes the species-level taxIDs under Insecta (taxID 50557) to insect.id, one per line. Then retrieve the FASTA data (only the headers are shown below):
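Before using the ID file it may be worth checking that it was actually populated, since a missing EntrezDirect install can leave it empty. The sketch below assumes the file is named insect.id as above; `is_taxid_list` is a hypothetical helper, not part of BLAST+.

```shell
# Succeed only if the file is non-empty and every line is a bare integer.
is_taxid_list() {
    [ -s "$1" ] && ! grep -qvE '^[0-9]+$' "$1"
}

if is_taxid_list insect.id; then
    echo "insect.id holds $(wc -l < insect.id) taxIDs"
else
    echo "insect.id is missing, empty, or malformed" >&2
fi
```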

$ blastdbcmd -db nr -taxidlist insect.id -outfmt %f | grep "^>" | head -3
>XP_035731174.1 growth arrest-specific protein 2-like [Vespa mandarinia]
>KAJ3664018.1 hypothetical protein Zmor_008225 [Zophobas morio]
>KAH8395558.1 hypothetical protein KR222_011244 [Zaprionus bogoriensis] 
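To keep the sequences rather than just peek at the headers, the same command can write to a file. This is a sketch under a few assumptions: nr is reachable (via BLASTDB or the current directory), insect.id exists, and nr_insecta.fasta is only an illustrative output name. `count_fasta_records` is a hypothetical helper added for a quick sanity check.

```shell
# Count FASTA records in a file by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

ids="insect.id"           # taxID list written by get_species_taxids.sh
out="nr_insecta.fasta"    # hypothetical output file name

if command -v blastdbcmd >/dev/null 2>&1 && [ -s "$ids" ]; then
    # Dump every nr entry whose taxID is in the list, in FASTA format.
    blastdbcmd -db nr -dbtype prot -taxidlist "$ids" -outfmt %f -out "$out"
    echo "wrote $(count_fasta_records "$out") sequences to $out"
else
    echo "blastdbcmd or $ids missing; nothing extracted" >&2
fi
```

The resulting file is likely to be large, so check free disk space first.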

Another option:

If you have the nr database downloaded, you can instead limit the BLAST searches to insect taxIDs with the -taxidlist option, using the insect.id file generated above.
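A sketch of such a restricted search, assuming a protein query in query.fasta (a hypothetical file name) and the full nr database on the BLASTDB path. The output name, tabular format, and E-value cutoff are illustrative choices, not recommendations from this thread.

```shell
query="query.fasta"    # hypothetical protein query file
ids="insect.id"        # taxID list generated above

if command -v blastp >/dev/null 2>&1 && [ -s "$query" ] && [ -s "$ids" ]; then
    # Search the full nr, but only report hits whose taxID is in the list.
    blastp -db nr -query "$query" -taxidlist "$ids" \
           -outfmt 6 -evalue 1e-5 -out insect_hits.tsv
else
    echo "blastp, $query, or $ids missing; search skipped" >&2
fi
```

This avoids writing a separate Insecta FASTA file, at the cost of still needing the full nr on disk.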


Is get_species_taxids.sh included in BLAST+? I think that is the better option I have been looking for all this time... Anyway, the documentation has been around since 2008: https://www.ncbi.nlm.nih.gov/books/NBK569846/

I have deleted my answer to avoid confusion because it does the same in a more complicated way.


It has been included in blast+ since v.2.8.1 (based on what I see).
