Indexing error with local blast
4.3 years ago
ando.kelli ▴ 60

Hi all,

I'm trying to run a BLAST search against a local copy of the NCBI nt database on Linux. The commands I used to download and decompress the DB are:

cd /volume/BlastDBs

update_blastdb.pl nt
update_blastdb.pl taxdb

gunzip *tar.g*

ls *.tar |xargs -n1 tar -xvf

Notes: I needed a loop because tar only extracts one file at a time. I also noticed that gunzip did not recognise the files with the .md5 suffix and ignored them.

The folder now has files with the following extensions for each 'piece' of the DB:

.tar
.tar.gz.md5
.nhd
.nhi
.nhr
.nin
.nnd
.nni
.nog
.nsd
.nsi
.nsq

There is also an nt.nal file

The taxdb has:

.tar
.tar.gz.md5
.btd
.bti

Running the BLAST search (I tried both the full path to the DB and the -nt parameter, and got the same error message both times):

blastn -db /volume/BlastDBs -query /volume/Assembly_data/Test/Test.fasta -outfmt 6 -out TEST_blastn.tabular

I get the error message:

BLAST Database error: No alias or index file found for nucleotide database [/volume/BlastDBs] in search path [/volume/Assembly_data/Test:$/volume/BlastDBs:]

As a way of troubleshooting, I created a small DB from a fasta file using the following script:

cd /volume/Assembly_data/Test/TestDB

makeblastdb -dbtype nucl -in /volume/Assembly_data/Test/Test.fasta -title Test_db -out Test_db

It seemed to work and created three output files in the TestDB directory: Test_db.nhr, Test_db.nin and Test_db.nsq.

Then ran the blast using:

blastn -db /volume/Assembly_data/Test/TestDB -query /volume/Assembly_data/Test/Test.fasta -outfmt 6 -out TEST_blastn.tabular

I got the same error message.

I don't know if it's important, but I installed blast using:

conda install blast

Any suggestions about what could be going wrong are greatly appreciated. It's doing my head in.

Cheers, Kelli

4.3 years ago
GenoMax 141k

You need to give the -db option the basename of the BLAST index files, not just the directory that contains them. Since you are using the nt database (whose basename is also nt), your command needs to be:

blastn -db /volume/BlastDBs/nt -query /volume/Assembly_data/Test/Test.fasta -outfmt 6 -out TEST_blastn.tabular

In your other test case, use the basename Test_db as well:

blastn -db /volume/Assembly_data/Test/TestDB/Test_db -query /volume/Assembly_data/Test/Test.fasta -outfmt 6 -out TEST_blastn.tabular
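If it is not obvious which basenames a directory contains, a small helper can list them. This is only a sketch: the function name list_blast_db_basenames is made up for illustration, and it assumes a nucleotide database, where alias files end in .nal and each volume has a .nin index file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print candidate -db values for a directory of
# nucleotide BLAST databases by stripping the final extension from
# alias (.nal) and index (.nin) files.
list_blast_db_basenames() {
    local dir="$1" f
    for f in "$dir"/*.nal "$dir"/*.nin; do
        [ -e "$f" ] || continue            # glob matched nothing, skip
        printf '%s\n' "${f%.*}"            # drop ".nal" or ".nin"
    done
}

# Example call, using the test directory from the question:
# list_blast_db_basenames /volume/Assembly_data/Test/TestDB
```

For a multi-volume database such as nt, the alias entry (ending in /nt) is the one to pass to -db; the per-volume entries such as nt.00 refer to single pieces of the database.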

Wow. I can't believe that's what the issue was. Thank you so much Genomax!!


Hi again Genomax,

I split my assembly so that I can do the annotation in parallel. When I run the following command, the sscinames and sskingdom columns are missing from the output:

ls trinity_out_dir.Trinity.*.fasta | parallel --eta -j 10 --load 80% --noswap 'blastn -db /volume/BlastDBs/nt -query {} -out blastn_outfiles/{.}.tabular -evalue 1e-5 -outfmt "6 std stitle staxids sscinames sskingdom" -max_target_seqs 1 -max_hsps 1 -num_threads 2'

I get this warning: Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz

The taxdb .btd and .bti files are in the same directory as the nt db files. How can I get Blast to find them properly?
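One commonly used approach, not confirmed anywhere in this thread, is to point the BLASTDB environment variable at the directory holding taxdb.btd and taxdb.bti, since BLAST+ looks for the taxonomy files in the current working directory and in the directories listed in BLASTDB. The sketch below uses a made-up helper name, add_to_blastdb_path, and the directory from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to the colon-separated BLASTDB
# search path unless it is already listed, and export the result so that
# child processes (including jobs started by GNU parallel on the same
# machine) inherit it.
add_to_blastdb_path() {
    local dir="$1"
    case ":${BLASTDB:-}:" in
        *":$dir:"*) ;;                                   # already present
        *) export BLASTDB="$dir${BLASTDB:+:$BLASTDB}" ;; # prepend
    esac
}

# Example call before running blastn:
# add_to_blastdb_path /volume/BlastDBs
```

Running blastn from inside /volume/BlastDBs would be another way to test whether the taxdb files are being found.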


Thanks again Genomax :-)
