blastdb_aliastool only works for nr but not nt?
8.0 years ago
Gahoo ▴ 270

I tried to make a subset of a pre-formatted BLAST database with blastdb_aliastool from ncbi-blast-2.3.0+. It failed on nt but succeeded on nr. I'm pretty sure the files are intact, because I checked the md5sum. Here's a quick example:

#retrieve data
wget http://ftp.ncbi.nlm.nih.gov/blast/db/nt.00.tar.gz
wget http://ftp.ncbi.nlm.nih.gov/blast/db/nr.00.tar.gz
tar -xf nt.00.tar.gz
tar -xf nr.00.tar.gz

#get some gi to test
blastdbcmd -db nr.00 -entry all|head|grep "^>"|sed -e 's/>gi|//g' -e 's/|.*//g' > nr_gi.txt
#success
blastdb_aliastool -gilist nr_gi.txt -db nr.00 -out nr_gi
#check alias db content
blastdbcmd -db nr_gi -entry all

#get some gi to test
blastdbcmd -db nt.00 -entry all|head|grep "^>"|sed -e 's/>gi|//g' -e 's/|.*//g' > nt_gi.txt
#failed
blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi
#check alias db content
blastdbcmd -db nt_gi -entry all

It failed with this message:

Converted 2 GIs from nt_gi.txt to binary format in nt_gi.p.gil
BLAST Database error: BLASTDB alias file creation failed.  Some referenced files may be missing

Why does blastdb_aliastool only work on nr? Some posts said that specifying -parse_seqids when running makeblastdb should make it work (that example was also for nr). So I tried:

# try makeblastdb first
blastdbcmd -db nr.00 -entry all|head -n 1000 > nr_test.fa
makeblastdb -in nr_test.fa -dbtype prot -parse_seqids -out nr_test
#success
blastdb_aliastool -gilist nr_gi.txt -db nr_test -out nr_gi_test
#check alias db content
blastdbcmd -db nr_gi_test -entry all

blastdbcmd -db nt.00 -entry all|head -n 1000 > nt_test.fa
makeblastdb -in nt_test.fa -dbtype nucl -parse_seqids -out nt_test
#failed again
blastdb_aliastool -gilist nt_gi.txt -db nt_test -out nt_gi_test
#check alias db content
blastdbcmd -db nt_gi_test -entry all

It's still not working. I found another post in which nt seems to work. Is it related to the BLAST+ version? How do I correctly make an alias db on nt with blastdb_aliastool?

blast • 4.6k views
8.0 years ago
Gahoo ▴ 270

Problem solved. -dbtype is the reason: if it is not specified, it defaults to prot.

blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi

This fails, because -dbtype defaults to prot, which is wrong for the nt database.

blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi -dbtype nucl

This works, because -dbtype is set to nucl, as the nt database requires.
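
To avoid forgetting the flag, the type could be guessed from the files on disk. This is only a rough sketch: guess_dbtype is a hypothetical helper name, and it assumes the usual BLAST file naming, where nucleotide databases have .nsq volume files or a .nal alias file and protein databases have .psq or .pal.

```shell
# Hypothetical helper: guess the BLAST database type from files on disk.
# Assumption: nucleotide DBs have <db>.nsq (or a <db>.nal alias file),
# protein DBs have <db>.psq (or a <db>.pal alias file).
guess_dbtype() {
    if [ -e "$1.nsq" ] || [ -e "$1.nal" ]; then
        echo nucl
    elif [ -e "$1.psq" ] || [ -e "$1.pal" ]; then
        echo prot
    else
        echo "no nucleotide or protein database files found for $1" >&2
        return 1
    fi
}

# Usage (needs BLAST+ on PATH and the database in the current directory):
# blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi -dbtype "$(guess_dbtype nt.00)"
```

With this, the same command line should work for both nr.00 and nt.00, since -dbtype is always passed explicitly.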

8.0 years ago
pld 5.1k

You didn't download the whole nt database.

EDIT: What happens if you run blastdbcmd using your GI list to pull out sequence information from nt? What do the GI files look like?


That's not the reason it failed. I've downloaded the whole nt database. The code above uses the smallest amount of data needed to reproduce the issue. With or without the whole database, the result is the same: nr works but nt does not. The GI file is a list of GIs, which looks like:

489223532
66816243
66818355

You can get example GI files by running the code above. Sequences can be pulled out with blastdbcmd using the GI list, for both nr and nt. So it's quite weird.
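
For reference, the GI list can also be built with a slightly stricter filter than the grep/sed pipeline in the question. This is just a sketch: extract_gi is a hypothetical name, and it assumes deflines of the form ">gi|NUMBER|...". Lines without a leading GI are skipped rather than passed through.

```shell
# Hypothetical helper: read FASTA on stdin, print one GI per line.
# Assumption: deflines look like ">gi|NUMBER|..."; anything else is ignored.
extract_gi() {
    sed -n 's/^>gi|\([0-9][0-9]*\)|.*/\1/p'
}

# Usage (needs BLAST+ on PATH):
# blastdbcmd -db nt.00 -entry all | head | extract_gi > nt_gi.txt
```

A defline beginning ">gi|489223532|" would produce the line 489223532, which matches the format of the list shown above.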
