Question: blastdb_aliastool only works for nr but not nt?
Gahoo (United States) wrote, 19 months ago:

I've tried to make a subset of a preformatted BLAST database with blastdb_aliastool from ncbi-blast-2.3.0+. It fails on nt but succeeds on nr. I'm pretty sure the files are intact, because I've checked the md5sum. Here's a minimal example:

#retrieve data
wget http://ftp.ncbi.nlm.nih.gov/blast/db/nt.00.tar.gz
wget http://ftp.ncbi.nlm.nih.gov/blast/db/nr.00.tar.gz
tar -xf nt.00.tar.gz
tar -xf nr.00.tar.gz

#get some gi to test
blastdbcmd -db nr.00 -entry all|head|grep "^>"|sed -e 's/>gi|//g' -e 's/|.*//g' > nr_gi.txt
#success
blastdb_aliastool -gilist nr_gi.txt -db nr.00 -out nr_gi
#check alias db content
blastdbcmd -db nr_gi -entry all

#get some gi to test
blastdbcmd -db nt.00 -entry all|head|grep "^>"|sed -e 's/>gi|//g' -e 's/|.*//g' > nt_gi.txt
#failed
blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi
#check alias db content
blastdbcmd -db nt_gi -entry all

It failed with this message:

Converted 2 GIs from nt_gi.txt to binary format in nt_gi.p.gil
BLAST Database error: BLASTDB alias file creation failed.  Some referenced files may be missing

Why does blastdb_aliastool only work on nr? Some posts say that specifying -parse_seqids when running makeblastdb should make it work (those examples also used nr). So I tried:

# try makeblastdb first
blastdbcmd -db nr.00 -entry all|head -n 1000 > nr_test.fa
makeblastdb -in nr_test.fa -dbtype prot -parse_seqids -out nr_test
#success
blastdb_aliastool -gilist nr_gi.txt -db nr_test -out nr_gi_test
#check alias db content
blastdbcmd -db nr_gi_test -entry all

blastdbcmd -db nt.00 -entry all|head -n 1000 > nt_test.fa
makeblastdb -in nt_test.fa -dbtype nucl -parse_seqids -out nt_test
#failed again
blastdb_aliastool -gilist nt_gi.txt -db nt_test -out nt_gi_test
#check alias db content
blastdbcmd -db nt_gi_test -entry all

It's still not working. I found another post in which nt seems to work as well. Is this related to the BLAST+ version? How do I correctly make an alias database for nt with blastdb_aliastool?

Gahoo (United States) wrote, 19 months ago:

Problem solved. The cause was -dbtype: it defaults to prot when not specified.

blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi

This fails, because -dbtype defaults to prot, which is wrong for the nt (nucleotide) database.

blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi -dbtype nucl

This works, because nt is a nucleotide database and needs -dbtype nucl.
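To avoid hitting this again, the database type can be inferred from the files on disk before calling blastdb_aliastool. The following is only a sketch: guess_dbtype is a hypothetical helper name, not part of BLAST+, and it assumes the usual file naming, where nucleotide databases carry a .nin index (or a .nal alias file) and protein databases carry .pin (or .pal).

```shell
# Hypothetical helper (not part of BLAST+): infer the -dbtype value
# from the index or alias files that exist for a database name.
guess_dbtype() {
    db="$1"
    if [ -e "$db.nin" ] || [ -e "$db.nal" ]; then
        # nucleotide index or nucleotide alias file found
        echo nucl
    elif [ -e "$db.pin" ] || [ -e "$db.pal" ]; then
        # protein index or protein alias file found
        echo prot
    else
        echo "no BLAST index found for $db" >&2
        return 1
    fi
}

# Usage (needs BLAST+ on PATH, so it is left commented out here):
# blastdb_aliastool -gilist nt_gi.txt -db nt.00 -out nt_gi -dbtype "$(guess_dbtype nt.00)"
```

With nt.00 this should print nucl and with nr.00 it should print prot, so the same command line can be reused for both databases.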

pld (United States) wrote, 19 months ago:

You didn't download the whole nt database.

EDIT: What happens if you run blastdbcmd using your GI list to pull out sequence information from nt? What do the GI files look like?


That's not why it failed. I have downloaded the whole nt database. The code above uses the smallest possible data set to reproduce the issue. With or without the whole database the result is the same: nr works but nt does not. A GI file is a list of GIs, one per line, which looks like:

489223532
66816243
66818355

You can get example GI files by running the code above. Sequences can be pulled out with blastdbcmd using the GI list, for both nr and nt. So it's quite weird.
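To rule out a malformed GI list, a quick format check can be run on the file first. This is a minimal sketch; check_gi_list is a hypothetical helper name, and it only verifies that every line is a bare integer, not that the GIs actually exist in the database.

```shell
# Hypothetical helper: succeed only if every line of the file is a bare
# integer GI. Blank lines count as invalid; an empty file passes.
check_gi_list() {
    # grep -v selects lines that are NOT all digits; -q only sets the exit status.
    if grep -qvE '^[0-9]+$' "$1"; then
        echo "non-numeric line found in $1" >&2
        return 1
    fi
}

# Usage:
# check_gi_list nt_gi.txt && echo "nt_gi.txt looks fine"
```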

(reply written 19 months ago by Gahoo)