Creating a local BLAST database alias with blastdb_aliastool fails
8 weeks ago
etienne • 0

After many hours of testing and searching online, I still can't create a database alias with blastdb_aliastool.

The tool is poorly documented. The NCBI manual at https://www.ncbi.nlm.nih.gov/books/NBK569848/ (updated in 2021) gives only this simple command line:

blastdb_aliastool -db nematode_mrna -gilist c_elegans_mrna.gi -dbtype nucl -out c_elegans_mrna -title "C. elegans refseq mRNA entries"

I have tried on both Windows and Linux, and the problem is the same. In short:

  1. use update_blastdb.pl to download and decompress databases (swissprot, nr...) -> works fine
  2. check the database with blastdbcmd -> works fine
  3. download the GI list of interest on NCBI -> works fine
  4. create an alias of a downloaded database using the GI list with blastdb_aliastool -> FAILS

Point 4. is done with a command like:

blastdb_aliastool -db nr/nr -out nr/actinop -gilist D:/DEV/var/temp/actinomadura.gi -dbtype prot -title "nr Actino prot"

The result is:

Converted 98631 GIs from D:/DEV/var/temp/actinomadura.gi to binary format in nr/actinop.p.gil

BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing

The folder containing the BLAST executables also holds the .ncbirc (Linux) and ncbi.ini (Windows) files, in which the BLASTDB parameter is set. Each NCBI database is in its own folder, which is why -db is nr/nr and not simply nr. However, I found that the .gil file is not created unless I give the full path names:

blastdb_aliastool -db f:/BlastDB/nr/nr -out f:/BlastDB/nr/actinop -gilist D:/DEV/var/temp/actinomadura.gi -dbtype prot -title "nr Actino prot"

I have read many old posts saying that GI lists will disappear, but that does not seem to be the case: they can still be downloaded from NCBI, and the latest version of the tool still has a -gilist argument. I also tried a list of accession IDs instead of a GI list, but I could not work out which blastdb_aliastool arguments support that.

The steps look simple, but I need your help to get them working. Thanks!

blast • 819 views
8 weeks ago
GenoMax 152k

Unfortunately, the current pre-formatted nr database files available from NCBI do not appear to contain GI numbers (I checked a number of sequences and only got N/A). So this mode of blastdb_aliastool is not going to work.

Your alternative is to pull the sequences you need with blastdbcmd and then build a new database from that file with makeblastdb.
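
A minimal sketch of that route, assuming the sequences are selected by taxonomy ID through blastdbcmd's -taxids option and that the source database is a protein one. The function name build_subset_db and the database title are made up for illustration; the paths are whatever suits your setup.

```shell
# Sketch: build a stand-alone BLAST database from a taxonomic subset of nr.
# build_subset_db is a hypothetical helper name, not part of BLAST+.
build_subset_db() {
    src_db=$1      # source database, e.g. nr or nr/nr
    taxid=$2       # NCBI taxonomy ID, e.g. 1988 for Actinomadura
    out_prefix=$3  # path prefix for the extracted FASTA and the new database

    # Step 1: dump the matching sequences as FASTA.
    blastdbcmd -db "$src_db" -taxids "$taxid" -outfmt %f \
        -out "$out_prefix.fa" || return 1

    # Step 2: format the FASTA file as a new protein database.
    makeblastdb -in "$out_prefix.fa" -dbtype prot \
        -out "$out_prefix" -title "nr subset, taxid $taxid"
}
```

It could be called as `build_subset_db nr 1988 extract/actinop`, after which `-db extract/actinop` should be usable in a search. The new database is a copy of the sequences, not an alias, so it takes extra disk space.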


Thank you. How did you check the GIs and get the N/A response?

Is there a way to create an alias using accession numbers? Or taxIDs? I need to make a database from a list of organisms (some viruses, bacteria and fungi). The more I can read from the installed nr, the less I have to download manually from the NCBI website.


You can use blastdbcmd to retrieve data from pre-formatted databases. An example that is supposed to retrieve gi numbers is the command

$ blastdbcmd -db nr -outfmt %g -entry all

This returns only N/A for every entry. You can get accession numbers by replacing %g with %a, which works fine.

If you know the taxIDs of the organisms you are interested in, you can use them to retrieve sequences with the -taxids or -taxidlist options of blastdbcmd. For example, the following command retrieves sequences for Actinomadura (taxID 1988) from the pre-formatted nr database in FASTA format.

 $ blastdbcmd -db nr -outfmt %f -taxids 1988
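
For a list of organisms, the -taxidlist variant may be more convenient. The sketch below assumes the list file holds one taxID per line; write_taxid_list and extract_taxa are hypothetical helper names, not BLAST+ commands.

```shell
# Sketch: extract sequences for several taxa in one blastdbcmd call.

# Write one taxID per line into the given file.
write_taxid_list() {
    list_file=$1
    shift
    printf '%s\n' "$@" > "$list_file"
}

# Pull the sequences assigned to the listed taxIDs as FASTA.
extract_taxa() {
    src_db=$1
    list_file=$2
    out_fasta=$3
    blastdbcmd -db "$src_db" -taxidlist "$list_file" -outfmt %f -out "$out_fasta"
}
```

For example, `write_taxid_list taxa.txt 1988 562` followed by `extract_taxa nr taxa.txt subset.fa`, where 562 (Escherichia coli) is added only as an illustration.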

Thank you. It is working now with:

blastdbcmd -db f:/BlastDB/nr/nr -taxids 1988 -out f:/BlastDB/extract/acti3.fa

But it displays this message even though the extraction succeeds:

The -taxids command line option requires additional data files. Please see the section 'Taxonomic filtering for BLAST databases' in https://www.ncbi.nlm.nih.gov/books/NBK569839/ for details.

I have taxdb.btd, taxdb.bti and taxonomy4blast.sqlite3 in the nr folder.


Can you set the following variable and see if that helps?

BLASTDB=f:/BlastDB/nr/nr
export BLASTDB

I found the problem!

I manage the DB folder using .ncbirc on Linux and ncbi.ini on Windows, so BLASTDB is properly defined. In my Windows case, BLASTDB=f:/BlastDB. For this file to be picked up, I need to cd into the folder that contains it and launch the tools from there.

To make the data easy to work with, each database is kept in its own folder:

f:/BlastDB/nr/nr.pal (... and other nr files)

f:/BlastDB/swissprot/swissprot.pal (... and other swissprot files)

f:/BlastDB/taxdb/taxdb.btd, taxdb.bti, taxonomy4blast.sqlite3

While struggling with the message about missing taxonomy files, I tried putting the three taxdb files in the nr folder and in the blastdbcmd folder, with the same error each time.

The solution is to put taxonomy4blast.sqlite3 in the directory specified in the configuration file, that is, directly in BLASTDB.

So having f:/BlastDB/taxonomy4blast.sqlite3 (the other files are not necessary) makes the error message disappear.

And I don't need to give the full path for -db:

blastdbcmd -db nr/nr -taxids 1988 -out f:/BlastDB/extract/acti3.fa
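
A quick way to check this layout before running blastdbcmd, as a sketch only: check_taxonomy_file is a hypothetical helper, and the directory passed to it is assumed to be the single BLASTDB directory named in .ncbirc or ncbi.ini.

```shell
# Sketch: verify that taxonomy4blast.sqlite3 sits directly in the BLASTDB directory.
# check_taxonomy_file is a hypothetical helper name, not part of BLAST+.
check_taxonomy_file() {
    blastdb_dir=$1
    if [ -f "$blastdb_dir/taxonomy4blast.sqlite3" ]; then
        echo "ok: $blastdb_dir/taxonomy4blast.sqlite3"
    else
        echo "missing: $blastdb_dir/taxonomy4blast.sqlite3" >&2
        return 1
    fi
}
```

With my layout, `check_taxonomy_file f:/BlastDB` is the check that matters; a copy of the file inside f:/BlastDB/nr did not get rid of the message.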

Glad you found a solution. This issue is specific to the way you have organized your databases.

Consider accepting my original answer (green check mark) to provide closure to this thread.
