After many hours of testing and searching online, I still can't create a database alias with blastdb_aliastool.
It starts with the sparsely documented NCBI tool described at https://www.ncbi.nlm.nih.gov/books/NBK569848/ (updated in 2021), which gives this simple command line:
blastdb_aliastool -db nematode_mrna -gilist c_elegans_mrna.gi -dbtype nucl -out c_elegans_mrna -title "C. elegans refseq mRNA entries"
I'm trying it on both Windows and Linux and the problem is the same. In short:
- use update_blastdb.pl to download and decompress databases (swissprot, nr...) -> works fine
- check the database with blastdbcmd -> works fine
- download the GI list of interest on NCBI -> works fine
- create an alias of a downloaded database using the GI list with blastdb_aliastool -> FAILS
The last step is done with a command like:
blastdb_aliastool -db nr/nr -out nr/actinop -gilist D:/DEV/var/temp/actinomadura.gi -dbtype prot -title "nr Actino prot"
The result is:
Converted 98631 GIs from D:/DEV/var/temp/actinomadura.gi to binary format in nr/actinop.p.gil
BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing
The folder containing the BLAST executables also holds the .ncbirc (Linux) and ncbi.ini (Windows) files, in which the BLASTDB parameter is set. Each NCBI database is in its own folder, which is why -db is nr/nr and not simply nr. But I found that the .gil file is not created unless I give the complete path names:
blastdb_aliastool -db f:/BlastDB/nr/nr -out f:/BlastDB/nr/actinop -gilist D:/DEV/var/temp/actinomadura.gi -dbtype prot -title "nr Actino prot"
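A minimal sketch of assembling that full-path call from Python, so that the database root is written once and every path derived from it is complete. The helper name `build_alias_cmd` and its parameters are hypothetical; only the flags already shown above (-db, -out, -gilist, -dbtype, -title) are used, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def build_alias_cmd(blastdb_root, db, out, gilist, dbtype, title):
    """Return a blastdb_aliastool argument list with full paths for -db and -out.

    blastdb_root is the folder holding the per-database subfolders
    (f:/BlastDB in the post above). The GI list path is passed through as given.
    """
    root = blastdb_root.rstrip("/")
    return [
        "blastdb_aliastool",
        "-db", f"{root}/{db}",
        "-out", f"{root}/{out}",
        "-gilist", gilist,
        "-dbtype", dbtype,
        "-title", title,
    ]


cmd = build_alias_cmd(
    "f:/BlastDB", "nr/nr", "nr/actinop",
    "D:/DEV/var/temp/actinomadura.gi", "prot", "nr Actino prot",
)
# Print a copy-pasteable form; running it would need BLAST+ on the PATH,
# for example via subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting problems with the title, which contains spaces.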
I have read many old posts saying that GI lists will disappear, but that does not seem to be the case: they can still be downloaded from NCBI, and the latest version of the tool still has a -gilist argument. I also tried a list of accession IDs instead of a GI list, but I couldn't find which blastdb_aliastool arguments work for that.
The steps look like a simple process, but I need your help to make it work. Thanks!
Thank you. How did you check the GIs to get the N/A response?
Is there a way to build the alias from accession numbers, or from taxIDs? I need to make a database from a list of organisms (some viruses, bacteria and fungi). The more I can read from the installed nr, the less I have to download manually from the NCBI website.
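One possible direction, sketched in Python under a clearly stated assumption: that the installed blastdb_aliastool accepts -seqidlist (accessions) and -taxidlist (taxIDs) alongside -gilist. This is not confirmed anywhere in this thread, so check `blastdb_aliastool -help` for your BLAST+ version before relying on it. The helper name `build_restricted_alias_cmd` and the file names are hypothetical.

```python
import shlex

# Assumed mapping from list type to blastdb_aliastool flag.
# Only -gilist appears in this thread; the other two are assumptions
# to verify against `blastdb_aliastool -help`.
LIST_FLAGS = {
    "gi": "-gilist",
    "seqid": "-seqidlist",
    "taxid": "-taxidlist",
}


def build_restricted_alias_cmd(db, out, dbtype, title, list_file, list_kind):
    """Return an argument list that aliases `db`, restricted by `list_file`."""
    if list_kind not in LIST_FLAGS:
        raise ValueError(f"unknown list kind: {list_kind!r}")
    return [
        "blastdb_aliastool",
        "-db", db,
        "-out", out,
        LIST_FLAGS[list_kind], list_file,
        "-dbtype", dbtype,
        "-title", title,
    ]


# Hypothetical file with one accession per line.
acc_cmd = build_restricted_alias_cmd(
    "f:/BlastDB/nr/nr", "f:/BlastDB/nr/organisms",
    "prot", "nr selected organisms", "organisms.acc", "seqid",
)
print(shlex.join(acc_cmd))
```

The same helper with `list_kind="taxid"` would point at a file of taxIDs instead, again assuming the flag is available.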
You can use blastdbcmd to retrieve data from pre-formatted databases. A command that is supposed to retrieve gi numbers (output format %g) only returns N/A for entries. You can get accession numbers by replacing %g with %a, which works fine.
If you know the taxIDs of the organisms you are interested in, you can use that information to retrieve sequences with the -taxids or -taxidlist options of blastdbcmd. For example, the following command retrieves sequences for Actinomadura (taxID 1988) from the preformatted nr database in FASTA format.
Thank you. It is working now with:
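As an illustration only (not necessarily the command used in this thread), a taxID-based extraction with blastdbcmd could be assembled as in the Python sketch below. The helper name `build_taxid_extract_cmd` and the output file name are hypothetical; the sketch assumes a BLAST+ version whose blastdbcmd supports -taxids, and uses %f for FASTA output and %a for accessions as described in the answer.

```python
import shlex


def build_taxid_extract_cmd(db, taxids, out_file, outfmt="%f"):
    """Return a blastdbcmd argument list that extracts entries by taxID.

    taxids is an iterable of integers; they are passed as one
    comma-separated value to -taxids.
    """
    taxid_arg = ",".join(str(t) for t in taxids)
    return [
        "blastdbcmd",
        "-db", db,
        "-taxids", taxid_arg,
        "-outfmt", outfmt,
        "-out", out_file,
    ]


# Actinomadura (taxID 1988) from nr, written as FASTA.
fasta_cmd = build_taxid_extract_cmd("nr", [1988], "actinomadura.fasta")
print(shlex.join(fasta_cmd))

# Same selection, but listing accessions instead of sequences.
acc_cmd = build_taxid_extract_cmd("nr", [1988], "actinomadura.acc", outfmt="%a")
print(shlex.join(acc_cmd))
```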
But it displays this line even when the extraction succeeds:
I have taxdb.btd, taxdb.bti and taxonomy4blast.sqlite3 in the nr folder.
Can you set the following variable and see if that helps?
I found the problem!
I manage the DB folder using .ncbirc on Linux and ncbi.ini on Windows, so BLASTDB is well defined. In my Windows case, BLASTDB=f:/BlastDB. To be able to use this file, I need to go into the folder (cd command) and then launch from there.
To easily work with the data, each database is put in its own folder:
While struggling with the missing taxonomy files, I put the three taxdb files in the /nr folder and then in the blastdbcmd folder, with the same error each time.
The solution is to put taxonomy4blast.sqlite3 in the directory specified in the configuration file, that is, directly in BLASTDB.
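A quick way to check this layout can be sketched in Python: read BLASTDB from the configuration file and test whether taxonomy4blast.sqlite3 sits directly in that directory rather than in a per-database subfolder. The function names are hypothetical, and the sketch assumes the configuration file is INI-style with a [BLAST] section holding the BLASTDB key.

```python
import configparser
from pathlib import Path


def blastdb_dir(config_path):
    """Return the BLASTDB directory named in a .ncbirc / ncbi.ini file.

    Assumes an INI layout of the form:
        [BLAST]
        BLASTDB=f:/BlastDB
    """
    # Only "=" is treated as a delimiter so that drive letters such as "f:"
    # in the value are left alone; interpolation is off so "%" is literal.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    with open(config_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return Path(parser["BLAST"]["BLASTDB"])


def taxonomy_file_in_place(config_path, filename="taxonomy4blast.sqlite3"):
    """True if the taxonomy file is directly inside the BLASTDB directory."""
    return (blastdb_dir(config_path) / filename).is_file()


# Example call (the path is hypothetical):
# taxonomy_file_in_place("/home/me/.ncbirc")
```

A copy of the file in a subfolder such as nr/ does not satisfy this check, which matches the behaviour described above.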
So having f:/BlastDB/taxonomy4blast.sqlite3 (other files are not necessary) makes the error message disappear.
And I don't need to specify it for -db:
Good that you found a solution. This issue is particular to the way you have set things up.
Consider accepting my original answer (green check mark) to provide closure to this thread.