I am trying to create a BLAST database containing all plant sequences in the RefSeq release. I downloaded all the FASTA files from the FTP site.
After discovering that some FASTA files were larger than 1,000,000,000 bytes, I split the oversized files into smaller FASTA files using the following command:
awk 'BEGIN {n=0;} /^>/ {if(n%500==0){file=sprintf("chunk%d.fa",n);} print >> file; n++; next;} { print >> file; }' < multi.fa
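A quick sanity check after a split like this is to confirm that no records were lost. The sketch below counts FASTA header lines; the function name count_fasta_records is made up for illustration, and the file names in the usage comment are the ones from the awk command above.

```shell
# count_fasta_records FILE...
# Hypothetical helper: prints the total number of FASTA header lines
# (lines starting with ">") across all files given as arguments.
count_fasta_records() {
    cat -- "$@" | grep -c '^>'
}

# Usage (not run here): the two counts should be equal after the split.
#   count_fasta_records multi.fa
#   count_fasta_records chunk*.fa
```

If the two numbers differ, the split itself is the first thing to look at, before any database step.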
Next, I proceeded to create the database using the command:
for i in *.f*a; do makeblastdb -in "$i" -dbtype nucl -taxid_map ../plant_refseq_genomic_taxidmap.txt -parse_seqids -title plantdb; done
Starting from over 1000 FASTA files, I ended up with 1000 databases, each consisting of 9 files (.ndb, .nhr, .nin, .nog, .nos, .not, .nsq, .ntf, .nto), which I want to group under a single alias.
I saved the list of all databases in a text file:
plant.10.1.genomic.fna.1.fa
plant.10.1.genomic.fna.2.fa
plant.10.1.genomic.fna.3.fa
plant.10.1.genomic.fna.4.fa
plant.10.1.genomic.fna.5.fa
plant.10.1.genomic.fna.6.fa
plant.10.1.genomic.fna.7.fa
plant.10.1.genomic.fna.8.fa
plant.10.1.genomic.fna.9.fa
plant.10.2.genomic.fna
plant.10.3.genomic.fna
plant.10.4.genomic.fna
..
..
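A list like this can also be derived from the database files that actually exist on disk, which avoids typos and entries for databases that were never built. The following is a minimal sketch, assuming every database is single-volume so that each one has exactly one NAME.nsq file; the function name list_db_basenames is hypothetical. A multi-volume database (NAME.00.nsq, NAME.01.nsq, ... plus NAME.nal) would need different handling.

```shell
# list_db_basenames [DIR]
# Hypothetical helper: prints one database base name per line, taken from
# the .nsq files found in DIR (default: current directory).
# Assumption: single-volume databases only.
list_db_basenames() {
    dir=${1:-.}
    for f in "$dir"/*.nsq; do
        [ -e "$f" ] || continue      # glob matched nothing
        b=${f##*/}                   # strip the directory part
        printf '%s\n' "${b%.nsq}"    # strip the .nsq extension
    done | sort
}

# Usage (not run here):
#   list_db_basenames . > listdb.txt
```

The names are printed without a directory prefix, so the resulting list is meant to be used from the directory that holds the databases.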
Then I ran the following command:
blastdb_aliastool -dblist_file listdb.txt -dbtype nucl -out plantdb-refseq-release -title "plantdb-refseq-release"
But I am getting the following error:
BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing.
What could be the reason for this error and how can I resolve it?
Thank you for your help
The dblist_file should include the base names of your databases. Are those names correct?
If the base names are the names of the .ndb, .nhr, .nin, .nog, .nos, .not, .nsq, .ntf, .nto files without the extension, then yes.
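One way to test whether every listed base name really resolves to a database is to check the list against the files on disk. This is only a sketch of one possible diagnostic, not necessarily the cause of the error here: it assumes the list holds one base name per line, and that a database counts as present if either NAME.nsq and NAME.nin exist (single volume) or NAME.nal exists (multi-volume alias). The function name missing_dbs is made up for illustration.

```shell
# missing_dbs LISTFILE [DIR]
# Hypothetical helper: prints every entry of LISTFILE that has no matching
# database files in DIR (default: current directory).
missing_dbs() {
    list=$1
    dir=${2:-.}
    cr=$(printf '\r')
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r name || [ -n "$name" ]; do
        name=${name%"$cr"}           # tolerate Windows line endings
        if [ -z "$name" ]; then continue; fi
        if [ -e "$dir/$name.nsq" ] && [ -e "$dir/$name.nin" ]; then continue; fi
        if [ -e "$dir/$name.nal" ]; then continue; fi
        printf '%s\n' "$name"
    done < "$list"
}

# Usage (not run here), from the directory that holds the databases:
#   missing_dbs listdb.txt .
```

Any name it prints is a candidate for the "referenced files may be missing" message; an empty output suggests looking elsewhere, for example at the directory the command is run from.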