BLAST has two mutually incompatible parameters for limiting a search by taxonomy:
- -entrez_query (only in association with -remote)
- -window_masker_taxid (local database only)
-entrez_query "txid8030[Organism:exp]" works perfectly well remotely, but I need to run most of my BLAST searches locally, so -window_masker_taxid looks promising. However, I cannot figure out how to use it! Every attempt ends with a "BLAST EXCEPTION" error!
The only documentation I could find:
- Use Windowmasker: http://www.ncbi.nlm.nih.gov/books/NBK279687/
- Create a masked BLAST database: http://www.ncbi.nlm.nih.gov/sites/books/NBK279681/
I downloaded the latest versions of the pre-compiled "taxdb" and "refseq_rna" databases and tried:
$ blastdbcmd -db refseq_rna -entry 929050848 > test.fasta
$ blastdbcmd -db refseq_rna -entry 929050848 -outfmt "%T"
8030
$ blastn -query test.fasta -db refseq_rna -task blastn -window_masker_taxid 8030
BLASTN 2.2.31+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Database: NCBI Transcript Reference Sequences
16,094,088 sequences; 37,021,796,207 total letters
BLAST engine error: Warning: NCBI C++ Exception:
T0 "/build/ncbi-blast+-bzPf_D/ncbi-blast+-2.2.31/c++/src/algo/winmask/seq_masker_istat_factory.cpp", line 72: Error: ncbi::CSeqMaskerIstatFactory::create() - could not open
T0 "/build/ncbi-blast+-bzPf_D/ncbi-blast+-2.2.31/c++/src/algo/winmask/seq_masker_istat_factory.cpp", line 103: Error: ncbi::CSeqMaskerIstatFactory::create() - could not create a unit counts container
Then, I created a masking database as documented:
$ cd /data/blast_databases
$ windowmasker -in refseq_rna -infmt blastdb -mk_counts -parse_seqids -out refseq_rna_mask.counts -sformat obinary
$ windowmasker -in refseq_rna -infmt blastdb -ustat refseq_rna_mask.counts -outfmt maskinfo_asn1_bin -parse_seqids -out refseq_rna_mask.asnb
$ makeblastdb -in refseq_rna -input_type blastdb -dbtype nucl -parse_seqids -mask_data refseq_rna_mask.asnb -out refseq_rna_mask -title "NCBI Transcript Reference Sequences"
$ blastdbcmd -db refseq_rna_mask -entry 929050848 -outfmt "%T"
8030
$ blastn -query test.fasta -db refseq_rna_mask -task blastn -window_masker_taxid 8030
BLASTN 2.2.31+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Database: NCBI Transcript Reference Sequences
BLAST engine error: Warning: NCBI C++ Exception:
T0 "/build/ncbi-blast+-bzPf_D/ncbi-blast+-2.2.31/c++/src/algo/winmask/seq_masker_istat_factory.cpp", line 72: Error: ncbi::CSeqMaskerIstatFactory::create() - could not open
T0 "/build/ncbi-blast+-bzPf_D/ncbi-blast+-2.2.31/c++/src/algo/winmask/seq_masker_istat_factory.cpp", line 103: Error: ncbi::CSeqMaskerIstatFactory::create() - could not create a unit counts container
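The "could not open" message in both runs suggests that blastn failed to open a WindowMasker unit counts file. Below is a minimal sketch for checking whether such a file is readable. It assumes (this is not confirmed by the output above) that the file is expected at <directory>/<taxid>/wmasker.obinary, with the directory taken from a WINDOW_MASKER_PATH variable. The helper name wm_stats_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a WindowMasker counts file is readable
# for a given taxid. The layout <dir>/<taxid>/wmasker.obinary is an assumption.
wm_stats_file() {
    dir="$1"
    taxid="$2"
    file="$dir/$taxid/wmasker.obinary"
    if [ -r "$file" ]; then
        echo "found: $file"
    else
        echo "missing: $file"
    fi
}

# Check taxid 8030, falling back to the current directory when
# WINDOW_MASKER_PATH is not set in the environment.
wm_stats_file "${WINDOW_MASKER_PATH:-.}" 8030
```

If the file is reported as missing, the failure may have more to do with where the counts file lives than with the masked database itself.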
I cannot find any reference to this problem anywhere (I tried googling it)! Am I the only stupid one here, or has nobody else tried to use this option?