I want to BLAST some ESTs with a local BLAST installation. However, for some sequences the local search finds no matches, while the website http://www.arabidopsis.org/Blast/index.jsp does find matches. (I downloaded the database from that site too, so the database I use locally is the same as the website's.) I don't know what the problem is. How can I fix it?
I used BLAST 2.2.25+ and built the database with this command:
makeblastdb -in TAIR10_cdna.fast -out TAIR10_cdna -dbtype nucl -input_type fasta
Next I ran the search:
blastn -query buff.fa -db TAIR10_cdna -out cx274252 -dust yes -max_target_seqs 250 -penalty -3 -outfmt 4 -gapopen 5 -gapextend 2
The output looks like this:
BLASTN 2.2.25+

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.

Database: TAIR10_cdna.fast
41,671 sequences; 64,867,051 total letters

Query= CX274252
Length=662

***** No hits found *****

Lambda   K       H
1.37     0.711   1.31

Gapped
Lambda   K       H
1.37     0.711   1.31

Effective search space used: 41291330612

Database: TAIR10_cdna.fast
Posted date: May 9, 2011 11:18 PM
Number of letters in database: 64,867,051
Number of sequences in database: 41,671

Matrix: blastn matrix 1 -3
Gap Penalties: Existence: 5, Extension: 2
while the results from the website were:
BLASTN 2.2.17 [Aug-26-2007]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= CX274252 (662 letters)

Database: TAIR10 Transcripts (-introns, +UTRs) (DNA)
41,671 sequences; 64,867,051 total letters

Searching..................................................done

                                                               Score   E
Sequences producing significant alignments:                    (bits)  Value

AT4G27160.1 | Symbols: AT2S3, SESA3 | seed storage albumin ...    64   2e-09
AT4G27140.1 | Symbols: SESA1, AT2S1 | seed storage albumin ...    48   1e-04
AT4G27150.1 | Symbols: SESA2, AT2S2 | seed storage albumin ...    44   0.002
AT1G14170.3 | Symbols: | RNA-binding KH domain-containing ...     44   0.002
AT1G14170.2 | Symbols: | RNA-binding KH domain-containing ...     44   0.002
AT1G14170.1 | Symbols: | RNA-binding KH domain-containing ...     44   0.002
AT4G27170.1 | Symbols: SESA4, AT2S4 | seed storage albumin ...    42   0.009
AT4G00895.1 | Symbols: | ATPase, F1 complex, OSCP/delta su...     36   0.53
.............
(this is a very long list, so I omitted some of it)

Database: TAIR10 Transcripts (-introns, +UTRs) (DNA)
Posted date: Jan 13, 2011 1:41 PM
Number of letters in database: 64,867,051
Number of sequences in database: 41,671

Lambda   K       H
1.37     0.711   1.31

Gapped
Lambda   K       H
1.37     0.711   1.31

Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 41671
Number of Hits to DB: 342,107
Number of extensions: 18701
Number of successful extensions: 1355
Number of sequences better than 10.0: 41
Number of HSP's gapped: 1354
Number of HSP's successfully gapped: 53
Length of query: 662
Length of database: 64,867,051
Length adjustment: 18
Effective length of query: 644
Effective length of database: 64,116,973
Effective search space: 41291330612
Effective search space used: 41291330612
X1: 11 (21.8 bits)
X2: 15 (29.7 bits)
X3: 25 (49.6 bits)
S1: 13 (26.3 bits)
S2: 16 (32.2 bits)
I set the parameters to mostly the same values as the website, except for the weighted matrix and max_score, which I don't know how to set.
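One difference visible in the two outputs is the reference line: the local run cites the "greedy algorithm" paper, which is the megablast reference, while the website cites the 1997 gapped BLAST paper. That suggests the local blastn may be running its default megablast task rather than the classic blastn algorithm. Below is a minimal sketch of a rerun to test this, assuming the default task is the cause (this is not confirmed). The output file name cx274252_blastn is hypothetical, and -reward 1 is added only to match the "1 -3" matrix the website reports.

```shell
# Sketch only: rerun the same search but force the classic blastn task.
# Assumption: the missing hits come from the BLAST+ default task (megablast),
# which uses a longer default word size (28) than the blastn task (11).
query=buff.fa
db=TAIR10_cdna
out=cx274252_blastn   # hypothetical output name, so the first result is kept

# Same options as the original command, plus -task blastn and -reward 1.
args="-task blastn -reward 1 -penalty -3 -gapopen 5 -gapextend 2 -dust yes -max_target_seqs 250 -outfmt 4"

# Show the full command line before running it.
echo "blastn -query $query -db $db -out $out $args"

# Run only if blastn is installed and the query and database files exist.
if command -v blastn >/dev/null 2>&1 && [ -f "$query" ] && [ -f "$db.nin" ]; then
    # $args is left unquoted on purpose so it splits into separate options.
    blastn -query "$query" -db "$db" -out "$out" $args
fi
```

If the hits from the website (for example AT4G27160.1) show up in cx274252_blastn, the task default would explain the difference. If they do not, the cause is somewhere else.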