First, I ran the blastall program from the command line with an e-value cutoff of 0.001 to see what results I should expect:
C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -F F -m 8 -o C:\Niek\Test\arab-HD-smallproteins-notfiltered.out
and
C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -m 8 -o C:\Niek\Test\arab-HD-smallproteins-filtered.out
After that I wrote a local BLAST script with Biopython. It runs fine, but it only finds 91 hits with an e-value of 0.001 or lower, whereas blastall run via cmd gave around 140. I first thought it had missed some hits, but all the e-values are different as well.
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

my_blast_db = r"C:\Niek\Test\arabidopsis-smallproteins.fasta"
my_blast_file = r"C:\Niek\Test\arabidopsis-HD.fasta"
my_blast_exe = r"C:\Niek\blast-2.2.17\bin\blastall.exe"

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastp",
                                                      my_blast_db, my_blast_file)
blast_records = NCBIXML.parse(result_handle)

E_VALUE_THRESH = 0.001
x = 0
for blast_record in blast_records:
    blast_record = blast_records.next()
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect <= E_VALUE_THRESH:
                print "==========Alignment========"
                print "sequence:", alignment.title
                print "length:", alignment.length
                print "e value:", hsp.expect
                x += 1
I first thought that the local BLAST from Biopython uses a different algorithm, but with 'my_blast_exe =r"C:\Niek\blast-2.2.17\bin\blastall.exe"' I specify the same program, so it should behave the same. Then I thought it had something to do with the filtering option, but I checked both the filtered and the unfiltered command-line output and neither matches.
If you know why the local blast from biopython NCBIStandalone gives a different result than doing it directly in the cmd, please let me know.
Thanks in advance, Niek
edit: I checked, and it seems that the NCBIStandalone filters out 100% identities, which the blastall called by cmd does not. However this doesn't explain why the e-values are so different.
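To compare the two result sets hit by hit rather than by total count, the -m 8 output files can be filtered with the same threshold as the script. The following is only a sketch: it assumes the standard 12-column tab-separated layout of -m 8 (e-value in the 11th column), the function name count_hits is made up for illustration, and the sample lines are invented data, not real output from these runs.

```python
def count_hits(lines, e_value_thresh=0.001):
    """Return (query, subject, e-value) for every tabular hit at or below the threshold."""
    hits = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the comment lines written by -m 9.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        e_value = float(fields[10])  # 11th column of -m 8 output
        if e_value <= e_value_thresh:
            hits.append((query_id, subject_id, e_value))
    return hits


# Invented sample lines in -m 8 layout, for illustration only.
sample = [
    "q1\ts1\t100.00\t50\t0\t0\t1\t50\t1\t50\t1e-25\t105",
    "q1\ts2\t35.00\t40\t26\t0\t5\t44\t3\t42\t0.5\t30.1",
    "q2\ts3\t60.00\t45\t18\t0\t1\t45\t2\t46\t0.001\t42.0",
]
hits = count_hits(sample)
print(len(hits))
```

With a real file, count_hits(open(path)) would give a list whose query/subject pairs can be checked against the alignments the script prints.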
I'm not a regular Windows user, but is it a typo that your command-line blast paths don't end in .exe while the one you send to Biopython does?
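Separately, the loop in the question calls blast_records.next() inside a for loop that is already advancing the same iterator, so every other record is never examined. That may account for at least part of the lower hit count, though it would not explain the differing e-values. A minimal sketch of the effect, using a plain list as a stand-in for the records that NCBIXML.parse yields (the function and record names are made up, and the built-in next() is used instead of the .next() method so the snippet also runs on newer Python versions):

```python
def records_seen_with_extra_next(items):
    """Mimic the loop in the question: advance the iterator again inside the loop body."""
    records = iter(items)
    seen = []
    for record in records:
        record = next(records)  # same effect as blast_records.next() in the question
        seen.append(record)
    return seen


def records_seen_plain(items):
    """Let the for loop do all the advancing."""
    seen = []
    for record in iter(items):
        seen.append(record)
    return seen


# Stand-in records; an even count, so the extra next() never runs off the end.
names = ["rec1", "rec2", "rec3", "rec4"]
skipped = records_seen_with_extra_next(names)
full = records_seen_plain(names)
print(skipped)  # ['rec2', 'rec4']
print(full)     # ['rec1', 'rec2', 'rec3', 'rec4']
```

Dropping the extra next() line from the original loop would make it visit every query's record.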