First, I checked what results I should get by running blastall directly from the command line with an e-value threshold of 0.001, once with filtering switched off (-F F) and once with the default filtering:
C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -F F -m 8 -o C:\Niek\Test\arab-HD-smallproteins-notfiltered.out
C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -m 8 -o C:\Niek\Test\arab-HD-smallproteins-filtered.out
After that I wrote a local BLAST script with Biopython. It runs fine, but it only finds 91 results with an e-value equal to or lower than 0.001, whereas blastall run from cmd gives around 140 results. At first I thought the script simply missed some hits, but the e-values it reports are all different as well.
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

my_blast_db = r"C:\Niek\Test\arabidopsis-smallproteins.fasta"
my_blast_file = r"C:\Niek\Test\arabidopsis-HD.fasta"
my_blast_exe = r"C:\Niek\blast-2.2.17\bin\blastall.exe"

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastp", my_blast_db, my_blast_file)
blast_records = NCBIXML.parse(result_handle)

E_VALUE_THRESH = 0.001
x = 0
for blast_record in blast_records:
    blast_record = blast_records.next()
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect <= E_VALUE_THRESH:
                print "==========Alignment========"
                print "sequence:", alignment.title
                print "length:", alignment.length
                print "e value:", hsp.expect
                x += 1
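To compare the two counts on equal terms, the hits in the -m 8 tabular output can be counted with the same threshold. The following is only a minimal sketch: it assumes the standard 12-column tabular layout, where the e-value is the 11th tab-separated field, and the function name count_hits and the sample rows are made up for illustration (they are not real results).

```python
def count_hits(lines, e_value_thresh=0.001):
    """Count rows of blastall -m 8 output whose e-value is <= the threshold."""
    count = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (comments appear with -m 9)
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: 12-column tabular format, e-value in column 11
        e_value = float(fields[10])
        if e_value <= e_value_thresh:
            count += 1
    return count


# Hypothetical rows for illustration only
sample = [
    "q1\ts1\t100.00\t50\t0\t0\t1\t50\t1\t50\t1e-25\t105",
    "q1\ts2\t45.00\t40\t22\t0\t5\t44\t3\t42\t0.002\t30.1",
    "q2\ts3\t60.00\t30\t12\t0\t1\t30\t1\t30\t0.001\t35.0",
]
print(count_hits(sample))  # 2
```

Passing an open file handle for one of the .out files instead of the sample list would give the count to compare against x from the script.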
I first thought that the local BLAST from Biopython uses a different algorithm, but with my_blast_exe = r"C:\Niek\blast-2.2.17\bin\blastall.exe" I point it at the same blastall program, so it should behave the same. Then I thought it had something to do with the filtering option, but I compared against both the filtered and the unfiltered output and neither of them matches.
If you know why the local blast from biopython NCBIStandalone gives a different result than doing it directly in the cmd, please let me know.
Thanks in advance, Niek
edit: I checked, and it seems that NCBIStandalone filters out the 100% identity hits, which blastall called from cmd does not. However, this doesn't explain why the e-values are so different.
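One thing that may be worth ruling out in the script: the for loop already pulls one record per pass from blast_records, and the extra blast_records.next() inside the loop body pulls another, so only every second record would be processed. The sketch below mimics that pattern with a plain list of placeholder strings instead of real BLAST records. The function name records_seen is hypothetical, and next(iterator, None) is used here only so the demo does not raise StopIteration on an odd number of records.

```python
def records_seen(records):
    """Mimic a loop that iterates over an iterator and also calls next() inside."""
    iterator = iter(records)
    seen = []
    for record in iterator:
        # Extra advance, as in the script: the record fetched by the for loop
        # is discarded and replaced by the one that follows it
        record = next(iterator, None)
        if record is not None:
            seen.append(record)
    return seen


# Placeholder names standing in for parsed BLAST records
print(records_seen(["rec1", "rec2", "rec3", "rec4"]))  # ['rec2', 'rec4']
```

If this is what happens, hits for roughly half of the query sequences would never be printed, which could account for both the lower count and hits that appear to be missing.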