Biopython NCBIStandalone blastall gives different results than calling blastall directly from cmd
14.0 years ago
Niek De Klein ★ 2.6k

So, first I checked which results I should get by running the blastall program from the command line with an e-value cutoff of 0.001:

C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -F F -m 8 -o C:\Niek\Test\arab-HD-smallproteins-notfiltered.out

and

C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -m 8 -o C:\Niek\Test\arab-HD-smallproteins-filtered.out

After that I wrote a local BLAST script with Biopython. It runs fine, but it only found 91 hits with an e-value of 0.001 or lower, whereas running blastall from cmd gave roughly 140. At first I thought it had simply missed some, but all the e-values are different as well.
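To compare the two runs on equal terms, one option is to count the rows of the -m 8 tabular file at the same cutoff used in the Python loop. A minimal sketch, assuming the standard 12-column tabular layout where the e-value is the 11th column; the function name `count_tabular_hits` is hypothetical:

```python
def count_tabular_hits(lines, e_value_thresh=0.001):
    """Count HSP rows in BLAST -m 8 tabular output at or below an e-value cutoff.

    Assumes the standard 12-column layout: query id, subject id, % identity,
    alignment length, mismatches, gap opens, q. start, q. end, s. start,
    s. end, e-value, bit score (so the e-value is at index 10).
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        if float(fields[10]) <= e_value_thresh:
            count += 1
    return count
```

Since `lines` can be any iterable of strings, an open file handle works directly, e.g. `count_tabular_hits(open(r"C:\Niek\Test\arab-HD-smallproteins-notfiltered.out"))`. Each tabular row is one HSP, which matches what the Python loop counts.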

from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

my_blast_db = r"C:\Niek\Test\arabidopsis-smallproteins.fasta"
my_blast_file = r"C:\Niek\Test\arabidopsis-HD.fasta"
my_blast_exe = r"C:\Niek\blast-2.2.17\bin\blastall.exe"

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastp",
                                                      my_blast_db, my_blast_file)

blast_records = NCBIXML.parse(result_handle)

E_VALUE_THRESH = 0.001
x = 0
for blast_record in blast_records:
    blast_record = blast_records.next()
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect <= E_VALUE_THRESH:
                print "==========Alignment========"
                print "sequence:", alignment.title
                print "length:", alignment.length
                print "e value:", hsp.expect
                x += 1

I first thought that Biopython's local BLAST uses a different algorithm, but with my_blast_exe = r"C:\Niek\blast-2.2.17\bin\blastall.exe" I point it at the same blastall program, so it should behave the same. Then I thought it had something to do with the filtering option, but I checked both the filtered and the unfiltered output and neither explains it.

If you know why Biopython's NCBIStandalone gives different results than running blastall directly from cmd, please let me know.

Thanks in advance, Niek

Edit: I checked, and it seems that NCBIStandalone filters out 100% identities, which blastall called from cmd does not. However, this doesn't explain why the e-values are so different.


I'm not a regular Windows user, but is it a typo that your command-line BLAST paths don't end in .exe while the one you pass to Biopython does?

14.0 years ago
Peter 6.0k

One major problem is that you are skipping half the results with this:

for blast_record in blast_records:
    blast_record = blast_records.next()
    ...
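To see why, here is a small self-contained demonstration in which a hypothetical generator, `fake_parse`, stands in for `NCBIXML.parse`. It is written in Python 3, where `next(records)` is the spelling of `records.next()`:

```python
def fake_parse(n):
    """Hypothetical stand-in for NCBIXML.parse: a generator yielding n records."""
    for i in range(n):
        yield "record %d" % i


records = fake_parse(6)
seen = []
for record in records:      # the for loop already advances the iterator once per pass...
    record = next(records)  # ...so this extra call consumes a second record
    seen.append(record)

print(seen)  # ['record 1', 'record 3', 'record 5']
```

Every other record is discarded. With an odd number of records, the extra call on the last pass would instead raise StopIteration out of the loop.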

It should be just:

for blast_record in blast_records:
    ....

Or if you would rather call the next method explicitly for some reason, something like:

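while True:
    # next() with a default returns None once the iterator is exhausted,
    # instead of raising StopIteration
    blast_record = next(blast_records, None)
    if blast_record is None:
        break
    ...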
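Applied to the counting code from the question, the plain for loop could look like the sketch below. The function name `count_hsps` and the stand-in records are hypothetical; it only assumes objects shaped like Biopython's BLAST record classes (`record.alignments`, `alignment.hsps`, `hsp.expect`), so it runs without BLAST installed:

```python
from types import SimpleNamespace


def count_hsps(blast_records, e_value_thresh=0.001):
    """Count HSPs at or below e_value_thresh, visiting every record exactly once."""
    count = 0
    for blast_record in blast_records:  # no extra next() call inside the loop
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect <= e_value_thresh:
                    count += 1
    return count


def _record(*expects):
    """Hypothetical stand-in for one BLAST record with a single alignment."""
    hsps = [SimpleNamespace(expect=e) for e in expects]
    alignment = SimpleNamespace(title="hit", length=100, hsps=hsps)
    return SimpleNamespace(alignments=[alignment])


records = [_record(1e-10, 0.5), _record(0.001), _record(2.0)]
print(count_hsps(records))  # 2
```

With real output, the same function would be called on `NCBIXML.parse(handle)` for an open BLAST XML file handle.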

Secondary issue: blastall is being phased out by the NCBI, who now call it "legacy" BLAST and encourage people to use BLAST+ instead (in this case, the blastp command-line tool). As a result, Biopython's old wrappers for calling legacy BLAST are all considered obsolete.
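A rough BLAST+ equivalent of the legacy command in the question can be run from Python with subprocess. This is a sketch under stated assumptions: `blastp` is on the PATH, the database was built beforehand with `makeblastdb -in <fasta> -dbtype prot`, and the helper names `blastp_command` and `run_blastp` are hypothetical. `-outfmt 6` is BLAST+'s tabular format, the counterpart of legacy `-m 8`:

```python
import subprocess


def blastp_command(query_fasta, db_name, out_path, evalue=0.001):
    """Build a BLAST+ blastp argument list roughly matching
    `blastall -p blastp -i ... -d ... -e ... -m 8 -o ...`."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, like legacy -m 8
        "-out", out_path,
    ]


def run_blastp(query_fasta, db_name, out_path, evalue=0.001):
    """Run blastp; raises subprocess.CalledProcessError if it exits non-zero."""
    subprocess.run(blastp_command(query_fasta, db_name, out_path, evalue), check=True)
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain spaces.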


Thank you for the answer!
