Question: Biopython NCBIStandalone Blastall Gives Different Result Than Calling Blastall Directly From Cmd
Asked 8.8 years ago by Niek De Klein (Netherlands):

So, first I tested what results I should get from the blastall program using the command line, with e-value 0.001:

C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -F F -m 8 -o C:\Niek\Test\arab-HD-smallproteins-notfiltered.out

and

C:\Niek\Test\blast-2.2.17\bin\blastall -p blastp -d C:\Niek\Test\arabidopsis-smallproteins.fasta -i C:\Niek\Test\arabidopsis-HD.fasta -e 0.001 -m 8 -o C:\Niek\Test\arab-HD-smallproteins-filtered.out

After that I wrote a local BLAST script, which runs fine but only finds 91 results with an e-value less than or equal to 0.001, whereas blastall run from cmd gave around 140 results. At first I thought it had missed some, but all the e-values are different.

from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

my_blast_db = r"C:\Niek\Test\arabidopsis-smallproteins.fasta"
my_blast_file = r"C:\Niek\Test\arabidopsis-HD.fasta"
my_blast_exe =r"C:\Niek\blast-2.2.17\bin\blastall.exe"

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastp",
my_blast_db, my_blast_file)

blast_records = NCBIXML.parse(result_handle)

E_VALUE_THRESH = 0.001
x = 0
for blast_record in blast_records:
    blast_record = blast_records.next()
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect <= E_VALUE_THRESH:
                print "==========Alignment========"
                print "sequence:", alignment.title
                print "length:", alignment.length
                print "e value:", hsp.expect
                x += 1

At first I thought that the local BLAST from Biopython used a different algorithm, but with 'my_blast_exe = r"C:\Niek\blast-2.2.17\bin\blastall.exe"' I specify the same program, so it should be the same. Then I thought it had something to do with the filtering option, but I checked both the filtered and the unfiltered output and neither matched.

If you know why the local blast from biopython NCBIStandalone gives a different result than doing it directly in the cmd, please let me know.

Thanks in advance, Niek

Edit: I checked, and it seems that NCBIStandalone filters out 100% identities, which blastall called from cmd does not. However, this doesn't explain why the e-values are so different.
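To make the comparison between the two runs concrete, the hits in the tabular (-m 8) output files can be counted with the same threshold the script uses. The sketch below assumes the standard 12-column tabular layout, where the e-value is the 11th field; the function name count_hits and the sample lines are made up for illustration and are not taken from the real output.

```python
def count_hits(tabular_lines, evalue_threshold=0.001):
    """Count HSP lines whose e-value (11th column) is at or below the threshold."""
    count = 0
    for line in tabular_lines:
        line = line.strip()
        # Skip blank lines and comment lines (the commented tabular format uses '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[10]) <= evalue_threshold:
            count += 1
    return count


# Made-up sample lines in the 12-column layout, for illustration only.
sample = [
    "q1\ts1\t100.00\t50\t0\t0\t1\t50\t1\t50\t1e-20\t100",
    "q1\ts2\t40.00\t30\t18\t0\t1\t30\t5\t34\t0.5\t20.0",
    "q2\ts3\t80.00\t45\t9\t0\t3\t47\t1\t45\t0.001\t60.0",
]
n_hits = count_hits(sample)
```

For a real file, an open file handle can be passed instead of the list, since the function only iterates over lines. Each line of tabular output is one HSP, which matches what the Python script above counts.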


I'm not a regular Windows user, but is it a typo that your command-line blast paths don't end in .exe while the one you send to Biopython does?

Comment written 8.8 years ago by brentp
Answered 8.8 years ago by Peter (Scotland, UK):

One major problem is you are skipping half the results with this:

for blast_record in blast_records:
    blast_record = blast_records.next()
    ...

It should be just:

for blast_record in blast_records:
    ....
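The skipping can be reproduced without BLAST at all. In this minimal sketch, fake_records is a hypothetical stand-in for NCBIXML.parse(result_handle), and the built-in next(records) plays the role of records.next() from the Python 2 code above.

```python
def fake_records():
    """Stand-in for NCBIXML.parse(): yields one item per BLAST query."""
    for name in ["query1", "query2", "query3", "query4"]:
        yield name


# Buggy pattern: the for loop already advances the iterator, and the
# extra next() call advances it again, so every other record is dropped.
records = fake_records()
seen_buggy = []
for record in records:
    record = next(records)
    seen_buggy.append(record)

# Correct pattern: let the for loop do all the advancing.
seen_ok = []
for record in fake_records():
    seen_ok.append(record)
```

Here seen_buggy ends up holding only the second and fourth items, while seen_ok holds all four.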

Or if you would rather call the next method explicitly for some reason, something like:

while True:
    try:
        blast_record = blast_records.next()
    except StopIteration:
        # A generator signals the end by raising StopIteration, not by returning None
        break
    ...
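If a None check is preferred, the built-in next() accepts a default value that is returned once the iterator is exhausted (a bare generator otherwise raises StopIteration). A minimal sketch, with fake_records as a hypothetical stand-in for NCBIXML.parse(result_handle):

```python
def fake_records():
    """Stand-in for NCBIXML.parse(): yields one item per BLAST query."""
    for name in ["query1", "query2", "query3"]:
        yield name


records = fake_records()
collected = []
while True:
    # The second argument is returned instead of raising StopIteration.
    record = next(records, None)
    if record is None:
        break
    collected.append(record)
```

This visits every record exactly once, the same as the plain for loop.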

Secondary issue: blastall is now being phased out by the NCBI, who call it "legacy" BLAST and encourage people to use BLAST+ instead (in this case, blastp at the command line). As a result, the old Biopython wrappers for calling "legacy" BLAST are all considered obsolete.
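As a rough sketch of what the BLAST+ equivalent of the command in the question might look like, the snippet below only builds the argument list for blastp. The helper name build_blastp_command and the output file name are made up for illustration; -outfmt 6 is the BLAST+ tabular format corresponding to -m 8, and the database would first need to be formatted with makeblastdb.

```python
def build_blastp_command(query, db, out, evalue=0.001):
    """Return an argument list for the BLAST+ blastp program (hypothetical helper)."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, the BLAST+ counterpart of -m 8
        "-out", out,
    ]


cmd = build_blastp_command(
    query=r"C:\Niek\Test\arabidopsis-HD.fasta",
    db=r"C:\Niek\Test\arabidopsis-smallproteins.fasta",
    out=r"C:\Niek\Test\arab-HD-smallproteins-blastplus.out",  # assumed file name
)
# To actually run it (requires BLAST+ on the PATH):
#   import subprocess
#   subprocess.check_call(cmd)
```

Passing the arguments as a list avoids shell quoting problems with Windows paths.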


Thank you for the answer!

Reply written 8.8 years ago by Niek De Klein