Question: E Values Using Blastp In Biopython
7.7 years ago
wrote:

Is there a reason for the E value to differ between BLAST on the web and BLAST through Biopython? My understanding is that both use the same source, so I cannot explain the following differences.

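As an example, consider the peptide sequence tpdavmgnpk (the value assigned to peptide in the script below).

Submit this to blastp on the NCBI website and compare the result with the following Python script: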

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

peptide = "tpdavmgnpk"
myEntrezQuery = "Homo sapiens[Organism]"
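# Submit the search to NCBI over the network (QBLAST defaults apply)
result = NCBIWWW.qblast("blastp", "nr", peptide, entrez_query=myEntrezQuery)

# Only one query was submitted, so take the first record from the parser
records = NCBIXML.parse(result)
blast_record = next(records)

# Report every HSP with an E-value below 5
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < 5:
            print("***** RECORD ****")
            print("sequence:", alignment.title)
            print("E-value:", hsp.expect)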

Here are two examples of differing E values I obtain:

Accession, Biopython E value, NCBI web E value

AAW66689.1, 1.20033, 0.045

AAA53153.1, 1.21977, 0.075

EDIT: Following Peter's answer, Ben's comment, and this link, I have tried making the settings similar to those of the web interface:

result = NCBIWWW.qblast("blastp", "nr", peptide,
                        entrez_query=myEntrezQuery,
                        matrix_name='BLOSUM62',
                        word_size='2',
                        expect='50000',
                        gapcosts='11 1',
                        composition_based_statistics='no adjustment')

The results are still not matching.


biopython blast
written 7.7 years ago

The NCBI web interface adjusts BLAST parameters for short sequences - does BioPython?

written 7.7 years ago by Ben

Thanks for this insight. I have edited my question to address this.

written 7.7 years ago

I found this from a few years ago that may be helpful (with some adjustment)

written 7.7 years ago by Ben
7.7 years ago
Josh Herr
University of Nebraska
wrote:

E-values scale with the size of the database you search, so the same hit will get a different E-value if the database changes. Are you sure that Biopython and the NCBI web interface are BLASTing against exactly the same database? I'm not sure how often the NCBI database is updated, but even daily updates will shift the E-values. I don't worry about small differences, and I don't put much emphasis on E-values anyway.

That being said, the difference between your E-values does look large, so I would check the database first. Your script runs fine on my end, so the script itself does not look like the problem.
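To make the database-size point concrete, here is a minimal sketch based on the textbook relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. The function name expected_hits and the example lengths are hypothetical, and real BLAST uses effective lengths and optional composition-based adjustments, so treat this as an illustration only. The second part simply divides the E-values quoted in the question.

```python
def expected_hits(bit_score, query_length, database_length):
    """E-value from the simplified relation E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers, chosen only to illustrate the scaling
small = expected_hits(bit_score=30.0, query_length=10, database_length=1e9)
large = expected_hits(bit_score=30.0, query_length=10, database_length=2e9)
print(large / small)  # doubling the database doubles the E-value

# E-values quoted in the question: (Biopython, NCBI web)
reported = {
    "AAW66689.1": (1.20033, 0.045),
    "AAA53153.1": (1.21977, 0.075),
}
ratios = {acc: bio / web for acc, (bio, web) in reported.items()}
for acc, ratio in ratios.items():
    print(acc, round(ratio, 1))
```

Under this simplified relation, a change in database size alone would multiply every E-value by the same factor. The two ratios here come out at roughly 27 and 16, which hints that the scoring settings may differ as well as the database.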

written 7.7 years ago by Josh Herr
7.7 years ago
Scotland, UK
wrote:

To quote from the Biopython FAQ,

Why doesn’t Bio.Blast.NCBIWWW.qblast() give the same results as the NCBI BLAST website? You need to specify the same options – the NCBI often adjust the default settings on the website, and they do not match the QBLAST defaults anymore. Check things like the gap penalties and expectation threshold.

Have you checked that?
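One way to check this is to look at the settings the server reports in the XML output and compare them with the search summary on the web results page. The sketch below uses only the standard library and assumes the usual BLAST XML layout, where settings appear in elements whose names start with Parameters_ or Statistics_ (for example Parameters_matrix or Statistics_db-len). The helper name search_settings and the tiny XML fragment are made up for illustration.

```python
import xml.etree.ElementTree as ET


def search_settings(xml_text):
    """Collect the Parameters_* and Statistics_* elements of BLAST XML output."""
    root = ET.fromstring(xml_text)
    settings = {}
    for element in root.iter():
        if element.tag.startswith(("Parameters_", "Statistics_")):
            settings[element.tag] = (element.text or "").strip()
    return settings


# Tiny hand-written fragment (placeholder values, not real search output)
example = """<BlastOutput>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>BLOSUM62</Parameters_matrix>
      <Parameters_expect>10</Parameters_expect>
    </Parameters>
  </BlastOutput_param>
</BlastOutput>"""

for name, value in sorted(search_settings(example).items()):
    print(name, "=", value)
```

With a real search you would pass in the XML text read from the handle that qblast returns. The handle can only be read once, so keep the text in a variable if you also want to parse it with NCBIXML.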

written 7.7 years ago by Peter
