E Values Using Blastp In Biopython
11.5 years ago

Is there a reason for the E-value to differ between BLAST on the web and Biopython? My understanding is that both query the same source, so I cannot explain the following differences.

As an example, consider the sequence

tpdavmgnpk

Submit this to blastp and compare it with the following python script:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

peptide = "tpdavmgnpk"
myEntrezQuery = "Homo sapiens[Organism]"
result = NCBIWWW.qblast("blastp", "nr", peptide, entrez_query=myEntrezQuery)

# NCBIXML.parse returns an iterator; take the first (only) record
records = NCBIXML.parse(result)
blast_record = next(records)

# Report every HSP with an E-value below 5
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < 5:
            print("***** RECORD ****")
            print("sequence:", alignment.title)
            print("E-value:", hsp.expect)

Here are two examples of the differing E-values I obtain:

Accession, Biopython E-value, NCBI web E-value
AAW66689.1, 1.20033, 0.045
AAA53153.1, 1.21977, 0.075

EDIT: I have tried matching the defaults (following Peter's answer, Ben's comment, and this link):

result = NCBIWWW.qblast(
    "blastp", "nr", peptide,
    entrez_query=myEntrezQuery,
    matrix_name='BLOSUM62',
    word_size='2',
    expect='50000',
    gapcosts='11 1',
    composition_based_statistics='no adjustment',
)

The results are still not matching.

Thanks!


The NCBI web interface adjusts BLAST parameters for short sequences - does BioPython?


Thanks for this insight. I have edited my question to address this.


I found this from a few years ago that may be helpful (with some adjustment)

11.5 years ago
Josh Herr 5.8k

E-values are just threshold parameters and will change with the size of your database. Are you sure that Biopython and the NCBI web interface are searching the exact same database? I'm not sure how often the NCBI database is updated, but even daily updates will change E-values. I don't worry about it much, but I also don't put much emphasis on E-values.

That being said, the difference in your E-values does look large, so I would check the database first. Your script runs fine on my end, so it does not look like the problem.
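To make the database-size point concrete: for a given alignment, the E-value is roughly the search space (query length times database length) multiplied by 2 raised to minus the bit score, so the same hit gets a larger E-value against a larger database. Below is a minimal sketch of that relationship. The lengths and the bit score are made up for illustration and do not come from the searches above, `expected_hits` is a hypothetical helper name, and BLAST itself uses edge-corrected effective lengths, so real values will differ.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value: E = m * n * 2**(-S'), where S' is the bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values, for illustration only.
bit_score = 30.0
query_len = 10             # residues in the query
small_db = 10_000_000      # residues in a small database
large_db = 100 * small_db  # the same database grown 100-fold

e_small = expected_hits(bit_score, query_len, small_db)
e_large = expected_hits(bit_score, query_len, large_db)

print(f"small database: E = {e_small:.3g}")
print(f"large database: E = {e_large:.3g}")
print(f"ratio: {e_large / e_small:.0f}")
```

The E-value scales linearly with the search space, so a restricted or differently sized database on one side of the comparison is enough to shift every E-value by the same factor.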

11.5 years ago
Peter 6.0k

To quote from the Biopython FAQ,

Why doesn’t Bio.Blast.NCBIWWW.qblast() give the same results as the NCBI BLAST website? You need to specify the same options – the NCBI often adjust the default settings on the website, and they do not match the QBLAST defaults anymore. Check things like the gap penalties and expectation threshold.

See http://biopython.org/DIST/docs/tutorial/Tutorial.html Have you checked that?
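As a sketch of what "specify the same options" can look like in practice, the settings can be kept in one dictionary and passed to qblast explicitly, and the parameters the server reports back can then be compared with the web results page. The values here are the ones from the question's EDIT, not verified web defaults. `SEARCH_OPTIONS`, `run_search` and `describe` are hypothetical names, and the record attributes are read with `getattr` because which fields the XML parser fills in is an assumption.

```python
# Options copied from the question's EDIT; change them to match the web form.
SEARCH_OPTIONS = {
    "entrez_query": "Homo sapiens[Organism]",
    "matrix_name": "BLOSUM62",
    "word_size": 2,
    "expect": 50000,
    "gapcosts": "11 1",
    "composition_based_statistics": "no adjustment",
}


def run_search(peptide, options=None):
    """Submit a blastp search to NCBI and return the first parsed record."""
    # Imported here so the options can be inspected without Biopython installed.
    from Bio.Blast import NCBIWWW, NCBIXML

    if options is None:
        options = SEARCH_OPTIONS
    handle = NCBIWWW.qblast("blastp", "nr", peptide, **options)
    return next(NCBIXML.parse(handle))


def describe(record):
    """Collect the settings a parsed record reports (None if a field is absent)."""
    fields = ("matrix", "expect", "gap_penalties", "effective_search_space")
    return {name: getattr(record, name, None) for name in fields}
```

Calling `describe(run_search("tpdavmgnpk"))` sends the query to NCBI and returns the reported settings as a dictionary. If any of them differ from what the web results page shows under its search summary, that difference is a candidate explanation for the mismatched E-values.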
