Question: E Values Using Blastp In Biopython
7.7 years ago by
juliet.hannah40 wrote:

Is there a reason for the E-value to differ between BLAST on the web and BLAST run through Biopython? My understanding is that both use the same source, so I cannot explain the following differences.

As an example, consider the sequence

tpdavmgnpk

Submit this to blastp on the NCBI website and compare the result with the output of the following Python script:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

peptide = "tpdavmgnpk"
myEntrezQuery = "Homo sapiens[Organism]"
# Submit the peptide to NCBI's QBLAST service, restricted to human entries
result = NCBIWWW.qblast("blastp", "nr", peptide, entrez_query=myEntrezQuery)

# Only one query was submitted, so take the first record from the parser
records = NCBIXML.parse(result)
blast_record = next(records)

# Report every HSP whose E-value is below 5
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < 5:
            print("***** RECORD ****")
            print("sequence:", alignment.title)
            print("E-value:", hsp.expect)

Here are two examples of the differing E-values I obtain:

Accession, Biopython E-value, NCBI web E-value
AAW66689.1, 1.20033, 0.045
AAA53153.1, 1.21977, 0.075
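For anyone who wants to line up more than two hits, here is a minimal sketch that collects the script's E-values per accession and pairs them with values read off the web page. The helper names best_evalue_by_accession and compare are made up for illustration. The stand-in record below just reuses the two rows quoted above, and the assumption that alignment.accession matches the accession string shown on the web page (including the version suffix) may not hold, so treat the keys with care.

```python
from types import SimpleNamespace


def best_evalue_by_accession(blast_record):
    """Map each hit's accession to its smallest HSP E-value.

    Works on any object shaped like a parsed BLAST record:
    .alignments, each with .accession and .hsps, each HSP with .expect.
    """
    best = {}
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            acc = alignment.accession
            if acc not in best or hsp.expect < best[acc]:
                best[acc] = hsp.expect
    return best


def compare(script_evalues, web_evalues):
    """Return (accession, script E, web E, script/web ratio) for shared hits."""
    rows = []
    for acc in sorted(web_evalues):
        if acc in script_evalues:
            s, w = script_evalues[acc], web_evalues[acc]
            rows.append((acc, s, w, s / w))
    return rows


# E-values copied by hand from the web page (the two rows in the question).
web_evalues = {"AAW66689.1": 0.045, "AAA53153.1": 0.075}

# Stand-in for a parsed record, built from the script values in the question.
# With a real search, pass the blast_record from NCBIXML.parse instead.
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(accession="AAW66689.1",
                    hsps=[SimpleNamespace(expect=1.20033)]),
    SimpleNamespace(accession="AAA53153.1",
                    hsps=[SimpleNamespace(expect=1.21977)]),
])

rows = compare(best_evalue_by_accession(demo_record), web_evalues)
for acc, script_e, web_e, ratio in rows:
    print(acc, script_e, web_e, round(ratio, 1))
```

If the ratio is roughly constant across hits, that points at a difference in search space or scoring parameters rather than at individual alignments.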

EDIT: I have tried making the parameters match the web defaults (following Peter's answer, Ben's comment, and this link):

result = NCBIWWW.qblast(
    "blastp", "nr", peptide,
    entrez_query=myEntrezQuery,
    matrix_name='BLOSUM62',
    word_size='2',
    expect='50000',
    gapcosts='11 1',
    composition_based_statistics='no adjustment',
)

The E-values still do not match.

Thanks!

biopython blast • 3.4k views
ADD COMMENTlink modified 7.7 years ago by Peter5.8k • written 7.7 years ago by juliet.hannah40

The NCBI web interface adjusts BLAST parameters for short sequences. Does Biopython do the same?

ADD REPLYlink written 7.7 years ago by Ben2.0k

Thanks for this insight. I have edited my question to address this.

ADD REPLYlink written 7.7 years ago by juliet.hannah40

I found this from a few years ago that may be helpful (with some adjustment)

ADD REPLYlink written 7.7 years ago by Ben2.0k
7.7 years ago by
Josh Herr5.7k
University of Nebraska
Josh Herr5.7k wrote:

E-values scale with the size of the database you search, so they change whenever the database does. Are you sure that Biopython and the NCBI web interface are searching exactly the same database? I'm not sure how often NCBI updates its databases, but even daily updates will shift E-values. I don't worry about this much, and I don't put much emphasis on E-values either.

That being said, the difference between your E-values does look large, so I would check the database first. Your script runs fine on my end, so it doesn't look like the problem.
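To make the database-size point concrete, here is a rough sketch using the standard Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n is the database length in residues, and S' is the bit score. The function name expected_hits, the bit score, and the database sizes are hypothetical values chosen for illustration. Real BLAST uses effective (edge-corrected) lengths, so this only shows the scaling, not the exact numbers it reports.

```python
def expected_hits(bit_score, query_len, db_letters):
    """Simplified Karlin-Altschul E-value: E = m * n * 2**(-S').

    Assumption: raw lengths are used here; BLAST itself applies
    effective-length corrections to both query and database.
    """
    return query_len * db_letters * 2.0 ** (-bit_score)


query_len = len("tpdavmgnpk")   # the 10-residue peptide from the question
bit_score = 30.0                # hypothetical bit score for one hit

# The same alignment scored against two hypothetical database sizes.
for db_letters in (1e8, 1e9):
    print(f"{db_letters:.0e} residues -> E = "
          f"{expected_hits(bit_score, query_len, db_letters):.3g}")
```

The same alignment with the same bit score gets an E-value ten times larger against a database ten times larger, which is why restricting or changing the database (for example with an Entrez query) can move E-values a lot.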

ADD COMMENTlink modified 7.7 years ago • written 7.7 years ago by Josh Herr5.7k
7.7 years ago by
Peter5.8k
Scotland, UK
Peter5.8k wrote:

To quote from the Biopython FAQ:

"Why doesn't Bio.Blast.NCBIWWW.qblast() give the same results as the NCBI BLAST website? You need to specify the same options – the NCBI often adjust the default settings on the website, and they do not match the QBLAST defaults anymore. Check things like the gap penalties and expectation threshold."

See http://biopython.org/DIST/docs/tutorial/Tutorial.html

Have you checked that?

ADD COMMENTlink modified 7.7 years ago • written 7.7 years ago by Peter5.8k