Blast E Value Calculation
5
2
Entering edit mode
8.8 years ago
darxsys ▴ 210

I'm trying to work out the formula for BLAST e value calculation. I have surfed all across the internet looking for it, but the only thing I'm able to find is the formula provided here: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html The problem is, I can't get the correct value for whatever parameters I plug into it. What I do? I run blastp module of blast on a query and Uniprot database. BLAST then gives me values for gapped K and lambda, as well as scores, database length and query length. When I plug that into the formula E = K * m * n * exp (- lambda * S), I get absolute nonsense. I tried various stuff for m and n - including database length, query length, result sequence length - tried bit scores and real scores, and I simply can't get the E value which blast outputs by itself, no matter what. Is there a way to correctly calculate E value and how? Please help.

blast • 19k views
1
Entering edit mode

Is there some reason you need to calculate e-value for yourself? Seems like blastp has done it for you :)

0
Entering edit mode

Yes. I'm trying to find a way to calculate it because I need it in my project (if it is possible in any way).

1
Entering edit mode

Hi, did you get the answer for this questions. Thanks. If so,could you give some tips

0
Entering edit mode
1
Entering edit mode
8.8 years ago

bitscore is a normalized score calculated from the alignment which depends on the scoring system: (lambda * S - ln(k)) / ln(2)

The p-value of a blast alignment is basically: 1 / ( 2 ^ bitscore)

So if your bitscore is 10, you would need to score 2 ^ 10 alignments before you will get a score as good or better.

The e-value is just a p-value normalized to the database size: query length * database length * p-value = query length * database length / (2 ^ bitscore)

0
Entering edit mode

Thanks for help, but still, I don't get the same result as blast. Okay, I'll give you the example on which I'm trying to do it in the hope you can get something out of it. I run blastp on Uniprot, with this query: Query= d1r5la1 a.5.3.1 (A:25-90) Alpha-tocopherol transfer protein {Human (Homo sapiens) [TaxId: 9606]} Length=66 I took that query from SCOP/Astral database. The first significant alignment I get from blast is for > sp|P49638|TTPA_HUMAN. and blast says this: Score = 136 bits (343) and Expect = 1e-39. Okay. When I plug in the stuff you told me to, I get: 1.448e-31. That is nowhere close the blast's value. Assuming database length is 191,240,774, K is 0.041, and lambda is 0.267. (Although I didn't need them right now.)

0
Entering edit mode

How did you get lambda? Some versions of blast don't use a constant lambda for all HSPs.

0
Entering edit mode

Gapped and ungapped values for lambda are given as part of the outpuf of blastp.

0
Entering edit mode

I think that recent versions of blastp use a slightly different stats model than the classical Altschul one. I faced similar discrenpencies between evalues from blastp and those I computed myself the same way you did.

What is your blast version ? If it is quite recent, you should try version 2.2.26 or 2.2.27 for instance (I guess they still use the old model). Otherwise, you could set the environment variable OLD_FSC to 1 before launching your blast command; it will switch back to the old model.

0
Entering edit mode

Thanks. I used the latest version. I try OLD_FSC = 1 and see what happen

0
Entering edit mode

How do you set OLD_FSC=1?

0
Entering edit mode

It's Protein-Protein BLAST 2.2.29+

0
Entering edit mode
5.3 years ago

E=pN where p is the p-value and N is the total length of the database divided by the length of the aligned database sequence

0
Entering edit mode
5.2 years ago
Lhl ▴ 730

have a read of this http://homepages.ulb.ac.be/~dgonze/TEACHING/stat_scores.pdf

0
Entering edit mode
5.0 years ago
midox ▴ 270

excuse me. here how can i determine the K and lambda values of the e-value formula please?

thanks