Question: Why does the BLAST use E-value instead of p-value?
1
gravatar for mangfu100
3.9 years ago by
mangfu100670
Korea, Republic Of
mangfu100670 wrote:

Hi all.

I think that p-value is one of the most greatest way of measuring degree of observed data.

However, BLAST doesn't use p-value but E-value. 

Why the BLAST use e-value for interpreting sequence data instead of p-value?

Is there any logical reason to use E-value for BLAST? If so, could you tell me the detail reason?

sequencing alignment • 7.0k views
ADD COMMENTlink modified 3.9 years ago by Csaba Kerepesi320 • written 3.9 years ago by mangfu100670
10
gravatar for Csaba Kerepesi
3.9 years ago by
Hungary
Csaba Kerepesi320 wrote:

Quote from the BLAST help (http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html#head4 ):

"The BLAST programs report E-value rather than P-values because it is easier to understand the difference between, for example, E-value of 5 and 10 than P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and E-value are nearly identical."

Important to note that P value of the BLAST is not the same thing than a P-value of a t-test.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Csaba Kerepesi320
1

Could you elaborate further on your last sentence?

ADD REPLYlink written 3.9 years ago by lelle780
1

any p-value is the result of a hypothesis test. since a blast search is not a hypothesis test, a p would be an inappropriate result.

ADD REPLYlink written 3.9 years ago by karl.stamm3.4k
3

Yes, BLAST is doing a hypothesis test: is the sequence a homolog of your query, or not? The null hypothesis is that it is not a homolog, and instead is a "random" sequence. The P-value is the probability that you would've gotten a score this high if it's not a homolog. BLAST scores follow a known distribution (an extreme value distribution) under the null hypothesis. Conceptually, it's the same as any other p-value based significance test.

ADD REPLYlink written 3.9 years ago by seanrobertseddy50

I think most users aren't aware of the hypothesis test as you've stated it. Implicitly, BLAST is testing a query sequence against thousands or thousands of millions of candidate sequences. If we interpret p-value as the false positive rate (or incorrect null-h acceptance), then we should apply a multiple-testing correction to the result, and the copious results are decimated. The chance of artificial alignment is highly dependent upon the genome being searched and the complexity of the query sequence. We can guarantee that a 2-mer is a homolog of a million locations, but it's useless as a result. the E-value distribution accounts for these things and is more directly related to the complexity and uniqueness of a blast 'hit'. It's determined by the genome index being queried.  We use BLAST to find things, and want to know how certain it is. I think most users aren't specifying any hypotheses or accounting for the multiplicity thereof. 

ADD REPLYlink written 3.9 years ago by karl.stamm3.4k
2

seanrobertseddy (Sean Eddy? Hello!) is right here. BLAST is doing a standard hypothesis test. It has an explicit null model and the E-value is estimated based on this model. You may argue whether the null model is appropriate, but math is math. As I remember, BLAST precomputes the two key parameters. FASTA/swat learns the parameters from data. They are less affected by the redundancy in the database.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by lh331k

Exactly: BLAST P-value: "The probability of a chance alignment occurring with a particular score or a better score in a database search." Quoted form BLAST Glossary

More exactly: If you have an n length query sequence and an m length database and running BLAST you get a hit with S score, than the P value is the probability of you get at least one hit with a score greater (or equal) than S if you BLAST a random n length query against a random m length database.

The last state are concluded mostly from: http://www.basiclocalalignmentsearchtool.com/

However P-value is not calculated by BLAST but E-value. P value is not equal with E-value. BLAST E-value is the expectation value of the hits with score greater (or equal) than S if you BLAST a random n length query against a random m length database.

 

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by Csaba Kerepesi320

I Agree. The last statement requires further elaboration otherwise it might be misleading. Did  you meant to say that the underlying distribution is different ??

ADD REPLYlink written 3.9 years ago by mxs530
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1699 users visited in the last hour