I think I got it, though it is still a guess, so please correct me: to obtain lambda, the base error rate has to be multiplied by the read length. If random errors are modeled as Poisson distributed, then with a base error rate of 0.02 and reads of length 100 this yields an expected number of errors, lambda, of 2 per read. (For the Poisson distribution, lambda denotes the rate of discrete events occurring in an interval or unit volume of space. It is not divided by the size of the space because the volume is present on 'both sides of the equation' and cancels out.)
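As a minimal sketch of that step: the per-base error rate of 0.02 is taken from the R calls in this answer, and the variable names `base.error.rate` and `p.exact` are just made up for illustration.

```r
# Assumed per-base error rate (0.02, the value used in the ppois calls in this answer)
base.error.rate <- 0.02
readlength <- 100

# Expected number of sequencing errors per read
lambda <- base.error.rate * readlength

# Poisson probability of seeing exactly 0, 1, ..., 5 errors in one read
p.exact <- dpois(0:5, lambda = lambda)
names(p.exact) <- 0:5

print(lambda)
print(round(p.exact, 4))
```

Doubling the read length doubles lambda, which is why longer reads can tolerate more mismatches under this model.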

Given the Poisson distribution and readlength=100, what is the probability of observing more than 2 mismatches (n.mismatches) by chance, i.e. because of sequencing error? In general this can be calculated like this (in R):

```
n.mismatches = 2
readlength = 100
# lower.tail=FALSE gives the upper tail P(X > n.mismatches)
p = ppois(n.mismatches, lambda=0.02*readlength, lower.tail=FALSE)
p
# [1] 0.3233236
```

So 2 mismatches are acceptable, because the probability of seeing an even more extreme outcome by chance is > 0.04.
Now, for other values:

```
ppois(4, lambda=0.02*100, lower.tail=FALSE)
# [1] 0.05265302, which is > 0.04
ppois(5, lambda=0.02*100, lower.tail=FALSE)
# [1] 0.01656361, which is < 0.04
```

Thus, 0 to 4 mismatches are OK in one read of length 100, but 5 or more are not. For a read of length 200, up to 7 mismatches would be OK, because ppois(8, 0.02*200, lower.tail=FALSE) = 0.02136343 < 0.04.
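The search over mismatch counts can be wrapped in a small helper. This is only a sketch of the reasoning in this answer, not BWA's actual code: `max.mismatches` and its argument names are hypothetical, the defaults 0.02 and 0.04 are the values used above, and treating a tail probability exactly equal to the threshold as acceptable is my own assumption.

```r
# Largest mismatch count k for which P(X > k) is still at or above the threshold,
# with X ~ Poisson(lambda = base.error.rate * readlength).
# Hypothetical helper for illustration; not taken from BWA.
max.mismatches <- function(readlength, base.error.rate = 0.02, threshold = 0.04) {
  lambda <- base.error.rate * readlength
  k <- 0
  # ppois(k, lambda, lower.tail = FALSE) is the upper tail P(X > k)
  while (ppois(k, lambda = lambda, lower.tail = FALSE) >= threshold) {
    k <- k + 1
  }
  # The loop stops at the first k whose tail probability is below the threshold,
  # so the last acceptable count is k - 1 (never less than 0 mismatches).
  max(k - 1, 0)
}

max.mismatches(100)
max.mismatches(200)
```

With the defaults this gives 4 for a read of length 100 and 7 for a read of length 200, the same values as worked out by hand above.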

Did you check the docs here: http://bio-bwa.sourceforge.net/bwa.shtml ?

Yes, but I still don't understand the meaning of the '-n' cutoff! In BLAST, for example, an e-value of 1 assigned to a hit means that you can expect 1 hit with the same score just by chance. What does 0.04 in the -n cutoff mean? What exactly are the "missing alignments"?

I agree, the documentation is lacking here.
