6.6 years ago by
I think I got it, though it is still a guess, so please correct me: to obtain lambda, the per-base error rate has to be multiplied by the read length. If random errors are modeled as Poisson distributed, then with a per-base error rate of 0.02 and reads of length 100 this yields an average error rate lambda of 2 per read. (For the Poisson distribution, lambda denotes the rate of discrete events occurring in an interval or unit volume of space; it is not divided by the size of that space because the size appears on 'both sides of the equation' and cancels out.)
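As a small sketch of that step in R (the names `base.error.rate` and `read.lengths` are only illustrative; 0.02 is the per-base error rate assumed in this answer):

```r
# Per-base error rate assumed in this answer (2%)
base.error.rate <- 0.02

# Illustrative read lengths
read.lengths <- c(100, 200)

# Expected number of errors per read: rate per base times number of bases
lambda <- base.error.rate * read.lengths
print(lambda)  # [1] 2 4
```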
Given the Poisson distribution and readlength=100, what is the probability of observing more than 2 mismatches (n.mismatches=2) by chance, i.e. because of sequencing error? In general this can be calculated like this (in R):
p = ppois(n.mismatches,lambda=0.02*readlength, lower.tail=FALSE)
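With n.mismatches=2 this gives p = 0.3233236, which is acceptable because the probability of seeing more mismatches than that is > 0.04. (Note that ppois with lower.tail=FALSE returns P(X > n), not P(X >= n).)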
Now, for other values:
> ppois(4,lambda=2, lower=F)
[1] 0.05265302 > 0.04
> ppois(5,lambda=2, lower=F)
[1] 0.01656361 < 0.04
Thus, 0 to 4 mismatches are ok in one read of length 100, but 5 or more are not. For a read of length 200, that would be only 7 mismatches, because ppois(8,lambda=0.02*200, lower=F) = 0.02136343 < 0.04.
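The same search can be sketched as a small function. This is only an illustration of the rule used above: `max.ok.mismatches` is a hypothetical helper name, and the defaults (per-base error rate 0.02, cutoff 0.04) are the values assumed in this answer.

```r
# Hypothetical helper: largest n for which P(X > n) is still above the cutoff,
# where X ~ Poisson(lambda = base.error.rate * readlength).
max.ok.mismatches <- function(readlength, base.error.rate = 0.02, cutoff = 0.04) {
  lambda <- base.error.rate * readlength
  n <- 0:readlength  # a read cannot have more mismatches than bases
  # Upper tail P(X > n) for every candidate n
  tail.prob <- ppois(n, lambda = lambda, lower.tail = FALSE)
  ok <- n[tail.prob > cutoff]
  if (length(ok) == 0) return(NA_integer_)  # no n passes the cutoff
  max(ok)
}

max.ok.mismatches(100)  # 4
max.ok.mismatches(200)  # 7
```

Keep in mind that ppois(n, lambda, lower.tail=FALSE) is P(X > n). If the criterion should instead be "n or more mismatches", the call would be ppois(n - 1, lambda, lower.tail=FALSE), which shifts the limits by one.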