Using my very basic probability skills, I will try to explain dsull comments with a little more detail.

You can calculate pretty easily the probability of the sequence being correct, i.e., no base is wrong, as (probability the base is correct) ^ #bases. So, for example, for a 100bp sequence with all bases Q20, the probability of the sequence being correct is `(0.99)^(100) = 0.366`

, or 36.6% chance having no errors.

The probability of a sequence "being wrong" is one minus the probability of the sequence being correct. So, for the same example above of a 100bp sequence with all bases Q20, the probability of the sequence being wrong (i.e., containing one or more errors) is `1-0.366 = 0.634`

, or 63.4% chance of containing at least one error.

Note there is only one way of a sequence is correct (all bases must be correct), but there are many ways a sequence can be wrong - one base can be wrong, two bases, and so on. The estimation from your question - `(0.01)^100`

- is actually the probability of **all** bases of the sequence being wrong.

No, let's assume that the sequencer has a 1% chance of error. So if 100 nucleotides are sequenced, it is expected that one of them will have an error.

You'd have to do 1 - (0.99^100) to calculate your desired probability, not (0.01)^100

Going by this, it would mean a 250 bp read would have 1-((0.99)^250) ~ 92% chance of being wrong if all bases in the sequence had a

`PHRED`

score of 20? Or did you mean to say this is the probability of the sequence being not wrong?I suppose I am misunderstanding something here.

What do you consider "wrong"? My definition of wrong is if a sequence differs from the ground truth even by one base.

So yes, your 250 bp read would have a 92% chance of being wrong (i.e. differing from the true sequence by at least one base).

A 1% error rate is pretty high (it's 1 in 100 bases basically; and now you're giving me a read with 250 bases, so of course it will most likely contain an error).

I was actually tripping over precisely what @h.mon explained in their answer:

"Note there is only one way of a sequence is correct (all bases must be correct), but there are many ways a sequence can be wrong".Now your answer makes sense to me!!

Phred score is for each individual basecall. I don't think it can be extended to a stretch of sequence (just intuition not a statistician).

^^^ yes, these are usually quoted at the level of an individual base.

Phred scores are also calculated for the alignment of a read to a reference though, i.e., MAPQ score.