My boss wants a depth-of-coverage vs. data-quality guideline (e.g. Q10 = 100x, Q20 = 50x, Q30 = 25x, etc.), and I'm not sure how to do this, since for so much of what we do the answer is "it depends". I need someone to explain it like I'm 5 (also my favorite Reddit channel).
From my understanding of Phred scores, Q10 means a 0.9 chance that any given base in a read is correct. That should mean that in a 100 bp read with a mean Phred score of 10, I could expect about 10 randomly placed incorrect bases. However, the odds of a random error landing at the same position in more than one read drop exponentially with each additional read. That would imply that I could have a few reads covering a region with a mean Phred score of 10 and still be able to accurately call a SNP with as little as 3x coverage.
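To make the arithmetic above concrete, here is a minimal Python sketch of the standard Phred relationship, Q = -10 * log10(P_error). The function names `phred_to_error_prob` and `expected_errors` are just illustrative, and the sketch assumes every base in the read has exactly the mean quality score, which is a simplification.

```python
def phred_to_error_prob(q):
    """Convert a Phred score to an error probability: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def expected_errors(read_length, mean_q):
    """Expected number of wrong bases in a read.

    Assumption: every base has exactly the mean quality score.
    """
    return read_length * phred_to_error_prob(mean_q)


for q in (10, 20, 30):
    p_err = phred_to_error_prob(q)
    print(
        f"Q{q}: P(error) = {p_err:.3f}, P(correct) = {1 - p_err:.3f}, "
        f"expected errors per 100 bp = {expected_errors(100, q):.1f}"
    )
```

For Q10 this gives an error probability of 0.1 per base, or about 10 expected errors in a 100 bp read, which matches the numbers in the paragraph above.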
Using the following:
P = 0.9, the probability of a base call being right
Po = (1-P)^n, the probability of being wrong in all n observations, where n = the number of observations
So for 1 observation Po would equal 0.1, for 2 observations 0.01, for 3 observations 0.001, etc.
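The calculation above can be reproduced with a short Python sketch. The helper name `prob_all_wrong` is hypothetical, and the formula only holds under the assumption that errors in different reads are independent of each other. It computes the chance that all n overlapping reads are wrong at a position, not the accuracy of any real variant caller.

```python
def prob_all_wrong(q, n):
    """Po = (1 - P)^n: chance that all n reads are wrong at one position.

    Assumptions: each base has Phred score q, and errors in different
    reads are independent of each other.
    """
    p_error = 10 ** (-q / 10)  # 1 - P, from the Phred definition
    return p_error ** n


for q in (10, 20, 30):
    for n in (1, 2, 3):
        print(f"Q{q}, {n} observation(s): Po = {prob_all_wrong(q, n):.0e}")
```

At Q10 this gives 0.1, 0.01 and 0.001 for 1, 2 and 3 observations, the same values as the hand calculation.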
This doesn't seem to jibe with current practice, and I'm not sure what I am missing. Can someone point me to a good reference or explain where I am wrong? I would really appreciate it.