Making sense of genotype calculation equations
2.5 years ago

I am struggling to understand the first two of the following equations, which come from the Supplementary Materials of the Conpair paper and describe how genotypes are calculated: (image: Conpair equations)

For the first equation, is the probability of D|AA really much lower if all my reads are A than if all my reads are B? Is e_j what I think it is, the error rate for read j? I would think a low e_j means that the call is more reliable. For example, if my error rate is .01 and D={A,A,A,A}:

P(D|AA) = (.01^4)(.99^0)=1e-8

But if D={B,B,B,B}, the calculation comes to:

P(D|AA) = (.01^0)(.99^4) ~ .96
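To double-check the arithmetic, here is a minimal Python sketch. The function name `hom_likelihood` and the `as_printed` flag are mine, not from the paper. With `as_printed=True` it follows my reading of the supplementary equation (a read matching the genotype contributes e, a mismatching read contributes 1 - e); with `as_printed=False` it uses the form I would have expected, where a matching read contributes 1 - e. It assumes one shared error rate for all reads.

```python
def hom_likelihood(reads, allele, error_rate, as_printed=False):
    """P(D | homozygous for `allele`), assuming a single shared error rate.

    as_printed=True  : matching reads contribute e, mismatches 1 - e
                       (my reading of the supplementary equation).
    as_printed=False : matching reads contribute 1 - e, mismatches e
                       (the form I expected).
    """
    p = 1.0
    for base in reads:
        matches = base == allele
        if as_printed:
            p *= error_rate if matches else 1.0 - error_rate
        else:
            p *= (1.0 - error_rate) if matches else error_rate
    return p


e = 0.01
print(hom_likelihood("AAAA", "A", e, as_printed=True))   # about 1e-8
print(hom_likelihood("BBBB", "A", e, as_printed=True))   # about 0.96
print(hom_likelihood("AAAA", "A", e, as_printed=False))  # about 0.96
```

So the two numbers above are reproduced exactly, and swapping the roles of e and 1 - e gives the result I would intuitively expect for all-A reads.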

For the second equation, are the occurrences of A not considered at all? The index and upper bound for both operators are exactly the same, if I'm reading that right. Is it just me, or are there a bunch of typos here?

Citation: Bergmann EA, Chen BJ, Arora K, Vacic V, Zody MC. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics. 2016 Oct 15;32(20):3196-3198. doi: 10.1093/bioinformatics/btw389. Epub 2016 Jun 26. PMID: 27354699; PMCID: PMC5048070.


Are you suggesting reviewers are doing a lousy job? SHOCKING!

More seriously, if you compare against Heng Li's note (p. 20), then yes, they got it wrong. They mislabeled AA and BB relative to Heng's 0 and m, and there are other typos as well.

http://lh3lh3.users.sourceforge.net/download/samtools.pdf
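For reference, a rough Python sketch of the genotype likelihood as I remember it from that note: for ploidy m, k reads and g copies of the reference allele, L(g) = (1/m^k) × product over reference reads of [(m-g)e_j + g(1-e_j)] × product over non-reference reads of [(m-g)(1-e_j) + g e_j]. The function name and the `(is_ref, error_rate)` input format are my own choices for illustration, so check the formula against the PDF before relying on it.

```python
def genotype_likelihood(reads, g, ploidy=2):
    """Likelihood of the reads given g copies of the reference allele.

    reads  : iterable of (is_ref, error_rate) pairs, one per read
             (hypothetical input format, chosen for this sketch).
    g      : number of reference alleles in the genotype, 0..ploidy.
    ploidy : m in the formula; 2 for a diploid sample.
    """
    likelihood = 1.0
    k = 0  # number of reads seen
    for is_ref, err in reads:
        if is_ref:
            # read shows the reference base
            likelihood *= (ploidy - g) * err + g * (1.0 - err)
        else:
            # read shows the non-reference base
            likelihood *= (ploidy - g) * (1.0 - err) + g * err
        k += 1
    return likelihood / ploidy ** k


# Four reference reads, each with error rate 0.01.
reads = [(True, 0.01)] * 4
for g in (0, 1, 2):
    print(g, genotype_likelihood(reads, g))
```

With four reference reads and e = 0.01, g = 2 (homozygous reference) comes out around 0.96, g = 1 is 1/16, and g = 0 is around 1e-8, which is the intuitive ordering.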


I thought only I was allowed to be lousy!

Thanks for the reference, so far the equations make more sense.
