Question

p-value for mutation calls from multiple reads



6.1 years ago

robertobfisher ▴ 30

Suppose I have sequenced a haploid genome. There is a position where I suspect a point mutation. I have n reads covering this position, with each read "voting" for A, T, C or G as its nucleotide call. The winner of the vote, in this case, is a nucleotide that differs from the reference, i.e. the consensus of the reads implies a mutation. The number of reads voting for the winner is q.

The probability that an individual read calls the nucleotide correctly is r. Assuming errors are spread uniformly over the three incorrect bases, the probability of each specific incorrect base is s = (1-r)/3. My null hypothesis is that the true base matches the reference and that a plurality of reads happened to make the same error, producing an incorrect consensus. Given this, what is the p-value for this mutation call? Is it:

Probability of getting exactly q successes in n trials with success probability s (binomial distribution), times 3 for the three possible erroneous nucleotides
Probability of getting q successes in n trials with success probability s (binomial CDF), times 3
Something else?

Also, is my null hypothesis reasonable?
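
To make the two candidates above concrete, here is a minimal sketch that evaluates both under the stated model. The values of n, q and r are made up for illustration, and the function names are hypothetical. I read "binomial CDF" as the upper tail P(X >= q), since a p-value is normally the probability of a result at least as extreme as the one observed; that reading is an assumption. The factor of 3 is treated as a union bound over the three wrong bases, so the result is capped at 1.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(q, n, p):
    """P(X >= q) for X ~ Binomial(n, p), summed term by term."""
    return sum(binom_pmf(k, n, p) for k in range(q, n + 1))


def candidate_p_values(n, q, r):
    """Return (3 * P(X = q), 3 * P(X >= q)) with per-base error rate s.

    Assumes independent reads and errors split evenly over the three
    incorrect bases. Multiplying by 3 is a union bound over those bases,
    so each value is capped at 1.
    """
    s = (1 - r) / 3
    point = min(1.0, 3 * binom_pmf(q, n, s))
    tail = min(1.0, 3 * binom_upper_tail(q, n, s))
    return point, tail


# Illustrative values only (not from real data).
n, q, r = 20, 15, 0.99
point, tail = candidate_p_values(n, q, r)
print(f"3 * P(X = q)  = {point:.3e}")
print(f"3 * P(X >= q) = {tail:.3e}")
```

The tail version is never smaller than the point version, because it includes the q term plus every more extreme outcome.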

snp • 881 views

updated 6.0 years ago by Biostar 20 • written 6.1 years ago by robertobfisher ▴ 30