p-value for mutation calls from multiple reads
6.1 years ago

Suppose I have sequenced a haploid genome. There is a position where I suspect a point mutation. I have n reads covering this position, with each read "voting" A, T, C or G as its nucleotide call. The winner of the vote, in this case, is a nucleotide that differs from the reference, i.e. the consensus of the reads implies a mutation. The number of reads voting for the winner is q.

The probability that each individual read correctly calls a nucleotide is r. Assuming errors are uniform across the three incorrect bases, the probability of each specific incorrect base is s = (1-r)/3. My null hypothesis is that the true base matches the reference and a plurality of reads happened to make the same error, producing an incorrect consensus. Given this, what is the p-value for this mutation call? Is it:

  1. Probability of getting exactly q successes in n trials with probability of success s (Binomial PMF), times 3, once for each possible erroneous nucleotide
  2. Probability of getting at most q successes in n trials with probability of success s (Binomial CDF), times 3
  3. Something else?
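
To make the candidates concrete, here is a minimal Python sketch of the quantities in options 1 and 2. It also computes the upper tail P(X >= q), which is my assumption about what a one-sided test would use and is not one of the options as stated. The function names and the example values (n=20, q=15, r=0.99) are hypothetical, and the factor of 3 is applied as a simple union bound capped at 1.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def candidate_values(n, q, r):
    """Return the quantities discussed above for n reads, q winning votes,
    and per-read accuracy r. Names and structure are illustrative only."""
    # Probability of one specific wrong base, assuming uniform errors.
    s = (1 - r) / 3

    # Option 1: exactly q reads show a given wrong base, times 3.
    pmf_times_3 = 3 * binom_pmf(q, n, s)

    # Option 2 as literally stated: P(X <= q), times 3.
    # This can exceed 1, so it is not a probability in general.
    cdf_times_3 = 3 * sum(binom_pmf(i, n, s) for i in range(0, q + 1))

    # Assumption: upper tail P(X >= q), times 3, capped at 1 (union bound).
    upper_tail = sum(binom_pmf(i, n, s) for i in range(q, n + 1))
    upper_tail_times_3 = min(1.0, 3 * upper_tail)

    return {
        "pmf_times_3": pmf_times_3,
        "cdf_times_3": cdf_times_3,
        "upper_tail_times_3": upper_tail_times_3,
    }


if __name__ == "__main__":
    # Hypothetical example: 20 reads, 15 agree on a non-reference base.
    for name, value in candidate_values(n=20, q=15, r=0.99).items():
        print(f"{name}: {value:.3e}")
```

Comparing the three printed values for a few choices of n, q and r shows how differently the options behave, in particular that the lower-tail CDF approaches 1 (and 3 after scaling) as q grows.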

Also, is my null hypothesis reasonable?

snp • 881 views