The major reason is that the Venter genome (HuRef) is sequenced to only ~9X coverage. At that depth there is a high chance of missing an allele due to sampling fluctuation, so you cannot get a good het:hom ratio from HuRef. In addition, the reference genome has a higher indel sequencing error rate than substitution error rate. This makes the het:hom ratio of indels lower than that of SNPs, even if your indel calling is perfect.
The Venter reads have a particularly high 1bp insertion error rate. In general it is not a good idea to learn indel statistics from HuRef, though this should not explain a low het:hom ratio.
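To put a rough number on the sampling fluctuation at ~9X, here is a minimal sketch. It assumes read depth is Poisson distributed and that each read at a het site comes from either allele with probability 1/2. Both are simplifications, and the 2-read threshold is a hypothetical caller setting, not something taken from any particular tool.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k events."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_het_missed(mean_depth, min_alt_reads):
    """Probability that fewer than min_alt_reads reads carry the
    non-reference allele at a true het site.

    Assumption: total depth is Poisson(mean_depth) and each read samples
    either allele with probability 1/2, so the number of reads carrying
    the non-reference allele is Poisson(mean_depth / 2).
    """
    alt_mean = mean_depth / 2.0
    return sum(poisson_pmf(k, alt_mean) for k in range(min_alt_reads))

if __name__ == "__main__":
    # min_alt_reads=2 is a hypothetical threshold chosen for illustration.
    for depth in (9, 30):
        print(depth, prob_het_missed(depth, min_alt_reads=2))
```

Under these assumptions about 6% of het sites at 9X have fewer than two reads supporting the non-reference allele, against a negligible fraction at 30X. By symmetry the same holds for the reference allele, in which case a het can look like a hom.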
If the sample is not admixed and comes from the same population as the reference genome, the theoretical expectation is het:hom = 2:1. The derivation is very simple (see the MAQ paper). However, the reference genome is a hybrid. The het:hom ratio is always lower than 2.
Also, you observe homozygous variants mostly due to coalescence, not due to recurrent mutations.
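One way to reach the 2:1 expectation is sketched below. It is not necessarily the derivation used in the MAQ paper. It assumes a neutral, panmictic population under the infinite-sites model, so there are no recurrent mutations and the expected number of sites with derived allele count i in a sample of n haplotypes is theta/i. The sample here is three haplotypes: one for the reference and two for the individual. The function name and the indexing are only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def expected_het_hom(theta=Fraction(1)):
    """Expected het and hom-variant site counts, in units of theta.

    Haplotype 0 is the reference; haplotypes 1 and 2 are the individual.
    Assumption: neutral site frequency spectrum, theta / i sites with
    derived allele count i, with carriers chosen uniformly at random.
    """
    n = 3
    het = Fraction(0)
    hom = Fraction(0)
    for i in range(1, n):  # derived allele present on 1 or 2 haplotypes
        configs = list(combinations(range(n), i))
        weight = theta / i / len(configs)
        for carriers in configs:
            if (1 in carriers) != (2 in carriers):
                het += weight  # the individual's two haplotypes differ
            else:
                hom += weight  # both differ from the reference
    return het, hom

if __name__ == "__main__":
    het, hom = expected_het_hom()
    print(het, hom, het / hom)
```

With theta = 1 this gives het = 1 and hom = 1/2, i.e. 2:1. Every hom site in this model comes from a single mutation shared through common ancestry, which is the coalescence point above.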
written 9.1 years ago by lh3