Average number of homozygous SNVs in WGS
17 months ago
paolo ▴ 70

Does anyone know a reference for the average number of homozygous calls in VCF files generated from whole-genome sequencing?

WGS homozygous • 1.4k views
The number of homozygous calls is affected by things like IBS (identity by state) and IBD (identity by descent). You should give more context for a question like this.

17 months ago

Total back-of-the-napkin estimate: the average person has about 3 million variants.

Most of those will be heterozygous; maybe 1/200 (p^2 vs. 2pq) of those 3M will be homozygous alt (about 15,000).

Compared to GRCh38, the average sample I encounter has ~4M SNVs, ~900k small indels (<50 bp), and ~25k SVs (indels >49 bp, duplications, etc.). I've seen the het/hom ratio vary by type, but I'd expect something between 1:1 and 2:1 het:hom. If forced to narrow it down, I'd say 1.2:1 to 1.6:1.
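To turn a het:hom ratio into a count: if r is the het:hom ratio, the homozygous share of non-reference calls is 1/(1+r). Below is a minimal sketch applying that to the ~4M SNV figure above. The function name `hom_count` is made up for illustration, and it assumes the quoted ratio applies to SNVs specifically.

```python
def hom_count(total_calls: int, het_hom_ratio: float) -> float:
    """Expected homozygous-alt calls, given total non-reference calls and a het:hom ratio."""
    return total_calls / (1.0 + het_hom_ratio)


total_snvs = 4_000_000  # rough per-sample SNV count quoted above

for ratio in (1.0, 1.2, 1.6, 2.0):
    hom = hom_count(total_snvs, ratio)
    share = hom / total_snvs
    print(f"het:hom {ratio:.1f}:1 -> ~{hom / 1e6:.2f}M homozygous SNVs ({share:.0%})")
```

With a ratio of 1.2:1 to 1.6:1, this works out to roughly 1.5M to 1.8M homozygous SNVs per genome.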

Boy, was I off. I thought the average variant would have a frequency of 0.05 rather than 0.5.
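The difference between those two frequencies is easy to check with the p^2 vs. 2pq reasoning. This is a rough sketch that assumes Hardy-Weinberg proportions and a single alt allele frequency p for every variant, which is a simplification of a real frequency spectrum; `hwe_genotypes` is a hypothetical helper name.

```python
def hwe_genotypes(p: float) -> tuple[float, float]:
    """Return (het, hom_alt) genotype frequencies for alt allele frequency p under HWE."""
    q = 1.0 - p
    return 2.0 * p * q, p * p


for p in (0.05, 0.5):
    het, hom = hwe_genotypes(p)
    print(f"p={p}: het={het:.4f}, hom_alt={hom:.4f}, het:hom={het / hom:.1f}:1")
```

At p = 0.05 the hets outnumber the hom alts about 38 to 1, while at p = 0.5 it is only 2 to 1, which matches the upper end of the 1:1 to 2:1 range mentioned above.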

17 months ago
paolo ▴ 70

After this conversation, I tried to work out a formula for the number of homozygous calls in a VCF from WES, considering only the autosomal subset. I found a fancy one, with hypergeometric functions, which gives 41,701 calls, while the actual number in the one exome I considered is 45,714. That is about 37% of all the calls.

I am still wondering about the average percentage in WGS. I found a paper on the Swedish population that reports 1,486,648 hom calls and 2,366,095 het calls for the average whole genome of a Swedish individual (Table 1 of this paper). Again a similar proportion: about 38.6%.
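For reference, the proportion from the Swedish figures can be reproduced in a couple of lines. This sketch only uses the two numbers quoted above; the variable names are mine.

```python
hom_calls = 1_486_648  # homozygous calls, average Swedish genome (as quoted above)
het_calls = 2_366_095  # heterozygous calls, same source

hom_fraction = hom_calls / (hom_calls + het_calls)
het_hom_ratio = het_calls / hom_calls

print(f"homozygous fraction: {hom_fraction:.1%}")
print(f"het:hom ratio: {het_hom_ratio:.2f}:1")
```

That comes to about 38.6% homozygous, or a het:hom ratio of about 1.59:1.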

Here are all the chr22 calls for HG00096. The (0, 0) entries are gVCF reference ranges of variable length, so you should ignore them.

(0, 0)    7499161
(0, 1)      49964
(0, 2)        452
(0, 3)         47
(0, 4)         17
(0, 5)          3
(0, 6)          1
(0, 7)          1
(1, 1)      29293
(1, 2)       1493
(1, 3)         69
(1, 4)         17
(1, 5)          7
(1, 6)          3
(2, 2)        171
(2, 3)         48
(2, 4)          5
(2, 5)          4
(2, 7)          1
(3, 3)          5
(3, 4)          8
(3, 5)          1
(3, 6)          1
(4, 4)          3
(4, 5)          3
(6, 6)          2
(6, 7)          1
(7, 7)          2
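To compare this table with the percentages discussed earlier, the genotypes can be collapsed into het and hom-alt totals. A minimal sketch, with the counts copied from the table above; note that it counts every call type in the table (the multi-allelic indices suggest it is not SNVs only), and it treats any genotype with two different alleles as het.

```python
# Genotype counts copied from the chr22 table above: (allele, allele) -> number of calls.
genotype_counts = {
    (0, 0): 7499161, (0, 1): 49964, (0, 2): 452, (0, 3): 47, (0, 4): 17,
    (0, 5): 3, (0, 6): 1, (0, 7): 1, (1, 1): 29293, (1, 2): 1493,
    (1, 3): 69, (1, 4): 17, (1, 5): 7, (1, 6): 3, (2, 2): 171,
    (2, 3): 48, (2, 4): 5, (2, 5): 4, (2, 7): 1, (3, 3): 5,
    (3, 4): 8, (3, 5): 1, (3, 6): 1, (4, 4): 3, (4, 5): 3,
    (6, 6): 2, (6, 7): 1, (7, 7): 2,
}

# Hom alt: both alleles identical and non-reference. The (0, 0) reference ranges are skipped.
hom_alt = sum(n for (a, b), n in genotype_counts.items() if a == b and a > 0)
# Het: the two alleles differ (ref/alt as well as alt/alt genotypes such as 1/2).
het = sum(n for (a, b), n in genotype_counts.items() if a != b)

print(f"hom alt: {hom_alt}")
print(f"het:     {het}")
print(f"homozygous fraction: {hom_alt / (hom_alt + het):.1%}")
print(f"het:hom ratio: {het / hom_alt:.2f}:1")
```

For this sample that is 29,476 hom alt and 52,146 het calls on chr22, so about 36% homozygous, in the same range as the exome and Swedish WGS figures.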


