Does anyone know a reference for the average number of homozygous calls in VCF files generated from whole-genome sequencing?
Total back-of-the-napkin estimate: the average person has about 3 million variants.
Most of those will be heterozygous; maybe 1/200 (p^2 vs 2pq) of those 3M will be homozygous alt (about 15,000).
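To make the p^2 vs 2pq comparison concrete, here is a minimal sketch assuming Hardy-Weinberg proportions at a single biallelic site. The function name `hom_alt_fraction` is made up for illustration. The 1/200 figure corresponds to an alt allele frequency of roughly 1%; the share rises quickly for more common alleles, so the overall estimate depends heavily on the allele-frequency spectrum of the variants a person actually carries.

```python
def hom_alt_fraction(p: float) -> float:
    """Share of non-reference genotypes that are hom-alt at one biallelic
    site, assuming Hardy-Weinberg proportions with alt allele frequency p."""
    q = 1.0 - p
    hom_alt = p * p      # expected frequency of alt/alt genotypes
    het = 2.0 * p * q    # expected frequency of ref/alt genotypes
    return hom_alt / (hom_alt + het)


# Print the hom-alt share for a few illustrative allele frequencies.
for p in (0.01, 0.1, 0.5, 0.9):
    print(f"p={p:.2f}  hom-alt share of non-ref genotypes = {hom_alt_fraction(p):.4f}")
```

Algebraically this reduces to p / (2 - p), which is about 1/199 at p = 0.01 and 1/3 at p = 0.5.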
Compared to GRCh38, the average sample I encounter has ~4M SNVs, ~900k small indels (<50bp), ~25k SVs (indels >49bp, duplications, etc). I've seen the het/hom ratio vary by type, but I'd expect something between 1:1 and 2:1 het:hom. If forced to narrow it down, I'd say 1.2 to 1.6.
After this conversation, I tried to derive a formula for the number of homozygous calls in a VCF from WES, considering only the autosomal subset. I found a fancy one, with hypergeometric functions, which gives 41,701 calls, while the actual count in the one exome I considered is 45,714, which is about 37% of all the calls.
I am still wondering about the average percentage in WGS. I found a paper on the Swedish population which reports 1 486 648 hom calls and 2 366 095 het calls for the average whole genome of a Swedish individual (Table 1 of this paper). Again a similar proportion: about 38.6% homozygous.
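Since this thread mixes het:hom ratios with "percent homozygous" figures, a small conversion sketch may help compare them. The helper names (`hom_share`, `ratio_to_hom_share`) are made up for illustration, and "share" here means hom-alt calls divided by all non-reference calls.

```python
def hom_share(hom: int, het: int) -> float:
    """Fraction of non-reference calls that are homozygous."""
    return hom / (hom + het)


def ratio_to_hom_share(het_per_hom: float) -> float:
    """Convert a het:hom ratio (e.g. 1.6 for 1.6:1) into a homozygous share."""
    return 1.0 / (1.0 + het_per_hom)


# Counts quoted above for the average Swedish whole genome.
swedish_hom, swedish_het = 1_486_648, 2_366_095
print(f"hom share: {hom_share(swedish_hom, swedish_het):.3f}")
print(f"het:hom ratio: {swedish_het / swedish_hom:.2f}")

# The 1.2 to 1.6 het:hom range mentioned earlier, expressed as shares.
for ratio in (1.2, 1.6):
    print(f"het:hom {ratio}:1 -> hom share {ratio_to_hom_share(ratio):.3f}")
```

A het:hom ratio of r is the same statement as a homozygous share of 1 / (1 + r), so the two ways of quoting the number are interchangeable.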
Here are all the chr22 genotype calls for HG00096, as (allele 1, allele 2) followed by the number of calls. The (0, 0) entries are gVCF reference ranges of variable length, so you should ignore them.
(0, 0) 7499161
(0, 1) 49964
(0, 2) 452
(0, 3) 47
(0, 4) 17
(0, 5) 3
(0, 6) 1
(0, 7) 1
(1, 1) 29293
(1, 2) 1493
(1, 3) 69
(1, 4) 17
(1, 5) 7
(1, 6) 3
(2, 2) 171
(2, 3) 48
(2, 4) 5
(2, 5) 4
(2, 7) 1
(3, 3) 5
(3, 4) 8
(3, 5) 1
(3, 6) 1
(4, 4) 3
(4, 5) 3
(6, 6) 2
(6, 7) 1
(7, 7) 2
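The tally above can be collapsed into het and hom-alt totals with a short script. This is only a sketch: it assumes the "(a, b) count" layout shown above, counts any genotype with two different alleles as het (including alt/alt pairs such as (1, 2)), and the names `TALLY`, `classify` and `tally_genotypes` are made up for illustration.

```python
import re
from collections import Counter

# Genotype tally copied from the post above: (allele1, allele2) count
TALLY = """\
(0, 0) 7499161
(0, 1) 49964
(0, 2) 452
(0, 3) 47
(0, 4) 17
(0, 5) 3
(0, 6) 1
(0, 7) 1
(1, 1) 29293
(1, 2) 1493
(1, 3) 69
(1, 4) 17
(1, 5) 7
(1, 6) 3
(2, 2) 171
(2, 3) 48
(2, 4) 5
(2, 5) 4
(2, 7) 1
(3, 3) 5
(3, 4) 8
(3, 5) 1
(3, 6) 1
(4, 4) 3
(4, 5) 3
(6, 6) 2
(6, 7) 1
(7, 7) 2
"""

LINE_RE = re.compile(r"\((\d+),\s*(\d+)\)\s+(\d+)")


def classify(a: int, b: int) -> str:
    """Label a diploid genotype as hom_ref, hom_alt or het."""
    if a == b:
        return "hom_ref" if a == 0 else "hom_alt"
    return "het"


def tally_genotypes(text: str) -> Counter:
    """Sum the per-genotype counts into hom_ref / hom_alt / het totals."""
    totals = Counter()
    for match in LINE_RE.finditer(text):
        a, b, n = (int(x) for x in match.groups())
        totals[classify(a, b)] += n
    return totals


totals = tally_genotypes(TALLY)
het, hom_alt = totals["het"], totals["hom_alt"]
# hom_ref is skipped on purpose: those are gVCF reference blocks, not variant calls.
print(f"het={het} hom_alt={hom_alt}")
print(f"het:hom = {het / hom_alt:.2f}, hom share = {hom_alt / (het + hom_alt):.3f}")
```

For this one chromosome the totals come out at 52,146 het and 29,476 hom-alt calls, a hom share of roughly 36%, in line with the exome and Swedish WGS figures quoted earlier.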
The number of homozygous calls would be affected by things like IBS (identity by state) and IBD (identity by descent). You should give more context for a question like this.