Question: What Is The Typical Density Of Heterozygous Alleles In A Diploid Human Genome?
Ryan Thompson (TSRI, La Jolla, CA) wrote, 9.7 years ago:

I'm doing a theoretical project on haplotype assembly, and I need a reasonable method for generating a realistic diploid stretch of DNA. For example, if the density of heterozygous alleles is approximately one every thousand bases, then I would generate a diploid version by randomly mutating every position with probability 1/1000.
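The uniform scheme described above can be sketched in a few lines of Python. This is a minimal sketch that assumes a uniformly random reference (not real human sequence) and single-base substitutions only; `simulate_diploid` is a hypothetical helper name.

```python
import random

BASES = "ACGT"


def simulate_diploid(length, het_rate, seed=None):
    """Return (hap1, hap2, het_positions) for a toy diploid sequence.

    hap1 is a uniformly random reference; hap2 is a copy in which every
    position is independently changed to a different base with probability
    het_rate, so heterozygous sites are i.i.d. Bernoulli(het_rate).
    """
    rng = random.Random(seed)
    hap1 = [rng.choice(BASES) for _ in range(length)]
    hap2 = list(hap1)
    het_positions = []
    for i in range(length):
        if rng.random() < het_rate:
            # Substitute one of the three bases that differ from hap1 here.
            hap2[i] = rng.choice([b for b in BASES if b != hap1[i]])
            het_positions.append(i)
    return "".join(hap1), "".join(hap2), het_positions


h1, h2, hets = simulate_diploid(200_000, 1e-3, seed=1)
gaps = [b - a for a, b in zip(hets, hets[1:])]
# The count should be close to length * het_rate.
print(len(hets), sum(gaps) / len(gaps))
```

Because each position is mutated independently, the gaps between heterozygotes are geometrically distributed with a mean of about 1/het_rate.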

So in a typical human genome, what is the density of heterozygous alleles? Or in other words, what is the mean number of base pairs between heterozygous alleles?

lh3 (United States) wrote, 9.7 years ago:

It is about 1e-3 per bp for Africans and ~0.8e-3 per bp for typical non-Africans, i.e. on average one heterozygote every ~1,000 bp or ~1,250 bp, respectively. If a population went through a severe recent bottleneck, its heterozygosity is lower. Placing heterozygotes uniformly at random is good enough for some purposes, but for a more realistic simulation you should use a coalescent simulator such as ms or MaCS. Under the coalescent, which approximates human evolution quite well, the distances between adjacent heterozygotes have a much fatter-tailed distribution than the roughly exponential gaps of a Poisson process. As a result, you more often get regions dense with heterozygotes, as well as very long stretches of homozygosity.
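To illustrate the clustering effect, here is a toy sketch. It is not ms or MaCS: it is a deliberately simplified two-sample sequentially Markov coalescent (SMC) for one diploid under a constant population size. It assumes θ = 1e-3 per bp (the figure above) and a population recombination rate ρ equal to θ; the ρ value is an assumption. It compares the variance-to-mean ratio of heterozygote counts in 50 kb windows (about 1 for a Poisson process) against uniform placement with the same mean density. All function names are hypothetical.

```python
import random


def smc_pair_hets(length, theta, rho, seed=None):
    """Heterozygous positions (0-based) for one diploid under a two-sample SMC.

    Time is in units of 2N generations, so the TMRCA t of the two haplotypes
    is Exp(1) at any site. Given t, heterozygotes occur at rate theta * t per
    bp and recombination breakpoints at rate rho * t per bp. At a breakpoint,
    one lineage is cut at a uniform time u in [0, t] and re-coalesces with the
    other lineage at time u + Exp(1).
    """
    rng = random.Random(seed)
    t = rng.expovariate(1.0)
    x, hets = 0.0, []
    while True:
        # Competing Poisson processes along the sequence, both scaled by t.
        x += rng.expovariate((theta + rho) * t)
        if x >= length:
            return hets
        if rng.random() < theta / (theta + rho):
            if not hets or hets[-1] != int(x):  # at most one het per bp
                hets.append(int(x))
        else:
            u = rng.uniform(0.0, t)
            t = u + rng.expovariate(1.0)


def poisson_hets(length, rate, seed=None):
    """Uniform baseline: heterozygotes from a Poisson process of given rate."""
    rng = random.Random(seed)
    x, hets = 0.0, []
    while True:
        x += rng.expovariate(rate)
        if x >= length:
            return hets
        if not hets or hets[-1] != int(x):
            hets.append(int(x))


def dispersion_index(hets, length, window):
    """Variance-to-mean ratio of het counts in non-overlapping windows."""
    counts = [0] * (length // window)
    for p in hets:
        if p // window < len(counts):
            counts[p // window] += 1
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


SEQ_LEN = 5_000_000
THETA = 1e-3  # per-bp heterozygosity quoted above
RHO = 1e-3    # ASSUMPTION: population recombination rate per bp, set equal to THETA
smc = smc_pair_hets(SEQ_LEN, THETA, RHO, seed=7)
flat = poisson_hets(SEQ_LEN, THETA, seed=7)
print("SMC:    ", len(smc), dispersion_index(smc, SEQ_LEN, 50_000))
print("Poisson:", len(flat), dispersion_index(flat, SEQ_LEN, 50_000))
```

Both runs have the same expected density, but the SMC counts vary far more between windows. Real genomes add recombination hotspots and a changing population history, which a proper coalescent simulator can model and this toy does not.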

Another way is to take the consensus sequence from a published personal genome, which is more realistic to some extent but also more biased.
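If the personal genome is available as variant calls in VCF rather than as a consensus FASTA, heterozygous sites and their spacing can be pulled out with a sketch like the one below. It assumes a single-sample VCF (FORMAT in column 9, the sample in column 10) and ignores FILTER values and callable-region masks. The example records are invented for illustration, and `het_positions_from_vcf` is a hypothetical helper.

```python
def het_positions_from_vcf(lines, chrom):
    """Return sorted 1-based positions of heterozygous calls on one chromosome.

    Assumes a single-sample VCF: column 9 is FORMAT and column 10 the sample.
    """
    positions = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[0] != chrom:
            continue
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GT" not in keys:
            continue
        # Treat phased (|) and unphased (/) genotypes alike.
        gt = values[keys.index("GT")].replace("|", "/").split("/")
        if len(gt) == 2 and "." not in gt and gt[0] != gt[1]:
            positions.append(int(fields[1]))
    return sorted(positions)


example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
    "chr1\t2500\t.\tC\tT\t50\tPASS\t.\tGT\t1/1",
    "chr1\t4200\t.\tG\tA\t50\tPASS\t.\tGT:DP\t1|0:30",
    "chr2\t100\t.\tT\tC\t50\tPASS\t.\tGT\t0/1",
]
hets = het_positions_from_vcf(example_vcf, "chr1")
gaps = [b - a for a, b in zip(hets, hets[1:])]
print(hets)  # [1000, 4200]
print(gaps)  # [3200]
```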


Thanks, Heng. I went for simple in my answer, but clearly there is much more detail in "doing this right".

(Reply by Sean Davis, 9.7 years ago.)
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 9.7 years ago:

You could generate the intervals between variants assuming a Poisson process (i.e. exponentially distributed gaps). For genotype probabilities, you could sample from known SNP allele frequencies, keeping in mind that "known" SNPs are skewed toward common SNPs.
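A minimal sketch of this suggestion follows. It places candidate sites with exponential spacing (a Poisson process along the sequence) and assumes Hardy-Weinberg proportions when turning an allele frequency into genotype probabilities, so a site with alternate-allele frequency q is heterozygous with probability 2q(1 - q). `simulate_genotypes`, `site_rate` and `toy_freqs` are hypothetical placeholders, not real SNP data.

```python
import random


def simulate_genotypes(length, site_rate, known_freqs, seed=None):
    """Place variant sites with exponential spacing and draw each site's
    genotype from a known allele frequency under Hardy-Weinberg proportions.

    Returns a list of (position, alt_allele_count) with counts in {0, 1, 2}.
    """
    rng = random.Random(seed)
    sites = []
    x = 0.0
    while True:
        x += rng.expovariate(site_rate)
        if x >= length:
            return sites
        pos = int(x)
        if sites and sites[-1][0] == pos:
            continue  # keep at most one site per bp
        q = rng.choice(known_freqs)  # alt allele frequency of a "known" SNP
        # Each of the two haplotypes independently carries the alt allele with prob q.
        count = (rng.random() < q) + (rng.random() < q)
        sites.append((pos, count))


toy_freqs = [0.02, 0.1, 0.25, 0.4, 0.5]  # hypothetical, not real dbSNP data
sites = simulate_genotypes(1_000_000, 1 / 300, toy_freqs, seed=3)
hets = [pos for pos, c in sites if c == 1]
print(len(sites), len(hets))
```

Because known SNP panels over-represent common variants, a frequency list drawn from them will tend to overstate the heterozygous fraction per site.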
