Ancient human DNA sequencing is challenging because of low coverage. The degraded samples often contain more exogenous material, from microbial and human contamination, than endogenous ancient human DNA. In addition, deamination causes C-->T and G-->A changes.
Thus, most of the time a single library protocol is used, and in post-sequencing processing a single allele is randomly drawn from the two alleles in the individual to represent the individual at that site. An allele call at each target SNP is made using majority rule over all sequences overlapping the SNP. When each of the possible alleles is supported by an equal number of sequences, one allele is picked at random. SNPs with no read coverage are set to “no call”.
Downstream, this in effect produces a haploid human sequence.
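To make the calling rule above concrete, here is a minimal Python sketch of a majority-rule call with a random tie-break at one SNP. The function name `call_pseudo_haploid` and the input format (a plain list of the bases observed in reads overlapping the SNP) are assumptions for illustration, not part of any specific pipeline.

```python
import random
from collections import Counter

def call_pseudo_haploid(read_alleles, rng=random):
    """Return one allele for a SNP, or None for "no call".

    read_alleles: list of bases seen in reads overlapping the SNP
                  (hypothetical input format, e.g. ["A", "A", "G"]).
    rng: object with a .choice() method, used only to break ties.
    """
    if not read_alleles:
        return None  # no read coverage -> "no call"
    counts = Counter(read_alleles)
    best = max(counts.values())
    # All alleles supported by the maximum number of reads.
    tied = sorted(allele for allele, n in counts.items() if n == best)
    # With a single top allele this returns it; otherwise pick at random.
    return rng.choice(tied)

print(call_pseudo_haploid(["A", "A", "G"]))  # majority allele
print(call_pseudo_haploid([]))               # no coverage
```

A heterozygous site covered by equal numbers of reads for each allele is therefore reported as one allele or the other, never as a heterozygote.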
The problem is that in IBD comparisons with modern humans, the genetic distance between the ancient human and modern humans is artificially increased because the ancient sample is treated as haploid.
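A toy illustration of this inflation, using a simple per-site allele-sharing distance rather than a real IBD method (that substitution, the genotype coding as alternate-allele counts 0/1/2, and both function names are assumptions made for the sketch):

```python
import random

def pseudo_haploidize(genotypes, rng):
    """Replace each diploid genotype (0, 1 or 2 alt alleles) by a homozygous
    genotype built from one randomly drawn allele."""
    out = []
    for g in genotypes:
        if g == 1:
            allele = rng.choice([0, 1])  # heterozygote: draw one of the two alleles
        else:
            allele = g // 2              # homozygote: the allele is already fixed
        out.append(2 * allele)
    return out

def genotype_distance(a, b):
    """Sum of per-site differences in alt-allele count."""
    return sum(abs(x - y) for x, y in zip(a, b))

ancient_true = [0, 1, 2, 1]   # hypothetical true diploid genotypes
modern = [0, 1, 2, 1]         # a modern sample identical at these sites
ancient_pseudo = pseudo_haploidize(ancient_true, random.Random(1))

print(genotype_distance(ancient_true, modern))    # 0
print(genotype_distance(ancient_pseudo, modern))  # 2
```

Each heterozygous site in the ancient sample is forced to 0 or 2, so it sits one step away from a modern heterozygote whichever allele is drawn.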
If diploid calls are desired instead of haploid calls for the ancient sequence, what is the best way to achieve this?
1- Make diploid calls as one would do with double-stranded library preparation methods, and then filter the VCF file for positions with low-confidence calls, OR
2- Make haploid calls, and then use something like IMPUTE or BEAGLE to impute one strand of the haploid to create a diploid sequence.
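For option 1, the filtering step could look roughly like the sketch below. It assumes a single-sample VCF whose FORMAT column includes a `GQ` (genotype quality) field; the function name `filter_low_confidence` and the threshold of 30 are placeholders, not recommended values.

```python
def filter_low_confidence(vcf_lines, min_gq=30):
    """Keep header lines and records whose first sample has GQ >= min_gq.

    vcf_lines: iterable of tab-separated VCF lines.
    min_gq: hypothetical genotype-quality cutoff.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GQ" not in format_keys:
            continue  # cannot judge confidence without GQ
        idx = format_keys.index("GQ")
        if idx >= len(sample_values) or sample_values[idx] == ".":
            continue  # GQ missing for this sample
        if float(sample_values[idx]) >= min_gq:
            kept.append(line)
    return kept

example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tANCIENT1",
    "1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:45",
    "1\t2000\t.\tC\tT\t12\tPASS\t.\tGT:GQ\t0/1:7",
]
for record in filter_low_confidence(example):
    print(record)
```

At low coverage, a filter like this may remove most heterozygous calls, which is part of what the question is asking about.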
This reads like an assignment question. Is it one, dilawerkh4?
........................
Yes? What does that mean?
I am inclined to think option 1 will not work if only one strand is sequenced, but maybe I'm not seeing something, and someone who has done this before can point out why it may work.
This is a comment, not an answer.
I wonder if the protocol used in Jurassic Park was ever published; it seemed pretty effective.