I am working with ancient DNA and am currently running a PCA on my data. Everything seemed fine until I ran a positive control with one low-coverage sample, which was placed nowhere near its known origin, in a position that makes no sense.
After some digging, I found that when working with low-coverage data with many missing SNPs, I am supposed to take every heterozygous site in my data set (for all individuals) in the PLINK PED file, randomly select one of the two alleles, and make the site homozygous for that allele.
I have not done this so far, so I see it as my main deviation from the usual practice and a possible explanation for the misplaced sample.
Has anyone already tackled this problem, or am I left with writing my own tool for this?
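
In case it does come down to writing it myself, here is a minimal sketch of what I have in mind. It assumes a standard whitespace-delimited PED file (six leading columns, then two allele columns per SNP) with `0` as the missing-allele code; the function names `pseudo_haploidize_ped_line` and `pseudo_haploidize_ped` and the default seed are my own placeholders, not part of PLINK or any existing tool.

```python
import random


def pseudo_haploidize_ped_line(line, rng):
    """Return one PED line with every heterozygous call made homozygous.

    Assumes 6 leading columns (FID, IID, PID, MID, sex, phenotype)
    followed by two allele columns per SNP, with "0" meaning missing.
    """
    fields = line.split()
    meta, alleles = fields[:6], fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("odd number of allele columns in PED line")

    out = list(meta)
    for i in range(0, len(alleles), 2):
        a1, a2 = alleles[i], alleles[i + 1]
        # Only touch true heterozygous calls; missing calls stay as they are.
        if a1 != a2 and a1 != "0" and a2 != "0":
            a1 = a2 = rng.choice((a1, a2))
        out.extend((a1, a2))
    return " ".join(out)


def pseudo_haploidize_ped(in_path, out_path, seed=42):
    """Apply the per-line conversion to a whole PED file (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(pseudo_haploidize_ped_line(line, rng) + "\n")
```

Calling `pseudo_haploidize_ped("in.ped", "out.ped")` (file names are placeholders) would write the converted file. The accompanying MAP file would not need any change, since only the genotype columns are rewritten.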