I have SNP data on 64 samples from my population of interest (~330,000 SNPs per sample using the HumanCNV370-Quad).
I sorted and filtered the published Altai Neanderthal and Denisovan VCF files (http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/ and http://cdna.eva.mpg.de/denisova/VCF/hg19_1000g/) down to only the rsIDs present in my SNP data.
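The filtering step can be sketched roughly as below. This is a minimal illustration, not necessarily the exact commands used: the function name `filter_vcf_by_rsids` is hypothetical, and it assumes an uncompressed, tab-delimited VCF with the rsIDs in the ID column.

```python
def filter_vcf_by_rsids(vcf_lines, rsids):
    """Yield header lines plus the records whose ID column has a wanted rsID.

    Assumption: vcf_lines is an iterable of plain-text VCF lines and
    rsids is an iterable of strings such as "rs12345".
    """
    wanted = set(rsids)
    for line in vcf_lines:
        # Keep all meta-information and column-header lines.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        # The ID column (third field) may hold several IDs separated by ';'.
        ids = fields[2].split(";")
        if any(rsid in wanted for rsid in ids):
            yield line
```

It would be used by passing an open file handle for the VCF and the list of rsIDs from the array manifest, then writing the yielded lines to a new file.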
I then noticed that for well over half of the SNPs in the Neanderthal VCF, and a small percentage in the Denisovan VCF, the alternate allele is not listed. When I look up those SNPs in dbSNP or in the Denisovan VCF file, alternate alleles exist and are listed. Luckily, it seems that whenever the alt allele is not listed, the sample is homozygous for the ref allele.
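One way to confirm that observation is to tally the genotypes of every record with no ALT allele. The sketch below assumes a single-sample VCF in which a missing ALT is written as `.` and the genotype is stored under the `GT` key of the FORMAT column; the function name `missing_alt_genotypes` is hypothetical.

```python
from collections import Counter


def missing_alt_genotypes(vcf_lines):
    """Count genotypes among records whose ALT column is '.'.

    Assumption: one sample per file, so the genotype sits in the
    tenth tab-separated column.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column to inspect
        if fields[4] != ".":
            continue  # ALT is listed, not of interest here
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        sample_values = fields[9].split(":")
        counts[sample_values[format_keys.index("GT")]] += 1
    return counts
```

If the only key in the returned counter is `0/0`, every record without an ALT allele is homozygous for the reference.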
Since these are ancient DNA calls, I will have to filter out some types of substitutions, but I can't do that if the alt allele is missing. How do I fix this?
Also, I plan on using vcf-isec to intersect the two files. How will the inconsistent alt allele information affect this? Thanks!