The 20100804 data has missing genotypes because of the way it was created.
The set itself was a naive 2-of-4 intersection of four input call sets. Only two of those four sets, Broad and UMich, had genotypes associated with them. The Broad genotype set was phased and used LD information, so it was felt to be the better of the two. Any SNP with a Broad genotype therefore got that genotype information, a SNP with only a UMich genotype got the UMich information, and any SNP called only by NCBI and Boston College didn't get any genotype.
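To make the rule above concrete, here is a minimal Python sketch of it. This is not the code that produced the release; the caller labels, the function names `in_release` and `pick_genotype`, and the example genotype strings are all made up for illustration, and `None` stands in for a missing genotype.

```python
from typing import Optional, Set

# Hypothetical labels for the four input call sets named above.
CALLERS = {"Broad", "UMich", "NCBI", "BC"}


def in_release(site_callers: Set[str], min_callers: int = 2) -> bool:
    """A site is kept if at least 2 of the 4 call sets reported it."""
    return len(site_callers & CALLERS) >= min_callers


def pick_genotype(broad_gt: Optional[str], umich_gt: Optional[str]) -> Optional[str]:
    """Prefer the Broad genotype, fall back to UMich, otherwise missing."""
    if broad_gt is not None:
        return broad_gt
    if umich_gt is not None:
        return umich_gt
    return None  # e.g. a site called only by NCBI and Boston College


# A site called only by NCBI and Boston College passes the 2-of-4 filter
# but has no genotype source, so it ends up in the release without genotypes.
site_callers = {"NCBI", "BC"}
kept = in_release(site_callers)
genotype = pick_genotype(broad_gt=None, umich_gt=None)
```

In this sketch `kept` is true while `genotype` is `None`, which is the situation that leaves a SNP in the release with no genotype information.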
We have just released a new data set which has much more complete phased genotypes for a larger number of individuals, but we don't have population-level allele frequencies yet.
Hi there, I kind of have the same question. I have downloaded data for a particular gene, and there are huge amounts of missing data!
The missing data is not at the level of individual sites (i.e. it is not the case that everyone is missing data for a given SNP). The missing data is totally haphazard. E.g. SNP 1 has data for African Americans and Europeans, whilst SNP 2 has data for Yoruba and Asians but lacks African Americans.