I'm interested in comparing the Genome in a Bottle (GIAB) genotypes for NA12878 to those of her parents (NA12891 and NA12892).
I downloaded GIAB's NA12878 vcf from here:
After a lot of searching around, I found this page from Broad describing a vcf containing the genotypes for the trio:
And I downloaded the variants for NA12891 and NA12892 from here:
In total, there are ~3.3 million variants in the GIAB vcf. I compared the alternate alleles and genotypes in the GIAB vcf with the corresponding values in her parents and found that at ~27% of the positions the parental genotypes were incompatible with the daughter's genotype.
For example, a position in the daughter is genotyped as 1/1, but the father is 0/1 and the mother is 0/0. The daughter cannot be 1/1 with these parents, because the mother has no alternate allele to pass on.
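To make the check concrete, here is a minimal sketch of the consistency test described above. It is not the exact script used for the comparison. It assumes diploid autosomal sites, that REF and ALT are identical in all three records so allele indices are comparable, and that a genotype containing `.` is left unclassified. The function names `parse_gt` and `is_mendelian_consistent` are made up for illustration.

```python
import re

def parse_gt(gt):
    """Split a VCF GT string such as '0/1' or '1|0' into integer allele indices.

    Returns None if any allele is missing ('.').
    """
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return None
    return [int(a) for a in alleles]

def is_mendelian_consistent(child, father, mother):
    """Return True/False for a diploid trio, or None if the site cannot be judged.

    Assumes all three genotypes use the same REF/ALT allele numbering.
    """
    c, f, m = parse_gt(child), parse_gt(father), parse_gt(mother)
    if c is None or f is None or m is None or len(c) != 2:
        return None
    a, b = c
    # One child allele must come from the father and the other from the mother.
    return (a in f and b in m) or (b in f and a in m)

# The example from the post: daughter 1/1, father 0/1, mother 0/0.
print(is_mendelian_consistent("1/1", "0/1", "0/0"))
```

Sites that return `None` are worth counting separately, since they are neither consistent nor inconsistent under these assumptions.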
I'm aware that the GIAB vcf has gone through a lot more curation than those of her parents, so perhaps that accounts for the discrepancy?
I'm pretty sure I'm using the correct files, but if anyone thinks otherwise, please let me know.