Hello, I am working with the VCF files from the 1000G project high-coverage (30x) release.

I do not completely understand how the authors handled the genotypes of male individuals in the non-pseudoautosomal (non-PAR) regions of chrX. The genotypes in the original file downloaded from here are encoded as usual with 0|0, 1|1, 0|1 and 1|0, including on chrX. However, if I subset the male individuals on chrX, I find SNPs annotated as heterozygous (0|1 or 1|0), where I would expect males to always be homozygous, considering they are haploid in these regions.
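To make the check concrete, here is a minimal sketch of the kind of tally described above, written in plain Python with no VCF library. It is only an illustration under some assumptions: the PAR boundaries are the GRCh38 coordinates as I understand them, GT is the first FORMAT field, and the sample IDs (M1, F1) and records are made up rather than taken from the 1000G files.

```python
# Assumed GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PAR_REGIONS = [(10_001, 2_781_479), (155_701_383, 156_030_895)]


def in_par(pos):
    """Return True if a chrX position falls inside PAR1 or PAR2."""
    return any(start <= pos <= end for start, end in PAR_REGIONS)


def count_male_nonpar_genotypes(vcf_lines, male_ids):
    """Tally haploid, homozygous and heterozygous GTs of males in non-PAR chrX."""
    counts = {"haploid": 0, "hom": 0, "het": 0}
    male_cols = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 in a VCF header line.
            male_cols = [i for i, name in enumerate(fields)
                         if i >= 9 and name in male_ids]
            continue
        if fields[0] not in ("chrX", "X") or in_par(int(fields[1])):
            continue  # keep only non-PAR chrX records
        for i in male_cols:
            gt = fields[i].split(":")[0]  # assumes GT is the first FORMAT key
            alleles = gt.replace("/", "|").split("|")
            if "." in alleles:
                continue  # skip missing calls
            if len(alleles) == 1:
                counts["haploid"] += 1
            elif len(set(alleles)) == 1:
                counts["hom"] += 1
            else:
                counts["het"] += 1
    return counts


# Hypothetical toy records: one male (M1) and one female (F1).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tM1\tF1",
    "chrX\t20000\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t0|0",    # inside PAR1, ignored
    "chrX\t5000000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|1",  # diploid het male call
    "chrX\t5000100\t.\tC\tT\t.\tPASS\t.\tGT\t1|1\t0|0",  # diploid hom male call
    "chrX\t5000200\t.\tG\tA\t.\tPASS\t.\tGT\t1\t0|0",    # haploid male call
]
print(count_male_nonpar_genotypes(example, {"M1"}))
```

If the files followed the haploid convention, I would expect the "hom" and "het" counts for males to be zero outside the PARs, with everything landing in "haploid".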

In the methods section of their paper, they say: SHAPEIT4 produced diploid output across the entire chromosome X in all samples. To ensure proper ploidy of male samples in the phased panel, we converted “0|1”, “1|0”, and “1|1” GTs into a haploid representation (i.e. “1”) in non-PAR regions of chromosome X in males.

But I see every genotype as diploid, and the males are not always homozygous. Do you know what is happening?

