I checked sex consistency in the 1000 Genomes Phase 3 TSI samples and found a mismatch between reported and inferred sex in 3 samples: NA20506, NA20530, and NA20533. Is this inconsistency already known? Should I remove these samples from the chrX SNP analysis?
To check sex, I used the following procedure.
I downloaded the chrX VCF file from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The file name was ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz.
First, I converted the file to PLINK binary (bed) format using PLINK 1.9. Second, I split the SNPs into chrX SNPs and chrXY (pseudo-autosomal) SNPs using plink --split-x. Third, I extracted the EUR samples using plink --keep. Next, I excluded SNPs with MAF < 0.01, HWE P < 1e-6, or call rate (CR) < 0.98. Finally, I checked sex on the resulting file using plink --check-sex.
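
For reference, the five steps above can be sketched roughly as the command lines below. This is only an approximation of what I ran, not the exact commands: the output prefixes and the keep-file name are hypothetical placeholders, b37 is assumed as the build code because Phase 3 is on GRCh37, and CR < 0.98 is assumed to mean per-SNP call rate (so --geno 0.02). It also assumes the reported sex is already present in the .fam file (for example, added with --update-sex), since a VCF does not carry it.

```python
import shlex
from typing import List


def build_plink_commands(vcf: str, keep_file: str, prefix: str) -> List[List[str]]:
    """Return PLINK 1.9 argument lists for the five steps, in order.

    All output names are derived from `prefix` and are placeholders.
    """
    raw, split, eur, qc = (f"{prefix}.{s}" for s in ("raw", "split", "eur", "qc"))
    return [
        # 1. Convert the chrX VCF to PLINK binary format.
        ["plink", "--vcf", vcf, "--make-bed", "--out", raw],
        # 2. Move pseudo-autosomal SNPs to the XY chromosome code (GRCh37).
        ["plink", "--bfile", raw, "--split-x", "b37", "--make-bed", "--out", split],
        # 3. Keep only the EUR samples (keep_file lists FID and IID per line).
        ["plink", "--bfile", split, "--keep", keep_file, "--make-bed", "--out", eur],
        # 4. SNP QC: MAF >= 0.01, HWE P >= 1e-6, call rate >= 0.98.
        ["plink", "--bfile", eur, "--maf", "0.01", "--hwe", "1e-6",
         "--geno", "0.02", "--make-bed", "--out", qc],
        # 5. Compare reported sex with sex inferred from chrX; writes
        #    <prefix>.sexcheck, assuming sex is already set in the .fam file.
        ["plink", "--bfile", qc, "--check-sex", "--out", prefix],
    ]


if __name__ == "__main__":
    commands = build_plink_commands(
        "ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz",
        "eur_samples.txt",  # hypothetical file name
        "chrX_eur",         # hypothetical output prefix
    )
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in commands:
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or each list can be passed to subprocess.run if PLINK is on the PATH. Samples flagged as PROBLEM in the STATUS column of the resulting .sexcheck file are the ones with a mismatch.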