Sex inconsistency in 1000 Genomes phase 3 TSI samples
2.6 years ago
Apprentice

Hi.

I am checking sex consistency in the 1000 Genomes phase 3 TSI samples, and three samples were flagged as inconsistent: NA20506, NA20530, and NA20533. Is this inconsistency already known? Should I remove these samples from my chrX SNP analysis?

To check sex, I used the following process.

I downloaded the chrX VCF file from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The file name is ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz.

1. Converted the VCF file to PLINK binary (bed) format with PLINK 1.9.
2. Split the chrX SNPs into chrX and chrXY (pseudoautosomal) SNPs with plink --split-x.
3. Extracted the EUR samples with plink --keep.
4. Excluded SNPs with MAF < 0.01, HWE P < 1e-6, or call rate (CR) < 0.98.
5. Checked sex with plink --check-sex.
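The steps above can be sketched as a sequence of PLINK 1.9 invocations. This minimal Python sketch only builds and prints the command lines; it does not run PLINK. The intermediate prefixes (step1 to step4, prune, sexcheck), the two sample files, and the pruning parameters (50 5 0.2) are illustrative assumptions. --double-id and --update-sex are added on the assumption that FID should equal the VCF sample ID and that sex must be supplied separately, since a VCF carries no sex information. It also includes the LD-pruning step discussed later in this thread.

```python
import shlex

# The VCF name comes from the post; the other two files are hypothetical placeholders.
VCF = "ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz"
KEEP_FILE = "eur_samples.txt"  # hypothetical: FID IID of the samples to keep
SEX_FILE = "sex_update.txt"    # hypothetical: FID IID sex (1 = male, 2 = female)


def build_plink_commands(vcf: str, keep_file: str, sex_file: str) -> list[list[str]]:
    """Return PLINK 1.9 command lines for a chrX sex check, one list per step."""
    return [
        # 1. VCF -> binary fileset; --double-id sets FID = IID = VCF sample ID.
        ["plink", "--vcf", vcf, "--double-id", "--make-bed", "--out", "step1"],
        # 2. Move pseudoautosomal variants to the XY code (b37 coordinates).
        ["plink", "--bfile", "step1", "--split-x", "b37", "--make-bed", "--out", "step2"],
        # 3. Keep the chosen samples and set pedigree sex.
        ["plink", "--bfile", "step2", "--keep", keep_file, "--update-sex", sex_file,
         "--make-bed", "--out", "step3"],
        # 4. Variant QC from the post: MAF < 0.01, HWE p < 1e-6, call rate < 0.98.
        ["plink", "--bfile", "step3", "--maf", "0.01", "--hwe", "1e-6", "--geno", "0.02",
         "--make-bed", "--out", "step4"],
        # 5. LD pruning (illustrative parameters); writes prune.prune.in.
        ["plink", "--bfile", "step4", "--indep-pairwise", "50", "5", "0.2",
         "--out", "prune"],
        # 6. Sex check on the pruned SNP set; writes sexcheck.sexcheck.
        ["plink", "--bfile", "step4", "--extract", "prune.prune.in",
         "--check-sex", "--out", "sexcheck"],
    ]


if __name__ == "__main__":
    for cmd in build_plink_commands(VCF, KEEP_FILE, SEX_FILE):
        print(shlex.join(cmd))
```

Keeping each step as a separate run makes it easy to inspect the intermediate filesets, for example to confirm how many SNPs survive QC and pruning before running --check-sex.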

2.6 years ago

No. You need to reselect your --check-sex thresholds based on the data. As long as there is a clean separation between females and males, you're fine; it's fine if some female F values are much higher than the default 0.2 lower threshold.

(The 0.2 and 0.8 default thresholds will be eliminated in the future; they're only present at all in PLINK 1.9 to preserve backward compatibility.)
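To make the F values and the 0.2/0.8 defaults concrete, here is a minimal Python sketch of a method-of-moments inbreeding coefficient from chrX homozygosity, F = (O - E) / (N - E), where O is the observed number of homozygous calls, E the number expected under Hardy-Weinberg equilibrium, and N the number of called SNPs. It is a simplified version: PLINK's own implementation (how allele frequencies are estimated, any small-sample correction) may differ, so the numbers need not match PLINK exactly. The sex-call function mirrors the default rule in the PLINK 1.9 documentation: F below 0.2 is called female, F above 0.8 is called male, and anything in between is left undetermined.

```python
from typing import Optional, Sequence


def x_inbreeding_f(genotypes: Sequence[Optional[int]], alt_freqs: Sequence[float]) -> float:
    """Simplified method-of-moments F from chrX homozygosity for one sample.

    genotypes: alt-allele counts (0, 1, 2) per SNP, or None if missing.
    alt_freqs: alt-allele frequency of each SNP in the analysed samples.
    Assumes male haploid chrX calls are stored as homozygous (0 or 2).
    """
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # missing calls contribute nothing
        n_called += 1
        if g != 1:
            observed_hom += 1
        # Expected homozygosity under HWE: 1 - 2p(1 - p)
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    denominator = n_called - expected_hom
    if denominator <= 0:
        raise ValueError("no informative (called, polymorphic) SNPs")
    return (observed_hom - expected_hom) / denominator


def call_sex(f: float, female_max: float = 0.2, male_min: float = 0.8) -> int:
    """F < female_max -> 2 (female); F > male_min -> 1 (male); else 0 (undetermined)."""
    if f < female_max:
        return 2
    if f > male_min:
        return 1
    return 0


if __name__ == "__main__":
    # Hypothetical toy genotypes at four SNPs with allele frequency 0.5.
    print(x_inbreeding_f([2, 0, 2, 0], [0.5] * 4))  # fully homozygous, male-like
    print(x_inbreeding_f([0, 1, 1, 2], [0.5] * 4))  # HWE-like, female-like
    # F values reported later in this thread, called with the default thresholds.
    for f in (0.894, 0.863, 0.4936):
        print(f, call_sex(f))
```

With the defaults, F = 0.894 and 0.863 are called male and 0.4936 is left undetermined, which matches the SNPSEX column reported later in the thread. That is why the thresholds, and not only the F values, decide which samples are flagged.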


I got the following --check-sex result:

FID      IID      PEDSEX  SNPSEX  STATUS   F
NA20506  NA20506  2       1       PROBLEM  0.894
NA20530  NA20530  2       1       PROBLEM  0.863
NA20533  NA20533  2       0       PROBLEM  0.4936

I think this result indicates a clear sex inconsistency regardless of the threshold values. Could you give me any advice?


See https://www.cog-genomics.org/plink/1.9/basic_stats#check_sex . The note on LD pruning is especially likely to be relevant here.

As long as you end up with a gap between the highest female and the lowest male, you probably have NO sex errors.
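The gap criterion can be checked mechanically. This minimal Python sketch groups the F column of PLINK .sexcheck output by reported sex (PEDSEX 1 = male, 2 = female), compares the highest female F with the lowest male F, and, if they are separated, picks two illustrative thresholds inside the gap that could be passed as --check-sex {female max F} {male min F}. Placing them at one third and two thirds of the gap is an arbitrary choice of this sketch, not a PLINK recommendation, and a gap only suggests that the reported sexes are consistent; it is not proof. The usage example plugs in the F values quoted at the end of this thread.

```python
def f_by_pedsex(sexcheck_text: str) -> dict[int, list[float]]:
    """Group F values from .sexcheck text (FID IID PEDSEX SNPSEX STATUS F) by PEDSEX."""
    groups: dict[int, list[float]] = {1: [], 2: []}
    lines = sexcheck_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        pedsex, f = int(fields[2]), float(fields[5])
        if pedsex in groups:
            groups[pedsex].append(f)
    return groups


def sex_gap(female_f: list[float], male_f: list[float]) -> tuple[float, float, bool]:
    """Return (highest female F, lowest male F, whether the two groups are separated)."""
    hi_female = max(female_f)
    lo_male = min(male_f)
    return hi_female, lo_male, hi_female < lo_male


def thresholds_in_gap(hi_female: float, lo_male: float) -> tuple[float, float]:
    """Arbitrary illustrative thresholds at 1/3 and 2/3 of the gap."""
    third = (lo_male - hi_female) / 3.0
    return hi_female + third, lo_male - third


if __name__ == "__main__":
    # Values quoted in this thread: the top female F values and the lowest male F.
    females = [0.08218, 0.4936, 0.863, 0.894]
    males = [1.0]
    hi, lo, separated = sex_gap(females, males)
    print(f"highest female F = {hi}, lowest male F = {lo}, gap: {separated}")
    if separated:
        fmax, mmin = thresholds_in_gap(hi, lo)
        print(f"e.g. --check-sex {fmax:.3f} {mmin:.3f}")
```

Looking only at the PROBLEM rows hides the male side of the comparison; the whole .sexcheck file is needed to see whether the highest female F still sits below the lowest male F.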


Of course, I already did LD pruning.


There seems to be a language barrier; you clearly do not fully understand my 3-sentence answers or the official documentation, otherwise you would have at least said something about the lowest male F-statistic in your dataset, or the lack of LD pruning in your actual list of steps.

You should try to find a more experienced analyst who understands your first language to talk to.


Thank you for your advice. I'll look for an expert who understands my language.

In my --check-sex result, the lowest F among the male samples is 1.00. The highest F values among the female samples are 0.08218, 0.4936, 0.863, and 0.894.

I would appreciate it if you could point out anything you notice and advise me.