Sex inconsistency in 1000 Genomes phase 3 TSI samples
5.0 years ago
Apprentice

Hi.

I checked sex consistency in the 1000 Genomes phase 3 TSI samples, and three samples came out as inconsistent: NA20506, NA20530, and NA20533. Is this inconsistency already known? Should I remove these samples from my chrX SNP analysis?

To check sex, I used the following procedure.

I downloaded the chrX VCF file, ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz, from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/.

First, I converted the file to PLINK binary (bed) format with PLINK 1.9. Second, I used plink --split-x to move the pseudoautosomal (PAR) SNPs from chrX to the separate XY chromosome code. Third, I extracted the EUR samples with plink --keep. Next, I excluded SNPs with MAF < 0.01, HWE P < 1e-6, or call rate < 0.98. Finally, I checked sex with plink --check-sex.
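The steps above can be sketched as a sequence of PLINK 1.9 command lines. This is a minimal Python sketch that only builds the argument lists and does not run PLINK; the output prefixes, `eur_samples.txt`, and `sex.txt` are hypothetical names. Two parts are assumptions not stated in the post: reported sex is attached with `--update-sex` (the VCF itself carries no sex information, and the post does not say how PEDSEX was set), and an LD-pruning step with `--indep-pairwise 50 5 0.2` (a common but arbitrary choice) is added before `--check-sex`, as the replies below recommend.

```python
import shlex
from typing import List, Optional

PLINK = "plink"  # assumes a PLINK 1.9 binary on PATH


def build_sexcheck_pipeline(
    vcf: str,
    keep_file: str,
    sex_file: str,
    out: str = "chrX",
    female_max_f: Optional[float] = None,
    male_min_f: Optional[float] = None,
) -> List[List[str]]:
    """Return PLINK 1.9 command lines (argv lists) for a chrX sex check."""
    raw, split, eur, qc = (f"{out}_{s}" for s in ("raw", "split", "eur", "qc"))
    steps = [
        # 1. VCF -> binary fileset; --double-id sets FID = IID = VCF sample ID.
        [PLINK, "--vcf", vcf, "--double-id", "--make-bed", "--out", raw],
        # 2. Move PAR variants from X to XY using GRCh37 (b37) boundaries.
        [PLINK, "--bfile", raw, "--split-x", "b37", "no-fail", "--make-bed", "--out", split],
        # 3. Keep EUR samples and attach reported sex (hypothetical file names).
        [PLINK, "--bfile", split, "--keep", keep_file, "--update-sex", sex_file,
         "--make-bed", "--out", eur],
        # 4. Variant QC: drop MAF < 0.01, HWE p < 1e-6, missingness > 0.02 (call rate < 0.98).
        [PLINK, "--bfile", eur, "--maf", "0.01", "--hwe", "1e-6", "--geno", "0.02",
         "--make-bed", "--out", qc],
        # 5. LD pruning (assumed parameters); writes <out>_prune.prune.in.
        [PLINK, "--bfile", qc, "--indep-pairwise", "50", "5", "0.2", "--out", f"{out}_prune"],
    ]
    # 6. Sex check on the pruned variants; thresholds are optional (defaults 0.2 / 0.8).
    check = [PLINK, "--bfile", qc, "--extract", f"{out}_prune.prune.in", "--check-sex"]
    if female_max_f is not None and male_min_f is not None:
        check += [str(female_max_f), str(male_min_f)]
    check += ["--out", f"{out}_sexcheck"]
    steps.append(check)
    return steps


if __name__ == "__main__":
    vcf = "ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz"
    for argv in build_sexcheck_pipeline(vcf, "eur_samples.txt", "sex.txt"):
        print(shlex.join(argv))
```

Each argv list could be passed in order to `subprocess.run(argv, check=True)`; keeping command construction separate from execution makes the exact commands easy to inspect before anything is run.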

5.0 years ago

No. You need to reselect your --check-sex thresholds based on the data. As long as there is a clean separation between females and males, you're fine; it's fine if some female F values are much higher than the default 0.2 lower threshold.

(The 0.2 and 0.8 default thresholds will be eliminated in the future; they're only present at all in PLINK 1.9 to preserve backward compatibility.)


Thank you for your comment.

I got the following --check-sex result:


FID      IID      PEDSEX  SNPSEX  STATUS   F
NA20506  NA20506  2       1       PROBLEM  0.894
NA20530  NA20530  2       1       PROBLEM  0.863
NA20533  NA20533  2       0       PROBLEM  0.4936


I think this result indicates a clear sex inconsistency regardless of the threshold values. Could you give me any advice?
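To see how these rows get flagged, here is a minimal Python sketch of the F-to-SNPSEX mapping under PLINK 1.9's default thresholds as its documentation describes them: F below 0.2 is called female (2), F above 0.8 is called male (1), and anything in between is left undetermined (0). STATUS is treated as OK only when PEDSEX and SNPSEX agree and are nonzero. This is an illustrative re-implementation, not PLINK's own code, and the function names are hypothetical.

```python
def snp_sex(f: float, female_max_f: float = 0.2, male_min_f: float = 0.8) -> int:
    """Sex call from the chrX F estimate: 2 = female, 1 = male, 0 = undetermined."""
    if f < female_max_f:
        return 2
    if f > male_min_f:
        return 1
    return 0


def status(ped_sex: int, f: float, female_max_f: float = 0.2, male_min_f: float = 0.8) -> str:
    """'OK' if reported and inferred sex agree and are both nonzero, else 'PROBLEM'."""
    call = snp_sex(f, female_max_f, male_min_f)
    return "OK" if call == ped_sex and call != 0 else "PROBLEM"


if __name__ == "__main__":
    # (IID, PEDSEX, F) rows taken from the --check-sex output above.
    rows = [("NA20506", 2, 0.894), ("NA20530", 2, 0.863), ("NA20533", 2, 0.4936)]
    for iid, ped, f in rows:
        print(iid, ped, snp_sex(f), status(ped, f), f)
```

With the defaults this reproduces the SNPSEX (1, 1, 0) and STATUS (PROBLEM) columns above. Because the call depends only on where F falls relative to the two cut-offs, moving the cut-offs changes these calls without changing the data.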


See https://www.cog-genomics.org/plink/1.9/basic_stats#check_sex . The note on LD pruning is especially likely to be relevant here.

As long as you end up with a gap between the highest female and the lowest male, you probably have NO sex errors.


Thank you for your comment.

Of course, I already did LD pruning.


There seems to be a language barrier; you clearly do not fully understand my 3-sentence answers or the official documentation, otherwise you would have at least said something about the lowest male F-statistic in your dataset, or the lack of LD pruning in your actual list of steps.

You should try to find a more experienced analyst who understands your first language to talk to.


Thank you for your advice. I'll look for an expert who understands my language.

In my --check-sex results, the lowest F among male samples is 1.00. The largest F values among female samples are 0.08218, 0.4936, 0.863, and 0.894.

I would appreciate any advice if you notice anything in these results.
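The gap criterion from the earlier reply (a gap between the highest female F and the lowest male F) can be checked mechanically. In this minimal Python sketch, `sex_gap` returns the highest female F and the lowest male F when the two groups do not overlap, and `thresholds_in_gap` picks a female maximum and a male minimum strictly inside that gap; splitting the gap into thirds is an arbitrary illustrative choice, and both function names are hypothetical. The example uses only the values reported just above, which is enough because only the extremes (largest female F, smallest male F) matter.

```python
from typing import Optional, Sequence, Tuple


def sex_gap(female_f: Sequence[float], male_f: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (highest female F, lowest male F) if the groups are separated, else None."""
    highest_female = max(female_f)
    lowest_male = min(male_f)
    if highest_female < lowest_male:
        return highest_female, lowest_male
    return None


def thresholds_in_gap(female_f: Sequence[float], male_f: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Pick (female max F, male min F) strictly inside the gap, or None if the groups overlap."""
    gap = sex_gap(female_f, male_f)
    if gap is None:
        return None
    lo, hi = gap
    third = (hi - lo) / 3
    return lo + third, hi - third


if __name__ == "__main__":
    female = [0.08218, 0.4936, 0.863, 0.894]  # largest female F values reported above
    male = [1.00]                              # lowest male F reported above
    print(sex_gap(female, male))
    print(thresholds_in_gap(female, male))
```

For these values the gap runs from 0.894 to 1.00, so the two groups do not overlap; the returned pair could be supplied as the two numeric parameters of `--check-sex`.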
