Question: Sex inconsistency in 1000 genomes phase 3 TSI samples
Apprentice wrote:

Hi.

I'm checking sex consistency in the 1000 Genomes phase 3 TSI samples. A sex inconsistency was detected in 3 samples: NA20506, NA20530, and NA20533. Is this inconsistency already known? Should I remove these samples from the chrX SNP analysis?

To check sex, I used the following process.

I got a VCF file of chrX from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The file name was ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz.

First, the file was converted to PLINK binary (bed) format using PLINK 1.9. Second, the SNPs were split into chrX SNPs and chrXY SNPs using plink --split-x. Third, EUR samples were extracted using plink --keep. Next, SNPs with MAF < 0.01, HWE P < 1e-6, or CR < 0.98 were excluded. Finally, I checked sex on the resulting file using plink --check-sex.
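
The steps listed above could be sketched as the following PLINK 1.9 invocations, built here as Python argument lists and only printed, never executed. The file names (`chrX.vcf.gz`, `eur_samples.txt`) and output prefixes are hypothetical. The `b37` build code assumes the phase 3 release uses GRCh37 coordinates, and `--geno 0.02` assumes "CR" means per-SNP call rate. Sex codes are assumed to be supplied separately, since a VCF carries none. No LD pruning step appears because none is in the listed steps.

```python
import shlex


def build_sex_check_commands(vcf, keep_file, prefix="chrX"):
    """Return the PLINK 1.9 invocations as argument lists (nothing is run)."""
    return [
        # 1. Convert the VCF to a PLINK binary fileset (.bed/.bim/.fam).
        ["plink", "--vcf", vcf, "--make-bed", "--out", f"{prefix}.raw"],
        # 2. Move pseudo-autosomal variants to the XY code.
        #    "b37" is an assumption about the genome build.
        ["plink", "--bfile", f"{prefix}.raw", "--split-x", "b37", "no-fail",
         "--make-bed", "--out", f"{prefix}.split"],
        # 3. Keep only the samples listed in keep_file (FID and IID per line).
        ["plink", "--bfile", f"{prefix}.split", "--keep", keep_file,
         "--make-bed", "--out", f"{prefix}.eur"],
        # 4. Variant filters: MAF < 0.01, HWE P < 1e-6, call rate < 0.98.
        ["plink", "--bfile", f"{prefix}.eur", "--maf", "0.01",
         "--hwe", "1e-6", "--geno", "0.02",
         "--make-bed", "--out", f"{prefix}.qc"],
        # 5. Sex check with default thresholds; writes <prefix>.sex.sexcheck.
        ["plink", "--bfile", f"{prefix}.qc", "--check-sex",
         "--out", f"{prefix}.sex"],
    ]


if __name__ == "__main__":
    for cmd in build_sex_check_commands("chrX.vcf.gz", "eur_samples.txt"):
        print(shlex.join(cmd))
```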

written 4 months ago by Apprentice
chrchang523 wrote:

No. You need to reselect your --check-sex thresholds based on the data. As long as there is a clean separation between females and males, you're fine; it's fine if some female F values are much higher than the default 0.2 lower threshold.

(The 0.2 and 0.8 default thresholds will be eliminated in the future; they're only present at all in PLINK 1.9 to preserve backward compatibility.)

written 4 months ago by chrchang523

Thank you for your comment.

I got the following result from --check-sex:


FID      IID      PEDSEX  SNPSEX  STATUS   F
NA20506  NA20506  2       1       PROBLEM  0.894
NA20530  NA20530  2       1       PROBLEM  0.863
NA20533  NA20533  2       0       PROBLEM  0.4936


I think this result indicates a clear sex inconsistency regardless of the threshold values. Could you give me any advice?

written 3 months ago by Apprentice

See https://www.cog-genomics.org/plink/1.9/basic_stats#check_sex . The note on LD pruning is especially likely to be relevant here.

As long as you end up with a gap between the highest female and the lowest male, you probably have NO sex errors.
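
A minimal sketch of what acting on this advice could look like: prune the markers for LD, then rerun --check-sex on the pruned set, optionally with thresholds chosen from the observed gap. The commands are built as Python argument lists and only printed. The fileset prefix `chrX.qc`, the pruning parameters (`20000 2000 0.5`), and the example thresholds are illustrative assumptions, not values taken from this thread.

```python
import shlex


def pruned_check_sex_commands(bfile, female_max_f=None, male_min_f=None):
    """Build an LD-pruning command and a --check-sex command on the pruned set."""
    # Pairwise LD pruning; writes <bfile>.ld.prune.in and <bfile>.ld.prune.out.
    prune = ["plink", "--bfile", bfile,
             "--indep-pairphase", "20000", "2000", "0.5",
             "--out", f"{bfile}.ld"]
    # Restrict the sex check to the variants that survived pruning.
    check = ["plink", "--bfile", bfile,
             "--extract", f"{bfile}.ld.prune.in", "--check-sex"]
    # Optional custom thresholds: --check-sex <female max F> <male min F>.
    if female_max_f is not None and male_min_f is not None:
        check += [str(female_max_f), str(male_min_f)]
    check += ["--out", f"{bfile}.pruned"]
    return [prune, check]


if __name__ == "__main__":
    # The thresholds here are placeholders; pick them from your own F distribution.
    for cmd in pruned_check_sex_commands("chrX.qc", 0.9, 0.95):
        print(shlex.join(cmd))
```

Running the second command once without thresholds and inspecting the F column first is the natural way to find where the gap actually lies.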

written 3 months ago by chrchang523

Thank you for your comment.

Of course, I already did LD pruning.

written 3 months ago by Apprentice

There seems to be a language barrier; you clearly do not fully understand my 3-sentence answers or the official documentation, otherwise you would have at least said something about the lowest male F-statistic in your dataset, or the lack of LD pruning in your actual list of steps.

You should try to find a more experienced analyst who understands your first language to talk to.

written 3 months ago by chrchang523

Thank you for your advice. I'll look for an expert who can understand my language.

In my --check-sex result, the lowest F among the male samples is 1.00. The highest F values among the female samples are 0.08218, 0.4936, 0.863, and 0.894.

I would appreciate your advice if you notice anything about this.
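
The gap criterion discussed in this thread can be checked directly from these numbers. The sketch below uses only the F values reported above (the four highest female values and the lowest male value); the function name `f_gap` is hypothetical. It only tests whether the highest female F lies below the lowest male F. Whether that separation is clean enough remains a judgment to make on the full F distribution.

```python
def f_gap(female_f, male_f):
    """Return (highest female F, lowest male F, gap between them)."""
    highest_female = max(female_f)
    lowest_male = min(male_f)
    return highest_female, lowest_male, lowest_male - highest_female


# Values reported in the post: highest female F values and the lowest male F.
female_f = [0.08218, 0.4936, 0.863, 0.894]
male_f = [1.00]

highest_female, lowest_male, gap = f_gap(female_f, male_f)
print(f"highest female F = {highest_female}")
print(f"lowest male F    = {lowest_male}")
print(f"gap              = {gap:.3f}")

if gap > 0:
    print("Highest female F is below lowest male F: the groups do not overlap.")
else:
    print("Female and male F values overlap: inspect these samples more closely.")
```

If the groups do not overlap, thresholds placed inside the gap can be passed as `--check-sex <female max F> <male min F>` in place of the 0.2 and 0.8 defaults.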

written 3 months ago by Apprentice