Error in PLINK --mind option? ("All people removed due to missing genotype data")
3.8 years ago

Hi!

I just started the QC of my data and noticed something very peculiar about the --mind option. For simplicity, I first looked only at the very first sample and produced a VCF file. It looks fine, with entries in most of the rows. Then I added "--mind 0.01", and PLINK excluded that sample with the message:

Error: All people removed due to missing genotype data (--mind).

How can that be? How can genotype data be missing if it is clearly possible to produce a VCF file from it?

Here is the output of the first call:

>Options in effect:
>  --bed ukb_cal_chr1_v2.bed
>  --bim ukb_snp_chr1_v2.bim
>  --fam ukb49398_cal_ALL_v2_s488264_fam/ukb49398_cal_chr1_v2_s488264.fam
>  --keep dummy_keep
>  --maf 0.005
>  --recode vcf
>
>32091 MB RAM detected; reserving 16045 MB for main workspace.
>63487 variants loaded from .bim file.
>488377 people (223467 males, 264797 females, 113 ambiguous) loaded from .fam.
>Ambiguous sex IDs written to plink.nosex .
>--keep: 1 person remaining.
>Before main variant filters, 1 founder and 0 nonfounders present.
>Calculating allele frequencies... done.
>Total genotyping rate in remaining samples is 0.976326.
>52130 variants removed due to minor allele threshold(s)
>(--maf/--max-maf/--mac/--max-mac).
>11357 variants and 1 person pass filters and QC.
>Note: No phenotypes present.
>--recode vcf to plink.vcf ... done.

And the output with --mind in effect:

>Options in effect:
>  --bed ukb_cal_chr1_v2.bed
>  --bim ukb_snp_chr1_v2.bim
>  --fam ukb49398_cal_ALL_v2_s488264_fam/ukb49398_cal_chr1_v2_s488264.fam
>  --keep dummy_keep
>  --maf 0.005
>  --mind 0.01
>  --recode vcf
>
>32091 MB RAM detected; reserving 16045 MB for main workspace.
>63487 variants loaded from .bim file.
>488377 people (223467 males, 264797 females, 113 ambiguous) loaded from .fam.
>Ambiguous sex IDs written to plink.nosex .
>--keep: 1 person remaining.
>Error: All people removed due to missing genotype data (--mind).
>IDs written to plink.irem .

The problem occurs with PLINK 1.9 and PLINK 2 with slightly different output messages.

Can someone explain this to me?

Thanks in advance!

SNP PLINK • 4.0k views
3.8 years ago
zx8754 11k

>Total genotyping rate in remaining samples is 0.976326.

The log reports a genotyping rate of 0.976326 for that one individual, which means its missingness is about 2.4% (1 - 0.976326 = 0.023674). With --mind 0.01 we are telling PLINK to remove every sample whose missingness is above 1%, so that one sample gets dropped. Makes sense?
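To make the arithmetic concrete, here is a minimal Python sketch of the check. It is not PLINK code; the function names `missingness` and `kept_by_mind` are made up for illustration, and it assumes --mind removes a sample only when its missing call rate is strictly above the threshold.

```python
def missingness(genotyping_rate: float) -> float:
    """Missing call rate is the complement of the genotyping rate."""
    return 1.0 - genotyping_rate


def kept_by_mind(genotyping_rate: float, mind_threshold: float) -> bool:
    """Return True if the sample survives a --mind style filter.

    Assumption: a sample is removed only when its missingness
    exceeds the threshold (strictly greater than).
    """
    return missingness(genotyping_rate) <= mind_threshold


if __name__ == "__main__":
    rate = 0.976326  # genotyping rate taken from the log above
    print(f"missingness = {missingness(rate):.6f}")
    for threshold in (0.01, 0.03, 0.05):
        status = "kept" if kept_by_mind(rate, threshold) else "removed"
        print(f"--mind {threshold}: sample {status}")
```

So for this sample a threshold of 0.03 or higher would keep it, while 0.01 removes it. If you want the per-sample values from PLINK itself, running PLINK 1.9 with --missing writes them to the F_MISS column of plink.imiss.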


Ah, thanks for the clarification! I thought that the 0.01 meant the minimum genotyping rate, not the maximum missingness. But yeah, it makes much more sense that way!
