Hi!
I just wanted to start the QC of my data and noticed something very peculiar with the --mind option. For simplicity, I first looked only at the very first sample and produced a VCF file from it. The file looks fine, with entries in most of the rows. Then I added "--mind 0.01", and PLINK excluded that sample with the note:
Error: All people removed due to missing genotype data (--mind).
How can that be? How can genotype data be missing if it is clearly possible to produce a VCF file from it?
Here is the output of the first call:
>Options in effect:
> --bed ukb_cal_chr1_v2.bed
> --bim ukb_snp_chr1_v2.bim
> --fam ukb49398_cal_ALL_v2_s488264_fam/ukb49398_cal_chr1_v2_s488264.fam
> --keep dummy_keep
> --maf 0.005
> --recode vcf
>
>32091 MB RAM detected; reserving 16045 MB for main workspace.
>63487 variants loaded from .bim file.
>488377 people (223467 males, 264797 females, 113 ambiguous) loaded from .fam.
>Ambiguous sex IDs written to plink.nosex .
>--keep: 1 person remaining.
>Before main variant filters, 1 founder and 0 nonfounders present.
>Calculating allele frequencies... done.
>Total genotyping rate in remaining samples is 0.976326.
>52130 variants removed due to minor allele threshold(s)
>(--maf/--max-maf/--mac/--max-mac).
>11357 variants and 1 person pass filters and QC.
>Note: No phenotypes present.
>--recode vcf to plink.vcf ... done.
And the output with --mind in effect:
>Options in effect:
> --bed ukb_cal_chr1_v2.bed
> --bim ukb_snp_chr1_v2.bim
> --fam ukb49398_cal_ALL_v2_s488264_fam/ukb49398_cal_chr1_v2_s488264.fam
> --keep dummy_keep
> --maf 0.005
> --mind 0.01
> --recode vcf
>
>32091 MB RAM detected; reserving 16045 MB for main workspace.
>63487 variants loaded from .bim file.
>488377 people (223467 males, 264797 females, 113 ambiguous) loaded from .fam.
>Ambiguous sex IDs written to plink.nosex .
>--keep: 1 person remaining.
>Error: All people removed due to missing genotype data (--mind).
>IDs written to plink.irem .
The problem occurs with both PLINK 1.9 and PLINK 2, with slightly different output messages.
Can someone explain this to me?
Thanks in advance!
Ah, thanks for the clarification! I thought that the 0.01 meant the minimum genotyping rate, not the maximum missingness. But yeah, it makes much more sense that way!
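To make the arithmetic concrete, here is a minimal sketch of the comparison a --mind-style filter performs: the threshold is a maximum missing-call rate, so a sample is kept only if its missing rate does not exceed it. The helper names `missing_rate` and `passes_mind` are hypothetical, not PLINK functions, and the sketch assumes that with a single sample kept, the "Total genotyping rate" from the first log (0.976326) equals that sample's call rate across all loaded variants. It does not model PLINK's actual order of filter application.

```python
# Hypothetical helpers illustrating a --mind-style check.
# Assumption: with one sample kept, the log's "Total genotyping rate"
# is that sample's call rate over all loaded variants.

def missing_rate(call_rate: float) -> float:
    """The missing-genotype rate is the complement of the call (genotyping) rate."""
    return 1.0 - call_rate


def passes_mind(call_rate: float, mind: float) -> bool:
    """Keep the sample only if its missing rate does not exceed the threshold."""
    return missing_rate(call_rate) <= mind


call_rate = 0.976326  # taken from the first log above
print(f"missing rate: {missing_rate(call_rate):.6f}")
print("passes --mind 0.01:", passes_mind(call_rate, 0.01))
print("passes --mind 0.1: ", passes_mind(call_rate, 0.1))
```

With roughly 2.37% of genotypes missing, the sample fails a 1% maximum but would pass a 10% maximum, which matches the behavior in the second log.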