Dealing with half missing calls in ".ped" file
5.1 years ago
reza.jabal ▴ 570

I am trying to convert my VCF files to PLINK format for downstream processing in PLINK. VCF half-calls are coded as "0/.", "./1", and so on, and PLINK 1.9 raises an error for such calls in the .ped file. What would be the easiest way to deal with them in the .ped file?

I am aware that half-calls can be handled for the .map output using the --vcf-half-call argument, but is there a similar option for .ped files?

If there's no way to correct half-calls in the .ped file, what would be the easiest way to recode half-calls as missing calls in the VCF?

software error SNP next-gen PLINK • 7.1k views

Added a PLINK tag, so that this may be picked up by the relevant people monitoring that tag.

5.1 years ago
jean.elbers ★ 1.7k

I would convert the half-calls to no-calls in the VCF file first. Here's a regular expression in Perl that should do the trick:

    perl -pe "s/\d\/\.|\.\/\d/\.\/\./g" original.with.half.calls.vcf > new.with.no.calls.vcf

The code above pretty much translates to "s/find-something/replace-with-something-else/anywhere-find-something-occurs"

find-something in this case = anySingleDigit/. or ./anySingleDigit
replace-with-something-else = ./.
\d = anySingleDigit
\. = a period
\/ = a /
| = or
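
The one-liner above matches anywhere on a line and assumes single-digit allele indices and the unphased "/" separator. A more conservative variant, as a minimal sketch in Python, touches only the GT subfield of each sample column. The names `fix_half_calls` and `convert_file` are hypothetical, and the sketch assumes a tab-delimited VCF with diploid calls and GT as the first FORMAT key (which the VCF specification requires when GT is present). It also accepts multi-digit allele indices and keeps the phased "|" separator.

```python
import re

# A diploid half-call has exactly one missing allele, e.g. "0/.", "./1", "12|.".
HALF_CALL = re.compile(r"^(?:\d+([/|])\.|\.([/|])\d+)$")


def fix_half_calls(line: str) -> str:
    """Return a VCF line with half-calls in the GT subfield set to fully missing."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    body = line.rstrip("\n")
    ending = line[len(body):]  # preserve the original line ending, if any
    fields = body.split("\t")
    # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
    if len(fields) < 10 or fields[8].split(":")[0] != "GT":
        return line  # no sample columns, or no GT to rewrite
    for i in range(9, len(fields)):
        gt, sep, rest = fields[i].partition(":")
        match = HALF_CALL.match(gt)
        if match:
            phase = match.group(1) or match.group(2)  # keep "/" or "|"
            fields[i] = "." + phase + "." + sep + rest
    return "\t".join(fields) + ending


def convert_file(src: str, dst: str) -> None:
    """Rewrite an uncompressed VCF at src into dst, line by line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(fix_half_calls(line))
```

Calling `convert_file("original.with.half.calls.vcf", "new.with.no.calls.vcf")` would mirror the Perl command. Compressed input would need to be decompressed first.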

Hi Jean, thanks for your suggestion. Have you experienced the same problem with PLINK? It seems like this problem arises only with PLINK version 1.9!

No, I haven't experienced this problem: the variant callers that I have used (GATK, STACKS, FreeBayes, STITCH, callvariants, etc.) don't produce half-calls.

Reza, how did you produce these VCFs?

Hi Kevin, I used preprocessed VCFs from the African Genome Variation Project. The error occurs only when using PLINK 1.9!

Oh, I see, can you point me to one of these VCFs so that I can do some testing? I also use PLINK 1.9 currently.

Yes Kevin, it needs an MTA agreement. These are not my data, sorry I cannot be more helpful!

Interesting. I ran into this problem today, and I found that you cannot recode to VCF if there are half-missing genotypes in the PLINK file, right? (version 1.9)

If I understand you correctly, you have a .ped file with half-missing genotypes but cannot convert it to VCF using PLINK v. 1.9? If you have some example data to post, perhaps I could write a regular expression to convert the half-missing genotypes to missing genotypes (not making any promises, but I will try).

Hi Jean, thanks. But I think it will be hard to write a general script to remove half-missing genotypes. I just did it manually, and I will be careful to check for this situation and not accept half-missing genotypes again. Thanks all the same.
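
For readers who reach this thread with the same .ped problem, the conversion of half-missing genotypes to missing genotypes discussed above could be sketched as follows. This is only an illustration under stated assumptions, not a tested general solution. It assumes a whitespace-delimited .ped file with the six standard leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), one allele per column after that, and "0" as the missing-allele code (the PLINK default). The function name `blank_half_missing` is hypothetical.

```python
def blank_half_missing(ped_line: str, missing: str = "0") -> str:
    """Set genotypes with exactly one missing allele to fully missing."""
    fields = ped_line.split()
    if len(fields) < 6:
        return ped_line  # blank or malformed line: leave untouched
    head, alleles = fields[:6], fields[6:]
    # Alleles come in pairs, one pair per variant.
    for i in range(0, len(alleles) - 1, 2):
        first, second = alleles[i], alleles[i + 1]
        if (first == missing) != (second == missing):  # exactly one is missing
            alleles[i] = alleles[i + 1] = missing
    # Note: output is space-delimited, so tabs in the input are normalized.
    return " ".join(head + alleles) + "\n"
```

Applying the function to every line of the .ped file and writing the results to a new file leaves the .map file unchanged, because only genotype columns are edited.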