Dealing with half-missing calls in a .ped file
6.8 years ago
reza.jabal

I am trying to convert my VCF files to PLINK format for some downstream processing in PLINK. VCF half calls are coded as "0/.", "./1", etc., and PLINK 1.9 raises an error for such calls in the .ped file. What would be the easiest way to deal with such calls in the .ped file?

I am aware that half calls can be circumvented in the .map output using the --vcf-half-call argument, but is there a similar option for .ped files?

If there's no way to correct half calls in the .ped file, what would be the easiest way to recode half calls as missing calls in the VCF?

Tags: software error, SNP, next-gen, PLINK

Added a PLINK tag, as this may then be picked up by relevant people monitoring that tag

6.8 years ago
jean.elbers

I would convert the half calls to no calls in the VCF file first. Here's a Perl regular expression that should do the trick:

    perl -pe 's/\d\/\.|\.\/\d/\.\/\./g' original.with.half.calls.vcf > new.with.no.calls.vcf

The code above pretty much translates to "s/find-something/replace-with-something-else/anywhere-find-something-occurs"

find-something in this case = anySingleDigit/. or ./anySingleDigit
replace-with-something-else = ./.
\d = anySingleDigit
\. = a period
\/ = a /
| = or
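The one-liner above edits whole lines. Because \d matches a single digit, a multi-allelic half call such as 10/. would come out as 1./., and phased half calls such as 0|. are not matched at all. The following is a minimal Python sketch, not part of the original answer, that applies the same half-call-to-missing rule only to the GT subfield of each sample column. It assumes an uncompressed VCF in which GT appears in the FORMAT column (column 9). The names fix_half_calls and fix_vcf are hypothetical helpers.

```python
from typing import TextIO


def fix_half_calls(line: str) -> str:
    """Set half calls (e.g. 0/., ./1, 10/., 1|.) in the GT subfield to ./.

    Header lines (starting with '#') and lines without GT are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return line
    gt_idx = fmt.index("GT")
    for i in range(9, len(fields)):
        sub = fields[i].split(":")
        if gt_idx >= len(sub):
            continue  # GT subfield absent for this sample
        # treat phased '|' and unphased '/' separators alike
        alleles = sub[gt_idx].replace("|", "/").split("/")
        # half call: at least one missing allele and at least one called allele
        if "." in alleles and any(a != "." for a in alleles):
            sub[gt_idx] = "/".join("." for _ in alleles)
            fields[i] = ":".join(sub)
    return "\t".join(fields) + "\n"


def fix_vcf(src: TextIO, dst: TextIO) -> None:
    """Copy an uncompressed VCF from src to dst, fixing half calls line by line."""
    for line in src:
        dst.write(fix_half_calls(line))
```

For example, `with open("original.with.half.calls.vcf") as src, open("new.with.no.calls.vcf", "w") as dst: fix_vcf(src, dst)`. Unlike the regex, this leaves every field other than GT untouched. A bgzipped VCF would first need to be opened in text mode, e.g. with `gzip.open(path, "rt")`.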

Hi Jean, thanks for your suggestion. Have you run into the same problem with PLINK? It seems this problem arises only with PLINK 1.9!


No, I haven't experienced this problem; the variant callers I have used (GATK, STACKS, FreeBayes, STITCH, callvariants, etc.) don't produce half calls.


Reza, how did you produce these VCFs?


Hi Kevin, I used preprocessed VCFs from the African Genome Variation Project. The error occurs only when using PLINK 1.9!


Oh, I see. Can you point me to one of these VCFs so that I can do some testing? I also use PLINK 1.9 currently.


Apologies for the late reply, Kevin. The data are available here: https://www.ebi.ac.uk/ega/studies/EGAS00001000238


Oh, do you need access approval in order to download it?


Yes, Kevin, access requires an MTA. These are not my data; sorry I cannot be more helpful!


Interesting. I ran into this problem today, and found that you cannot recode to VCF if there are half-missing genotypes in the PLINK file, right? (version 1.9)


If I understand you correctly, you have a .ped file with half-missing genotypes but cannot convert it to VCF using PLINK v. 1.9? If you have some example data to post, perhaps I could write a regular expression to convert the half-missing genotypes to missing genotypes (not making any promises, but I will try).


Hi Jean, thanks. But I think it would be hard to write a uniform script to remove half-missing genotypes. I just fix them manually, and I will be careful to check for this situation and not accept half-missing genotypes again. Thanks all the same.
