Question: Dealing with half-missing calls in a ".ped" file
reza.jabal (New York, USA) wrote, 2.4 years ago:

I am trying to convert my VCF files to PLINK format for some downstream processing in PLINK. VCF half calls are coded as "0/.", "./1", etc., and PLINK 1.9 raises an error for such calls in the .ped file. What would be the easiest way to deal with such calls in the .ped file?

I am aware that half calls can be handled in the .map output using the --vcf-half-call argument, but is there a similar option for .ped files?

If there's no way to correct half calls in the .ped file, what would be the easiest way to recode half calls as missing calls in the VCF?
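As far as I can tell from the PLINK 1.9 documentation, --vcf-half-call is applied while the VCF is being read, so it should govern every file written by that run, including a .ped produced with --recode, not only the .map. Its modes are error (the default), haploid, missing and reference. A minimal sketch in Python, assuming a PLINK 1.9 binary called plink on the PATH; the helper name plink_vcf_to_ped_cmd and the file names input.vcf and converted are hypothetical:

```python
import shlex

# Modes accepted by PLINK 1.9's --vcf-half-call flag ('error' is PLINK's default).
HALF_CALL_MODES = {"error", "haploid", "missing", "reference"}


def plink_vcf_to_ped_cmd(vcf_path: str, out_prefix: str, half_call: str = "missing") -> list[str]:
    """Build (but do not run) a PLINK 1.9 command that converts a VCF to .ped/.map.

    Hypothetical helper: half calls such as 0/. or ./1 are handled at import
    time according to `half_call`, so the setting carries through to the .ped.
    """
    if half_call not in HALF_CALL_MODES:
        raise ValueError(f"unsupported --vcf-half-call mode: {half_call!r}")
    return [
        "plink",                      # assumes PLINK 1.9 is on the PATH as 'plink'
        "--vcf", vcf_path,
        "--vcf-half-call", half_call,
        "--recode",                   # write text .ped/.map output
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    cmd = plink_vcf_to_ped_cmd("input.vcf", "converted")
    print(shlex.join(cmd))  # plink --vcf input.vcf --vcf-half-call missing --recode --out converted
```

Passing the returned list to subprocess.run(cmd, check=True) would run the conversion; building the argument list separately keeps the half-call mode explicit and easy to inspect before anything is executed.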

Tags: snp, plink, next-gen, software error

Added a PLINK tag, as this may then be picked up by relevant people monitoring that tag

(Kevin Blighe, 2.4 years ago)
Answer: jean.elbers wrote, 2.4 years ago:

I would convert the half calls to no calls in the VCF file first. Here's a regular expression in Perl that should do the trick:

perl -pe 's/\d\/\.|\.\/\d/\.\/\./g' original.with.half.calls.vcf > new.with.no.calls.vcf

The code above pretty much translates to "s/find-something/replace-with-something-else/anywhere-find-something-occurs"

find-something in this case = anySingleDigit/. or ./anySingleDigit
replace-with-something-else = ./.
\d = anySingleDigit
\. = a period
\/ = a /
| = or
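The one-liner above matches digit/. or ./digit anywhere on a line, so in principle it can also touch header lines or other fields; it misses phased half calls such as 0|.; and at a multi-allelic site a call like 10/. would become 1./. because only the last digit is matched. A stricter variant, sketched here in Python rather than Perl (not the command posted above), assumes an uncompressed VCF and rewrites only the GT subfield of each sample column; the names fix_half_calls and fix_vcf_file are hypothetical:

```python
import re

# A diploid GT with exactly one missing allele: 0/., ./1, 12|., .|3, ...
HALF_CALL = re.compile(r"\d+[/|]\.|\.[/|]\d+")


def fix_half_calls(line: str) -> str:
    """Return the VCF line with every half-called GT replaced by ./. (no call)."""
    if line.startswith("#"):
        return line  # meta-information and #CHROM header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # sites-only record: no FORMAT/sample columns to fix
    keys = fields[8].split(":")
    if "GT" not in keys:
        return line
    gt = keys.index("GT")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if gt < len(values) and HALF_CALL.fullmatch(values[gt]):
            values[gt] = "./."
            fields[i] = ":".join(values)
    return "\t".join(fields) + "\n"


def fix_vcf_file(in_path: str, out_path: str) -> None:
    """Stream an uncompressed VCF from in_path to out_path, fixing half calls."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_half_calls(line))
```

For example, fix_vcf_file("original.with.half.calls.vcf", "new.with.no.calls.vcf") uses the same input and output names as the Perl command. Splitting on the FORMAT column costs a little more than a single regex, but it ensures only genotypes are changed.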

Hi Jean, thanks for your suggestion. Have you run into the same problem with PLINK? It seems this problem arises only with PLINK 1.9!

(reza.jabal, 2.4 years ago)

No, I haven't run into this problem: the variant callers I have used (GATK, STACKS, FreeBayes, STITCH, callvariants, etc.) don't produce half calls.

(jean.elbers, 2.4 years ago)

Reza, how did you produce these VCFs?

(Kevin Blighe, 2.4 years ago)

Hi Kevin, I used preprocessed VCFs from the African Genome Variation Project. The error occurs only when using PLINK 1.9!

(reza.jabal, 2.4 years ago)

Oh, I see. Can you point me to one of these VCFs so that I can do some testing? I am also currently using PLINK 1.9.

(Kevin Blighe, 2.4 years ago)

Apologies for the late reply, Kevin. Please find the link to the data below: https://www.ebi.ac.uk/ega/studies/EGAS00001000238

(reza.jabal, 2.4 years ago)

Oh, do you need access approval in order to download it?

(Kevin Blighe, 2.4 years ago)

Yes, Kevin, it needs an MTA (material transfer agreement). These are not my data; sorry I cannot be more helpful!

(reza.jabal, 2.4 years ago)

Interesting. I ran into this problem today, and I found that you cannot recode to VCF if there are half-missing genotypes in the PLINK file, right? (version 1.9)

(Shicheng Guo, 20 months ago)

If I understand you correctly, you have a .ped file with half-missing genotypes but cannot convert it to VCF using PLINK v. 1.9? If you have some example data to post, perhaps I could write a regular expression to convert the half-missing genotypes to missing genotypes (not making any promises, but I will try).

(jean.elbers, 20 months ago)

Hi Jean, thanks. But I think it would be hard to write a uniform script to remove half-missing genotypes. I just fix them manually, and I will be careful to check for this situation and not accept half-missing genotypes again. Thanks all the same.

(Shicheng Guo, 20 months ago)
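For the question above about half-missing genotypes that are already in a .ped file: a .ped line has a fixed layout, six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per variant, so half-missing pairs can be located by position. A minimal Python sketch, assuming PLINK's default missing-allele code 0 and an ordinary (non-compound-genotype) .ped file; fix_ped_line and fix_ped_file are hypothetical names, not part of PLINK:

```python
MISSING = "0"  # PLINK's default missing-allele code (assumes --missing-genotype was not changed)


def fix_ped_line(line: str, missing: str = MISSING) -> str:
    """Set any half-missing allele pair (e.g. 'A 0') to fully missing ('0 0')."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns followed by allele pairs")
    for i in range(6, len(fields), 2):
        # Exactly one of the two alleles is missing -> half-missing call.
        if (fields[i] == missing) != (fields[i + 1] == missing):
            fields[i] = fields[i + 1] = missing
    return " ".join(fields) + "\n"


def fix_ped_file(in_path: str, out_path: str, missing: str = MISSING) -> None:
    """Rewrite a whitespace-delimited .ped file, skipping blank lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(fix_ped_line(line, missing))
```

The first six columns are never inspected, so a 0 in the parent-ID columns is left alone. The output is space-delimited, which PLINK accepts for .ped input, and the matching .map file needs no change because it holds no genotypes.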