Problem to convert genotypes to plink format from bcftools
0
0
Entering edit mode
4 months ago
Alejandro ▴ 10

I had a vcf file with imputed genotypes from beagle like this:

##fileformat=VCFv4.2
##filedate=20210504
##source="beagle.18May20.d20.jar"
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=A,Type=Float,Description="estimated ALT dose [P(RA) + 2*P(AA)]">
##FORMAT=<ID=GP,Number=G,Type=Float,Description="Estimated Genotype Probability">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AA20098 AA20099 AC10002
1       20509   oar3_OAR1_17218 A       G       .       PASS    DR2=0.16;AF=0.2293;IMP  GT:DS:GP        0|0:0.28:0.90,0.05,0.05 0|0:0.27:0.99,0
.01,0.01        0|0:0.71:0.70,0.20,0.10
1       21717   oar3_OAR1_18426 A       G       .       PASS    DR2=0.01;AF=0.0484;IMP  GT:DS:GP        0|1:0.12:0.05,0.90,0.05 0|1:0.11:0.01,0
.99,0.01        0|1:0.06:0.20,0.70,0.10
1       23949   oar3_OAR1_20658 A       C       .       PASS    DR2=0.17;AF=0.3023;IMP  GT:DS:GP        1|0:0.41:0.05,0.90,0.05 1|0:0.39:0.01,0
.99,0.01        1|0:0.89:0.20,0.70,0.10
1       31587   oar3_OAR1_28296 G       A       .       PASS    DR2=0.00;AF=0.0120;IMP  GT:DS:GP        1|1:0.02:0.05,0.05,0.90 1|1:0.02:0.01,0
.01,0.99        1|1:1.04:0.10,0.20,0.70

And I did a filter for by GT>85 with bcftools, and to the result I added the header again with a 'cat', getting something like this:

##fileformat=VCFv4.2
##filedate=20210504
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AA20098 AA20099 AC10002
        1       20509   oar3_OAR1_17218 A       G       .       PASS    DR2=0.16;AF=0.2293;IMP   GT    0|0     0|0     ./.
        1       21717   oar3_OAR1_18426 A       G       .       PASS    DR2=0.01;AF=0.0484;IMP   GT    0|1     0|1     ./.
        1       23949   oar3_OAR1_20658 A       C       .       PASS    DR2=0.17;AF=0.3023;IMP   GT    1|0     1|0     ./.
        1       31587   oar3_OAR1_28296 G       A       .       PASS    DR2=0;AF=0.012;IMP       GT    1|1     1|1     ./.

After that, I tried to convert it to plink format by vcftools and plink.

vcftools --vcf infile.vcf --out outfile --plink
plink --sheep --vcf infile.vcf --recode --out outfile 

But in booth I got this ped:

AA20098 AA20098 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
AA20099 AA20099 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
AC10002 AC10002 0 0 0 -9 0 0 0 0 0 0 0 0 0 0

And I don't understand why in this ped only appear 0 if the logical thing would be that here also appear 1. Any ideas?

vcf bcftools ped plink beagle • 243 views
ADD COMMENT
0
Entering edit mode

Can you post the EXACT contents of an example VCF that is not converting properly, along with the plink .log file, so I can replicate what you’re seeing?

ADD REPLY
0
Entering edit mode

It's already solved, it was silly due to when I edited the header and data I didn't notice that it was several spaces and not tabs like the rest of the vcf had. For that reason neither plink nor vcftools recognized the genotypes, now it works!. Thanks for your interest.

ADD REPLY

Login before adding your answer.

Traffic: 2084 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6