Entering edit mode
                    4.5 years ago
        Alejandro
        
    
        ▴
    
    10
    I had a vcf file with imputed genotypes from beagle like this:
##fileformat=VCFv4.2
##filedate=20210504
##source="beagle.18May20.d20.jar"
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=A,Type=Float,Description="estimated ALT dose [P(RA) + 2*P(AA)]">
##FORMAT=<ID=GP,Number=G,Type=Float,Description="Estimated Genotype Probability">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AA20098 AA20099 AC10002
1       20509   oar3_OAR1_17218 A       G       .       PASS    DR2=0.16;AF=0.2293;IMP  GT:DS:GP        0|0:0.28:0.90,0.05,0.05 0|0:0.27:0.99,0
.01,0.01        0|0:0.71:0.70,0.20,0.10
1       21717   oar3_OAR1_18426 A       G       .       PASS    DR2=0.01;AF=0.0484;IMP  GT:DS:GP        0|1:0.12:0.05,0.90,0.05 0|1:0.11:0.01,0
.99,0.01        0|1:0.06:0.20,0.70,0.10
1       23949   oar3_OAR1_20658 A       C       .       PASS    DR2=0.17;AF=0.3023;IMP  GT:DS:GP        1|0:0.41:0.05,0.90,0.05 1|0:0.39:0.01,0
.99,0.01        1|0:0.89:0.20,0.70,0.10
1       31587   oar3_OAR1_28296 G       A       .       PASS    DR2=0.00;AF=0.0120;IMP  GT:DS:GP        1|1:0.02:0.05,0.05,0.90 1|1:0.02:0.01,0
.01,0.99        1|1:1.04:0.10,0.20,0.70
And I did a filter for by GT>85 with bcftools, and to the result I added the header again with a 'cat', getting something like this:
##fileformat=VCFv4.2
##filedate=20210504
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AA20098 AA20099 AC10002
        1       20509   oar3_OAR1_17218 A       G       .       PASS    DR2=0.16;AF=0.2293;IMP   GT    0|0     0|0     ./.
        1       21717   oar3_OAR1_18426 A       G       .       PASS    DR2=0.01;AF=0.0484;IMP   GT    0|1     0|1     ./.
        1       23949   oar3_OAR1_20658 A       C       .       PASS    DR2=0.17;AF=0.3023;IMP   GT    1|0     1|0     ./.
        1       31587   oar3_OAR1_28296 G       A       .       PASS    DR2=0;AF=0.012;IMP       GT    1|1     1|1     ./.
After that, I tried to convert it to plink format by vcftools and plink.
vcftools --vcf infile.vcf --out outfile --plink
plink --sheep --vcf infile.vcf --recode --out outfile 
But in booth I got this ped:
AA20098 AA20098 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
AA20099 AA20099 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
AC10002 AC10002 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
And I don't understand why in this ped only appear 0 if the logical thing would be that here also appear 1. Any ideas?
Can you post the EXACT contents of an example VCF that is not converting properly, along with the plink .log file, so I can replicate what you’re seeing?
It's already solved, it was silly due to when I edited the header and data I didn't notice that it was several spaces and not tabs like the rest of the vcf had. For that reason neither plink nor vcftools recognized the genotypes, now it works!. Thanks for your interest.
Hello. Could you please share the code you used to filter by a GP>0.85? I am trying to do that with bcftools -selGT but I am having problems. Thank you