bcftools merge heterozygous calls
0
0
Entering edit mode
4.7 years ago
evelyn ▴ 230

Hello,

I have merged three vcf files and translated the SNP calls using bcftools query and awk

$ bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="-"; else if(gt[1]==gt[2]) $i=gt[1]; else $i="N";} print }'
1   687 .   G   A   -   A   -
1   689 .   G   A   -   A   -
1   701 .   T   A   -   A   -
1   704 .   T   G   -   G   -
1   708 .   C   T,A T   -   A
1   6440    .   G   T   T   N   -

Here the heterozygous calls are called as N. I want to change the following heterozygous calls as IUPAC codes:

Code Base
R   A or G
Y   C or T
S   G or C
W   A or T
K   G or T
M   A or C
N   any base

I tried using:

bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="NA"; else if(gt[1]==gt[2]) $i=gt[1]; else if(gt[2]=="AG") $i="R";  else if(gt[2]=="CT") $i="Y"; else if(gt[2]=="GC") $i="S"; else if(gt[2]=="AT") $i="W"; else if(gt[2]=="GT") $i="K"; else if(gt[2]=="AC") $i="M"; else $i="N";} print }' > out.vcf

But it does not give IUPAC codes for the above heterozygous calls. Any suggestions/help is appreciated.

Thank you so much for your time!

snp bcftools • 832 views
ADD COMMENT
0
Entering edit mode
else if(gt[2]=="AG")

You are checking if the second value of the genotype after splitting by / is AG. This is not what you want. You would like to check if gt[1] is A and gt[2] is G or vice versa.

ADD REPLY

Login before adding your answer.

Traffic: 2640 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6