bcftools merge heterozygous calls
1
0
Entering edit mode
4.6 years ago
evelyn ▴ 230

Hello,

I have merged three vcf files and translated the SNP calls using bcftools query and awk. I want to change the heterozygous calls as IUPAC codes:

Code Base
R   A or G
Y   C or T
S   G or C
W   A or T
K   G or T
M   A or C
N   any base

I tried using:

bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="NA"; else if(gt[1]==gt[2]) $i=gt[1]; else if(gt[2]=="A" || gt[2]=="G") $i="R";  else if(gt[2]=="C" || gt[2]=="T") $i="Y"; else if(gt[2]=="G" || gt[2]=="C") $i="S"; else if(gt[2]=="A" || gt[2]=="T") $i="W"; else if(gt[2]=="G" || gt[2]=="T") $i="K"; else if(gt[2]=="A" || gt[2]=="C") $i="M"; else $i="N";} print }' > out.vcf

But it gives wrong IUPAC codes:

CHROM POS ID REF ALT File1 File2 File3
1   441 .   G   T   T   Y   NA
1   447 .   G   A   A   R   NA
1   452 .   G   A   A   A   NA
1   468 .   G   T   T   Y   NA
1   473 .   T   C   C   C   NA
1   474 .   G   T   T   Y   NA
1   489 .   T   A   A   A   NA
1   555 .   A   G   G   G   NA
1   577 .   T   G   G   G   NA
1   578 .   A   T   T   T   NA
1   909 .   C   A   A   NA  NA

But it gives wrong IUPAC codes for example, for G or T (pos 468 for File 2), it should give a K but gives Y. I think I am missing something in the command here. Thank you for your help!

snp • 890 views
ADD COMMENT
1
Entering edit mode
4.6 years ago

Hello evelyn ,

why have you deleted your original post (I have reopened it) and moved your comment to a new question? It is absolutely fine to stay in the old thread so others can see the context of your question.

if(gt[2]=="A" || gt[2]=="G") $i="R";

That's not correct. It is R for the genotypes AG or GA and not if the second value is A or G.

if((gt[1]=="A" && gt[2]=="G") || (gt[1]=="G" && gt[2]=="A")) $i="R";

Same for the other combinations.

Or instead of comparing the splitted values use the complete field value:

if($i == "A/G" || $i == "G/A") $i="R";

or even shorter using regex:

if($i ~ /(A\/G) || (G\/A)/) $i="R";

fin swimmer

ADD COMMENT
0
Entering edit mode

Thank you, @fin swimmer! It worked perfectly. I accidentally deleted my post and I did not know how to recover it :/ Thank you!

ADD REPLY
0
Entering edit mode

The above command converted A\G or G\A to R as:

bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="NA"; else if (gt[1]==gt[2]) $i=gt[1]; else if($i ~ /(A\/G) || (G\/A)/) $i="R";  else if($i ~ /(C\/T) || (T\/C)/) $i="Y"; else if($i ~ /(G\/C) || (C\/G)/) $i="S"; else if($i ~ /(A\/T) || (T\/A)/) $i="W"; else if($i ~ /(G\/T) || (T\/G)/) $i="K"; else if($i ~ /(A\/C) || (C\/A)/) $i="M"; else $i="N";} print }' > out.vcf

But it converts all the heterozygous calls to R only. There is no error and the file is produced but only to R. I am not sure what is missing.

ADD REPLY

Login before adding your answer.

Traffic: 2013 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6