Question: bcftools merge heterozygous calls
0
gravatar for evelyn
12 weeks ago by
evelyn60
evelyn60 wrote:

Hello,

I have merged three vcf files and translated the SNP calls using bcftools query and awk. I want to change the heterozygous calls as IUPAC codes:

Code Base
R   A or G
Y   C or T
S   G or C
W   A or T
K   G or T
M   A or C
N   any base

I tried using:

bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="NA"; else if(gt[1]==gt[2]) $i=gt[1]; else if(gt[2]=="A" || gt[2]=="G") $i="R";  else if(gt[2]=="C" || gt[2]=="T") $i="Y"; else if(gt[2]=="G" || gt[2]=="C") $i="S"; else if(gt[2]=="A" || gt[2]=="T") $i="W"; else if(gt[2]=="G" || gt[2]=="T") $i="K"; else if(gt[2]=="A" || gt[2]=="C") $i="M"; else $i="N";} print }' > out.vcf

But it gives wrong IUPAC codes:

CHROM POS ID REF ALT File1 File2 File3
1   441 .   G   T   T   Y   NA
1   447 .   G   A   A   R   NA
1   452 .   G   A   A   A   NA
1   468 .   G   T   T   Y   NA
1   473 .   T   C   C   C   NA
1   474 .   G   T   T   Y   NA
1   489 .   T   A   A   A   NA
1   555 .   A   G   G   G   NA
1   577 .   T   G   G   G   NA
1   578 .   A   T   T   T   NA
1   909 .   C   A   A   NA  NA

But it gives wrong IUPAC codes for example, for G or T (pos 468 for File 2), it should give a K but gives Y. I think I am missing something in the command here. Thank you for your help!

snp • 185 views
ADD COMMENTlink written 12 weeks ago by evelyn60
1
gravatar for finswimmer
11 weeks ago by
finswimmer12k
Germany
finswimmer12k wrote:

Hello evelyn ,

why have you deleted your original post (I have reopened it) and moved your comment to a new question? It is absolutely fine to stay in the old thread so others can see the context of your question.

if(gt[2]=="A" || gt[2]=="G") $i="R";

That's not correct. It is R for the genotypes AG or GA and not if the second value is A or G.

if((gt[1]=="A" && gt[2]=="G") || (gt[1]=="G" && gt[2]=="A")) $i="R";

Same for the other combinations.

Or instead of comparing the splitted values use the complete field value:

if($i == "A/G" || $i == "G/A") $i="R";

or even shorter using regex:

if($i ~ /(A\/G) || (G\/A)/) $i="R";

fin swimmer

ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by finswimmer12k

Thank you, @fin swimmer! It worked perfectly. I accidentally deleted my post and I did not know how to recover it :/ Thank you!

ADD REPLYlink written 11 weeks ago by evelyn60

The above command converted A\G or G\A to R as:

bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="NA"; else if (gt[1]==gt[2]) $i=gt[1]; else if($i ~ /(A\/G) || (G\/A)/) $i="R";  else if($i ~ /(C\/T) || (T\/C)/) $i="Y"; else if($i ~ /(G\/C) || (C\/G)/) $i="S"; else if($i ~ /(A\/T) || (T\/A)/) $i="W"; else if($i ~ /(G\/T) || (T\/G)/) $i="K"; else if($i ~ /(A\/C) || (C\/A)/) $i="M"; else $i="N";} print }' > out.vcf

But it converts all the heterozygous calls to R only. There is no error and the file is produced but only to R. I am not sure what is missing.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by evelyn60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1267 users visited in the last hour