How to update the INFO field of a VCF after merging or subsetting?
3.4 years ago
kirannbishwa01 ★ 1.3k

I am running into a problem that has been asked about in this forum before, but no working solution has been provided so far.

I regularly have to subset and merge VCFs from several samples during my data processing. However, I have noticed that important INFO values such as AF (allele frequency) and AC (allele count) aren't being updated as they should be.

I have used GATK, and it is able to remove sites that are non-variant, have unused alternate alleles, etc. However, the allele-level information still isn't updated as required. I also tried VCFtools, BCFtools, and vcf-merge, but with no success so far.

One tool seems to be helpful, but there is a problem with the installation because of the (old) package version it requires.

Any suggestions?

AF and AC change only when you subset samples; subsetting sites won't affect their values in any way. Can you show me an example of a command that does not update the AC and AF values? All the tools I've tried ([bv]cftools, SelectVariants, etc.) do this very well for me.

Ram, compare the two VCF outputs below at the same site:

vcf-merge ms01e_phased.vcf.gz ms02g_phased.vcf.gz ms03g_phased.vcf.gz ms04h_phased.vcf.gz > F1.phased_variants.Final02.vcf

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  ms01e   ms02g   ms03g   ms04h
2   15881550    .   C   T   7920.03 PASS    AC=1;AF=0.125;AN=8;BaseQRankSum=0.264;ClippingRankSum=0.00;DP=716;ExcessHet=0.0575;FS=3.666;InbreedingCoeff=0.5844;MQ=60.00;MQRankSum=0.00;QD=34.25;ReadPosRankSum=1.15;SF=0,1,2,3;SOR=0.403    GT:AD:DP:GQ:PB:PC:PG:PGT:PI:PID:PL:PM:PW    0/0:49,0:49:99:.:.:0/0:.:.:.:0,102,1800:.:0/0   0/0:59,0:59:99:.:.:0/0:.:.:.:0,99,1485:.:0/0    0/0:38,0:38:99:.:.:0/0:.:.:.:0,102,1530:.:0/0   0/1:19,14:33:99:2-15881550-C-T,2-15881551-A-T:0.5:1|0:0|1:9:15881550_C_T:531,0,1272:0:|

The AF is 1 (alt allele) / 8 (total alleles) = 0.125, which is correct. This is because all the samples were first split and then merged.

But now I add three extra samples (MA611, MA605, MA622), so AF should be 1/14 = 0.0714, yet the AF is still at the old value.

vcf-merge ms01e_phased.vcf.gz ms02g_phased.vcf.gz ms03g_phased.vcf.gz ms04h_phased.vcf.gz MA611_phased.vcf.gz MA605_phased.vcf.gz MA622_phased.vcf.gz > F1.phased_variants.Final02.vcf

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  ms01e   ms02g   ms03g   ms04h   MA611   MA605   MA622
2   15881550    .   C   T   7920.03 PASS    AC=1;AF=0.125;AN=14;BaseQRankSum=0.264;ClippingRankSum=0.00;DP=2282;ExcessHet=0.0575;FS=3.666;InbreedingCoeff=0.5844;MLEAC=11;MLEAF=0.344;MQ=60.00;MQRankSum=0.00;QD=34.25;ReadPosRankSum=1.15;SF=0,1,2,3,4,5,6;SOR=0.403;set=HignConfSNPs  GT:PB:PGT:PM:PID:PG:GQ:DP:PW:PI:AD:PL:PC    0/0:.:.:.:.:0/0:99:49:0/0:.:49,0:0,102,1800:.   0/0:.:.:.:.:0/0:99:59:0/0:.:59,0:0,99,1485:.    0/0:.:.:.:.:0/0:99:38:0/0:.:38,0:0,102,1530:.   0/1:2-15881550-C-T,2-15881551-A-T:0|1:0:15881550_C_T:1|0:99:33:|:9:19,14:531,0,1272:0.5 0/0:.:.:.:.:0/0:60:21:0/0:.:21,0:0,60,900:. 0/0:.:.:.:.:0/0:39:16:0/0:.:16,0:0,39,585:. 0/0:.:.:.:.:0/0:99:39:0/0:.:39,0:0,102,1575:.

So I am guessing that AF is updated when subsetting, but not when merging.
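For what it's worth, the expected values can be checked by hand from the GT fields of the two records. Here is a minimal Python sketch of that arithmetic. The function name `allele_counts` is made up for illustration, and it assumes a single ALT allele and GT strings separated by `/` or `|`:

```python
import re

def allele_counts(genotypes):
    """Return (AC, AN, AF) for the first ALT allele from a list of GT strings."""
    ac = an = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":      # missing allele: not counted in AN
                continue
            an += 1
            if allele == "1":      # first ALT allele
                ac += 1
    af = ac / an if an else 0.0
    return ac, an, af

# GT values taken from the 4-sample and 7-sample records
four = ["0/0", "0/0", "0/0", "0/1"]
seven = four + ["0/0", "0/0", "0/0"]

print(allele_counts(four))    # AC=1, AN=8,  AF=1/8
print(allele_counts(seven))   # AC=1, AN=14, AF=1/14
```

With seven samples the genotypes give AC=1 and AN=14, so an AF of 0.125 in the merged record is a leftover from the 4-sample file.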

Can you paste the individual records for all 7 samples please? I'm curious about what's going on here.

3.4 years ago ▴ 10

Hi, try the vcffixup script included in the vcflib package. It updates AC, AF, and AN.

vcffixup my.vcf > updated.vcf
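If vcflib can't be installed, the same idea can be approximated in a few lines of Python. This is only a sketch of the recalculation and not vcffixup's actual implementation. The function name `fix_info_line` and the shortened example record are made up for illustration, and it assumes tab-separated data lines with a GT entry in the FORMAT column:

```python
import re

def fix_info_line(line):
    """Recompute AC, AN and AF in the INFO column of one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    n_alt = len(cols[4].split(","))            # number of ALT alleles
    gt_idx = cols[8].split(":").index("GT")    # position of GT in FORMAT
    ac = [0] * n_alt
    an = 0
    for sample in cols[9:]:
        gt = sample.split(":")[gt_idx]
        for allele in re.split(r"[/|]", gt):
            if allele == ".":                  # missing allele, skip
                continue
            an += 1
            if allele != "0":
                ac[int(allele) - 1] += 1
    new = {
        "AC": ",".join(str(c) for c in ac),
        "AN": str(an),
        "AF": ",".join(f"{c / an:.4g}" if an else "0" for c in ac),
    }
    # Keep the other INFO entries, drop the stale AC/AN/AF ones
    kept = [e for e in cols[7].split(";")
            if e.split("=")[0] not in new and e != "."]
    cols[7] = ";".join([f"{k}={v}" for k, v in new.items()] + kept)
    return "\t".join(cols)

# Shortened, illustrative version of the 7-sample record (GT only)
record = "\t".join(
    ["2", "15881550", ".", "C", "T", "7920.03", "PASS",
     "AC=1;AF=0.125;AN=14;DP=2282", "GT",
     "0/0", "0/0", "0/0", "0/1", "0/0", "0/0", "0/0"]
)
print(fix_info_line(record).split("\t")[7])
```

Header lines (those starting with `#`) would need to be passed through unchanged, and other INFO values such as DP or MLEAF are left as they are.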

