How to update the INFO field of a VCF after merging or subsetting?
3.4 years ago
kirannbishwa01 ★ 1.3k

I am running into a problem that has been asked about in this forum before, but no working solution has been provided so far.

I regularly have to subset and merge VCFs from several samples during my data processing. But I have realized that important INFO values such as AF (allele frequency) and AC (allele count) aren't being updated as they should be.

I have used GATK, and it is able to remove several kinds of sites (non-variant sites, unused alternate alleles, etc.), but the important allele-level information isn't updated as required. I also tried VCFtools, BCFtools, and vcf-merge, with no luck so far.

One tool,

https://github.com/opencb/hpg-variant

https://github.com/opencb/hpg-variant/wiki/VCF-Tools-tutorial

seems to be helpful, but it is difficult to install because it requires old package versions.

Any suggestions?

vcf merge genome INFO vcftools • 2.7k views

AF and AC change only when you subset samples; subsetting sites won't affect their values in any way. Can you show me examples of commands that are not updating the AC and AF values? All the tools I've tried ([bv]cftools, SelectVariants, etc.) do this very well for me.


Ram, compare the following two VCF outputs at the same site:

vcf-merge ms01e_phased.vcf.gz ms02g_phased.vcf.gz ms03g_phased.vcf.gz ms04h_phased.vcf.gz > F1.phased_variants.Final02.vcf

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  ms01e   ms02g   ms03g   ms04h
2   15881550    .   C   T   7920.03 PASS    AC=1;AF=0.125;AN=8;BaseQRankSum=0.264;ClippingRankSum=0.00;DP=716;ExcessHet=0.0575;FS=3.666;InbreedingCoeff=0.5844;MQ=60.00;MQRankSum=0.00;QD=34.25;ReadPosRankSum=1.15;SF=0,1,2,3;SOR=0.403    GT:AD:DP:GQ:PB:PC:PG:PGT:PI:PID:PL:PM:PW    0/0:49,0:49:99:.:.:0/0:.:.:.:0,102,1800:.:0/0   0/0:59,0:59:99:.:.:0/0:.:.:.:0,99,1485:.:0/0    0/0:38,0:38:99:.:.:0/0:.:.:.:0,102,1530:.:0/0   0/1:19,14:33:99:2-15881550-C-T,2-15881551-A-T:0.5:1|0:0|1:9:15881550_C_T:531,0,1272:0:|

The AF is 1 (alt allele) / 8 (total alleles) = 0.125, which is correct. This is because all the samples were first split and then merged.

But when I add three more samples (MA611, MA605, MA622), AF should become 1/14 = 0.0714, yet it stays at the old value:
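
The expected values can be checked by hand from the GT fields. Here is a minimal Python sketch of that arithmetic, assuming diploid calls and a single ALT allele; `allele_stats` is a hypothetical helper written for this post, not part of any of the tools mentioned:

```python
import re

def allele_stats(genotypes):
    """Return (AC, AN, AF) for ALT allele 1 from GT strings such as '0/1' or '1|0'."""
    ac = an = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":   # missing allele: not counted in AN
                continue
            an += 1
            if allele == "1":
                ac += 1
    af = ac / an if an else None
    return ac, an, af

# Genotypes at 2:15881550 as they appear in the two merged records.
four_samples = ["0/0", "0/0", "0/0", "0/1"]
seven_samples = four_samples + ["0/0", "0/0", "0/0"]

print(allele_stats(four_samples))   # AC=1, AN=8,  AF=0.125
print(allele_stats(seven_samples))  # AC=1, AN=14, AF=1/14 (about 0.0714)
```

So AC and AN in the merged record are right; only AF was carried over unchanged.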

vcf-merge ms01e_phased.vcf.gz ms02g_phased.vcf.gz ms03g_phased.vcf.gz ms04h_phased.vcf.gz MA611_phased.vcf.gz MA605_phased.vcf.gz MA622_phased.vcf.gz > F1.phased_variants.Final02.vcf

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  ms01e   ms02g   ms03g   ms04h   MA611   MA605   MA622
2   15881550    .   C   T   7920.03 PASS    AC=1;AF=0.125;AN=14;BaseQRankSum=0.264;ClippingRankSum=0.00;DP=2282;ExcessHet=0.0575;FS=3.666;InbreedingCoeff=0.5844;MLEAC=11;MLEAF=0.344;MQ=60.00;MQRankSum=0.00;QD=34.25;ReadPosRankSum=1.15;SF=0,1,2,3,4,5,6;SOR=0.403;set=HignConfSNPs  GT:PB:PGT:PM:PID:PG:GQ:DP:PW:PI:AD:PL:PC    0/0:.:.:.:.:0/0:99:49:0/0:.:49,0:0,102,1800:.   0/0:.:.:.:.:0/0:99:59:0/0:.:59,0:0,99,1485:.    0/0:.:.:.:.:0/0:99:38:0/0:.:38,0:0,102,1530:.   0/1:2-15881550-C-T,2-15881551-A-T:0|1:0:15881550_C_T:1|0:99:33:|:9:19,14:531,0,1272:0.5 0/0:.:.:.:.:0/0:60:21:0/0:.:21,0:0,60,900:. 0/0:.:.:.:.:0/0:39:16:0/0:.:16,0:0,39,585:. 0/0:.:.:.:.:0/0:99:39:0/0:.:39,0:0,102,1575:.

So I am guessing that AF is updated when subsetting, but not when merging.
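
One way to spot records like this is to compare the AF tag against AC/AN taken from the same INFO field. The sketch below does that check; it is only an illustration, assuming a single ALT allele and AN greater than zero, and `parse_info` and `af_is_stale` are hypothetical names, not functions from any VCF library:

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def af_is_stale(info, tol=1e-3):
    """True when the AF tag disagrees with AC/AN (single ALT allele, AN > 0 assumed)."""
    f = parse_info(info)
    ac, an, af = int(f["AC"]), int(f["AN"]), float(f["AF"])
    return abs(af - ac / an) > tol

# Shortened INFO strings from the two records above.
print(af_is_stale("AC=1;AF=0.125;AN=8;DP=716"))    # False
print(af_is_stale("AC=1;AF=0.125;AN=14;DP=2282"))  # True
```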


Can you paste the individual records for all 7 samples please? I'm curious about what's going on here.

3.4 years ago
jd10.work ▴ 10

Hi, try the vcffixup script included in the vcflib package. It updates AC, AF, and AN.

vcffixup my.vcf > updated.vcf
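
If installing vcflib is not an option, the same kind of recomputation can be sketched in a few lines of Python. This is not vcffixup's implementation, just an illustration of recounting AC, AN and AF from the GT fields of one tab-separated data line; `fix_info` is a hypothetical helper, and the formatting of AF to four significant digits is my own choice:

```python
import re

def fix_info(line):
    """Recompute AC, AN and AF in one VCF data line from its GT fields."""
    cols = line.rstrip("\n").split("\t")
    n_alt = len(cols[4].split(","))
    gt_idx = cols[8].split(":").index("GT")
    counts = [0] * n_alt          # one counter per ALT allele
    an = 0
    for sample in cols[9:]:
        gt = sample.split(":")[gt_idx]
        for allele in re.split(r"[/|]", gt):
            if allele == ".":     # missing allele: not counted
                continue
            an += 1
            if int(allele) > 0:
                counts[int(allele) - 1] += 1
    new = {
        "AC": ",".join(str(c) for c in counts),
        "AN": str(an),
        "AF": ",".join(f"{c / an:.4g}" if an else "." for c in counts),
    }
    # Replace existing tags in place, keep all other tags, append missing ones.
    items = [] if cols[7] == "." else cols[7].split(";")
    out, seen = [], set()
    for item in items:
        key = item.split("=", 1)[0]
        if key in new:
            out.append(f"{key}={new[key]}")
            seen.add(key)
        else:
            out.append(item)
    out += [f"{k}={v}" for k, v in new.items() if k not in seen]
    cols[7] = ";".join(out)
    return "\t".join(cols)

# Shortened version of the 7-sample record from the question.
record = "\t".join([
    "2", "15881550", ".", "C", "T", "7920.03", "PASS",
    "AC=1;AF=0.125;AN=14", "GT:DP",
    "0/0:49", "0/0:59", "0/0:38", "0/1:33", "0/0:21", "0/0:16", "0/0:39",
])
print(fix_info(record).split("\t")[7])  # AC=1;AF=0.07143;AN=14
```

Header lines (those starting with `#`) would need to be passed through unchanged, and other sample-dependent tags such as DP or MLEAF are left as they are.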