Question: bcftools merge of multiple VCF files produces duplicate records. How can I solve this?
kirannbishwa01 (United States) wrote, 9 months ago:

I have multiple single-sample VCF files, which I want to merge into a single multi-sample VCF file. When using bcftools merge I am getting duplicate records.

$ bcftools merge ms01e_phased.vcf.gz ms02g_phased.vcf.gz ms03g_phased.vcf.gz ms04h_phased.vcf.gz MA605_phased.vcf.gz MA611_phased.vcf.gz -O v -o RBphased_variants.SixSamples.Final.vcf
 # duplicate records at the same positions in the file "RBphased_variants.SixSamples.Final.vcf"
2   14691373    .   A   .   1153.31 PASS    BaseQRankSum=2.02;ClippingRankSum=0;ExcessHet=3.0103;FS=1.098;InbreedingCoeff=-0.0861;MQ=58.74;MQRankSum=-2.459;QD=19.22;ReadPosRankSum=-0.466;SOR=0.96;DP=68;AN=8  GT:AD:DP:GQ:PL:PG:PB:PI:PW:PC:PM    0/0:4:4:0:0:0/0:.:.:0/0:.:. 0/0:7:7:18:0:0/0:.:.:0/0:.:.    0/0:4:4:12:0:0/0:.:.:0/0:.:.    0/0:2:2:3:0:0/0:.:.:0/0:.:. ./.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.
2   14691373    .   A   AAG 1153.31 PASS    BaseQRankSum=2.02;ClippingRankSum=0;ExcessHet=3.0103;FS=1.098;InbreedingCoeff=-0.0861;MQ=58.74;MQRankSum=-2.459;QD=19.22;ReadPosRankSum=-0.466;SOR=0.96;set=InDels;DP=676;AF=0.042;AN=4;AC=0    GT:AD:DP:GQ:PGT:PID:PL:PG:PB:PI:PW:PC:PM    ./.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:12,0:12:9:.:.:0,9,135:0/0:.:.:0/0:.:.   0/0:22,0:22:12:.:.:0,12,180:0/0:.:.:0/0:.:.
2   14691374    .   A   .   1320.25 PASS    BaseQRankSum=-1.049;ClippingRankSum=0;ExcessHet=0.2929;FS=0;InbreedingCoeff=0.4006;MQ=55.35;MQRankSum=0;QD=33.01;ReadPosRankSum=-0.671;SOR=0.892;DP=44;AN=2 GT:AD:DP:GQ:PL:PG:PB:PI:PW:PC:PM    0/0:4:4:0:0:0/0:.:.:0/0:.:. ./.:7:7:.:0:./.:.:.:./.:.:. ./.:0:0:.:0:./.:.:.:./.:.:. ./.:0:0:.:0:./.:.:.:./.:.:. ./.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.
2   14691374    .   A   G   1320.25 PASS    BaseQRankSum=-1.049;ClippingRankSum=0;ExcessHet=0.2929;FS=0;InbreedingCoeff=0.4006;MQ=55.35;MQRankSum=0;QD=33.01;ReadPosRankSum=-0.671;SOR=0.892;set=HignConfSNPs;DP=710;AF=0.115;MLEAC=3;MLEAF=0.115;AN=4;AC=0 GT:AD:DP:GQ:PGT:PID:PL:PG:PB:PI:PW:PC:PM    ./.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:12,0:12:9:.:.:0,9,135:0/0:.:.:0/0:.:.   0/0:22,0:22:12:.:.:0,12,180:0/0:.:.:0/0:.:.

I raised this on the bcftools issue tracker in case it is a bug: https://github.com/samtools/bcftools/issues/754. Is there any other solution to the problem?

Answer: Kevin Blighe (Republic of Ireland) wrote, 9 months ago:

Hey kirannbishwa01,

I would not call these duplicate records: they are different calls at the same position.

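Adding --merge all to your command should help. It allows records of different types at the same position (for example, SNPs and indels) to be merged into a single multiallelic record.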

So:

bcftools merge --merge all ms01e_phased.vcf.gz ms02g_phased.vcf.gz ms03g_phased.vcf.gz ms04h_phased.vcf.gz MA605_phased.vcf.gz MA611_phased.vcf.gz -O v > RBphased_variants.SixSamples.Final.vcf
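
To check whether the merged file still has more than one record per position, you can list the repeated CHROM/POS pairs. This is only a minimal sketch in plain shell and awk, not a bcftools feature: the function name dup_positions is hypothetical, and it assumes an uncompressed, tab-delimited VCF such as the one written with -O v above.

```shell
# Hypothetical helper (not part of bcftools): print every CHROM:POS that
# appears on more than one record line, with the REF>ALT pairs seen there.
# Reads the files given as arguments, or standard input if none are given.
dup_positions() {
    awk -F'\t' '
        /^#/ { next }                              # skip header lines
        {
            key = $1 ":" $2                        # CHROM:POS
            count[key]++
            # append this record as REF>ALT, comma-separated
            alleles[key] = alleles[key] (alleles[key] ? "," : "") $4 ">" $5
            if (count[key] == 1) order[++n] = key  # remember first-seen order
        }
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 1)
                    print order[i] "\t" alleles[order[i]]
        }
    ' "$@"
}
```

Running dup_positions RBphased_variants.SixSamples.Final.vcf prints nothing if every position has a single record. Any position it does print still has separate records after the merge, and the REF>ALT column shows how those records differ.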

Kevin
