Question: GATK calls two overlapping homozygous indels at consecutive bases in a single sample
Posted 9 months ago by Fabio Marroni:

I noticed that GATK sometimes calls two consecutive indels like the two below. The first, at position 3479486, is a change from AAG to A. The second, at 3479487, is a change from AG to A. Both passed fairly strict quality filtering, both are homozygous, and both are supported by 54 reads. The two VCF records are shown below.

chr13   3479486 .       AAG     A       1640.73 PASS    AC=2;AF=1.00;AN=2;DP=56;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=53.87;MQ0=0;QD=29.30;SOR=0.767 GT:AD:DP:GQ:PL  1/1:0,54:56:99:1678,151,0

chr13   3479487 .       AG      A       1448.73 PASS    AC=2;AF=1.00;AN=2;DP=56;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=53.87;MQ0=0;QD=25.87;SOR=0.767 GT:AD:DP:GQ:PL  1/1:0,54:56:99:1486,160,0

My reference in the region is as follows:


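This convinced me that GATK somehow got confused and is reporting the same event as two different variants: the first record deletes reference bases 3479487-3479488 and the second deletes base 3479488, so two homozygous calls would require base 3479488 to be deleted twice on the same haplotypes. Realignment around indels had already been performed.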
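The conflict between the two records can be made explicit by computing which reference bases each one deletes. The following is a minimal Python sketch, assuming both records are simple deletions with VCF-style left padding (REF and ALT share the first base); `deleted_span` and `spans_overlap` are hypothetical helper names, not part of GATK or any library.

```python
def deleted_span(pos, ref, alt):
    """Return the 1-based inclusive (start, end) of the reference bases removed
    by a simple deletion. Assumes VCF left padding: ALT is a prefix of REF."""
    if len(alt) >= len(ref) or not ref.startswith(alt):
        raise ValueError("not a simple left-padded deletion")
    start = pos + len(alt)          # first base after the shared padding
    end = pos + len(ref) - 1        # last base covered by REF
    return start, end

def spans_overlap(a, b):
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

first = deleted_span(3479486, "AAG", "A")
second = deleted_span(3479487, "AG", "A")
print(first, second, spans_overlap(first, second))
# prints: (3479487, 3479488) (3479488, 3479488) True
```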

For downstream analysis, I want a general way to deal with this issue by removing one of the two calls.

Are you aware of any solution for this?

EDIT (Sept 2nd):

I found that the solution provided in a Biostars post might work for me (so maybe my question is a duplicate?):

bcftools filter --IndelGap 3 infile.vcf > outfile.vcf

I will stick with it, but two minor improvements would be great:
1) I would like to remove indels that overlap, regardless of any fixed distance threshold.
2) I would like to choose which indel to remove based on some quality information (it looks like bcftools always removes the second one).
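Both requests can be sketched outside bcftools. The following minimal Python sketch assumes a coordinate-sorted, uncompressed, tab-separated VCF, and treats two indels as overlapping when their REF spans (POS to POS+len(REF)-1) share at least one base. From each cluster of overlapping indels it keeps only the record with the highest QUAL. The function names and the script name used below are hypothetical; this is not a bcftools or GATK feature.

```python
import sys

def is_indel(ref, alt_field):
    """True if any ALT allele has a different length from REF.
    Symbolic alleles (<DEL>, ...) and the '*' spanning-deletion allele are ignored."""
    return any(
        len(alt) != len(ref)
        for alt in alt_field.split(",")
        if not alt.startswith("<") and alt != "*"
    )

def parse_qual(qual):
    """QUAL as a float; a missing value ('.') ranks lowest."""
    return float("-inf") if qual == "." else float(qual)

def filter_overlapping_indels(lines):
    """Yield VCF lines, keeping only the highest-QUAL indel from each cluster of
    indels whose REF spans overlap. Input must be coordinate-sorted."""
    header, records = [], []
    for line in lines:
        (header if line.startswith("#") else records).append(line)

    drop = set()        # indices into `records` to discard
    cluster = []        # indices of the indels in the current cluster
    chrom, cluster_end = None, -1

    def flush():
        # Keep the best-scoring indel of the cluster and drop the rest.
        if len(cluster) > 1:
            best = max(cluster, key=lambda i: parse_qual(records[i].split("\t")[5]))
            drop.update(i for i in cluster if i != best)
        cluster.clear()

    for i, line in enumerate(records):
        fields = line.rstrip("\n").split("\t")
        c, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if not is_indel(ref, alt):
            continue  # SNVs etc. are kept and do not break clusters
        end = pos + len(ref) - 1  # last reference base covered by REF
        if c != chrom or pos > cluster_end:
            flush()  # no overlap with the current cluster: start a new one
            chrom, cluster_end = c, end
        else:
            cluster_end = max(cluster_end, end)
        cluster.append(i)
    flush()

    yield from header
    for i, line in enumerate(records):
        if i not in drop:
            yield line

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as vcf:
        sys.stdout.writelines(filter_overlapping_indels(vcf))
```

Usage would be something like `python drop_overlapping_indels.py infile.vcf > outfile.vcf`. The whole file is held in memory so that output order is preserved, which is fine for a sketch but not for very large VCFs. Chained overlaps (A overlaps B, B overlaps C) form a single cluster from which only one record survives. Because REF spans include the padding base, an insertion anchored on the last base of a deletion's REF is also treated as overlapping.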
