Question: GATK calls two overlapping homozygous indels at consecutive bases in a single sample
2.9 years ago, Fabio Marroni (2.3k) wrote:

I noticed that GATK sometimes calls two indels at consecutive positions, like the two below. The first, at position 3479486, is a change from AAG to A. The second, at 3479487, is a change from AG to A. Both indels passed fairly strict quality filtering, both are homozygous, and both are supported by 54 reads. The two VCF lines are shown below.

chr13   3479486 .       AAG     A       1640.73 PASS    AC=2;AF=1.00;AN=2;DP=56;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=53.87;MQ0=0;QD=29.30;SOR=0.767 GT:AD:DP:GQ:PL  1/1:0,54:56:99:1678,151,0

chr13   3479487 .       AG      A       1448.73 PASS    AC=2;AF=1.00;AN=2;DP=56;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=53.87;MQ0=0;QD=25.87;SOR=0.767 GT:AD:DP:GQ:PL  1/1:0,54:56:99:1486,160,0

My reference in the region is as follows


This convinced me that GATK somehow got confused, and is calling two different variants for the same event. Realignment near indels has already been performed.
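To make the overlap concrete, here is a minimal Python sketch that computes the reference interval covered by each record's REF allele and checks whether the two intervals intersect. The helper names `ref_span` and `spans_overlap` are hypothetical, and treating "overlap" as an intersection of the full REF spans (anchor base included) is an assumption, not something GATK defines.

```python
def ref_span(pos, ref):
    """Return the 1-based inclusive interval covered by the REF allele."""
    return pos, pos + len(ref) - 1


def spans_overlap(a, b):
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


# POS and REF taken from the two records above.
first = ref_span(3479486, "AAG")
second = ref_span(3479487, "AG")

print(first, second, spans_overlap(first, second))
# prints: (3479486, 3479488) (3479487, 3479488) True
```

Both records cover position 3479488, so a single diploid sample cannot carry both deletions in the homozygous state.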

For downstream analysis I want to find a general way of dealing with this issue by removing one of the two calls.

Are you aware of any solution for this?

EDIT (Sept 2nd):

I found that the solution provided in a Biostars post might work for me (so maybe my question is a duplicate?)

bcftools filter --IndelGap 3 infile.vcf > outfile.vcf

I will stick with it, but two minor improvements would be great: 1) I would like to remove indels that overlap, irrespective of the distance between them; 2) I would like to choose which indel to remove based on some quality measure (it looks like bcftools always removes the second instance).
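One possible way to get both improvements is a small custom filter. The following is only a sketch in plain Python, not a bcftools or GATK feature. It assumes a single-sample, tab-separated VCF sorted by chromosome and position, defines overlap by the span of the REF allele, and uses the QUAL column as the quality measure (any other field could be substituted). The function names `is_indel` and `drop_overlapping_indels` are hypothetical. Indels that overlap in a chain are treated as one cluster, and only the highest-QUAL record of each cluster is kept; ties keep the first record.

```python
def is_indel(ref, alt):
    """True if any non-symbolic ALT allele differs in length from REF."""
    return any(
        len(a) != len(ref)
        for a in alt.split(",")
        if a != "*" and not a.startswith("<")
    )


def drop_overlapping_indels(lines):
    """Return VCF lines with only the best-QUAL indel kept per overlapping cluster.

    Assumes records are sorted by chromosome and position.
    Header lines and non-indel records pass through unchanged.
    """
    lines = list(lines)
    drop = set()
    cluster_chrom, cluster_end, best_idx, best_qual = None, -1, None, None
    for i, line in enumerate(lines):
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos_s, _id, ref, alt, qual_s = line.rstrip("\n").split("\t")[:6]
        if not is_indel(ref, alt):
            continue
        pos = int(pos_s)
        end = pos + len(ref) - 1
        # Assumption: a missing QUAL (".") is treated as 0.
        qual = float(qual_s) if qual_s != "." else 0.0
        if chrom == cluster_chrom and pos <= cluster_end:
            # Overlaps the current cluster: keep whichever record has higher QUAL.
            cluster_end = max(cluster_end, end)
            if qual > best_qual:
                drop.add(best_idx)
                best_idx, best_qual = i, qual
            else:
                drop.add(i)
        else:
            # Start a new cluster with this indel.
            cluster_chrom, cluster_end, best_idx, best_qual = chrom, end, i, qual
    return [line for i, line in enumerate(lines) if i not in drop]


# The two records from the question, with the INFO column abbreviated to DP only.
rec1 = "\t".join(["chr13", "3479486", ".", "AAG", "A", "1640.73", "PASS", "DP=56"])
rec2 = "\t".join(["chr13", "3479487", ".", "AG", "A", "1448.73", "PASS", "DP=56"])

kept = drop_overlapping_indels([rec1, rec2])
print([line.split("\t")[1] for line in kept])
# prints: ['3479486']
```

Here the record at 3479486 is kept because its QUAL (1640.73) is higher than that of the record at 3479487 (1448.73). To run this on a file, the lines of `infile.vcf` could be passed in and the returned lines written to `outfile.vcf`.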

Tags: variants, indels, gatk