GATK calls two overlapping homozygous indels at consecutive bases in a single sample
7.6 years ago
Fabio Marroni ★ 3.0k

I noticed that GATK sometimes calls two indels at consecutive positions, like the two below. The first, at position 3479486, changes AAG to A. The second, at 3479487, changes AG to A. Both passed fairly strict quality filtering, both are homozygous, and both are supported by 54 reads. The two records are shown below.

chr13   3479486 .       AAG     A       1640.73 PASS    AC=2;AF=1.00;AN=2;DP=56;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=53.87;MQ0=0;QD=29.30;SOR=0.767 GT:AD:DP:GQ:PL  1/1:0,54:56:99:1678,151,0

chr13   3479487 .       AG      A       1448.73 PASS    AC=2;AF=1.00;AN=2;DP=56;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=53.87;MQ0=0;QD=25.87;SOR=0.767 GT:AD:DP:GQ:PL  1/1:0,54:56:99:1486,160,0

My reference in this region is as follows:

>chr13:3479484-3479488
AAAAG

This convinced me that GATK somehow got confused and is calling two different variants for the same event. Realignment around indels has already been performed.
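One way to see why the pair is suspicious is to work out which reference bases each record deletes. The sketch below is only an illustration in Python, assuming both records are simple deletions whose ALT is a prefix of REF. The helper names `deleted_interval` and `apply_deletion` are made up for this example and are not part of GATK or bcftools.

```python
# Reference window quoted in the post: chr13:3479484-3479488 = AAAAG
WINDOW_START = 3479484
WINDOW_SEQ = "AAAAG"


def deleted_interval(pos, ref, alt):
    """Return the 1-based inclusive interval of reference bases removed.

    Assumption: the record is a simple deletion, i.e. ALT is a prefix of REF.
    """
    if not ref.startswith(alt) or len(ref) <= len(alt):
        raise ValueError("not a simple deletion")
    return pos + len(alt), pos + len(ref) - 1


def apply_deletion(pos, ref, alt):
    """Apply one record to the reference window and return the new sequence."""
    offset = pos - WINDOW_START
    # Sanity check: REF must match the reference at this position.
    if WINDOW_SEQ[offset:offset + len(ref)] != ref:
        raise ValueError("REF does not match the reference window")
    return WINDOW_SEQ[:offset] + alt + WINDOW_SEQ[offset + len(ref):]


first = (3479486, "AAG", "A")   # record at 3479486
second = (3479487, "AG", "A")   # record at 3479487

for record in (first, second):
    print(record, deleted_interval(*record), apply_deletion(*record))
```

Under these assumptions the first record removes bases 3479487-3479488 and the second removes base 3479488 only, so both records delete base 3479488. In a diploid sample the genotype cannot be homozygous for two different deletions covering the same base, so at most one of the two calls can be right as written.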

For downstream analysis I want to find a general way of dealing with this issue by removing one of the two records.

Are you aware of any solution for this?

EDIT, Sept 2nd:

I found that the solution provided in a Biostars post might work for me (so maybe my question is a duplicate?)

bcftools filter --IndelGap 3 infile.vcf > outfile.vcf

I will stick with it, but two minor improvements would be great: 1) I would like to remove indels that overlap, irrespective of the distance between them; 2) I would like to choose which indel to remove based on some quality information (it looks like bcftools always removes the second instance).
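Both wishes can be sketched in a few lines of plain Python. This is an untested-on-real-data illustration, not a bcftools feature, and it rests on several assumptions: the VCF is tab-delimited and sorted by chromosome and position; two indels "overlap" when their reference spans (POS to POS + len(REF) - 1, anchor base included) share at least one base; and the QUAL column decides which record to keep. The function name `filter_overlapping_indels` is hypothetical.

```python
def is_indel(ref, alt):
    """True if any ALT allele differs in length from REF."""
    return any(len(a) != len(ref) for a in alt.split(","))


def filter_overlapping_indels(lines):
    """Keep only the highest-QUAL record in each cluster of overlapping indels.

    Assumptions: tab-delimited VCF lines, sorted by CHROM and POS.
    Header lines and non-indel records are passed through unchanged.
    """
    lines = list(lines)
    drop = set()                      # indices of records to discard
    cur_chrom, cur_end = None, -1     # extent of the current indel cluster
    best_idx, best_qual = None, None  # best record seen in the cluster
    for i, line in enumerate(lines):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if not is_indel(ref, alt):
            continue
        # A missing QUAL (".") is treated as the lowest possible quality.
        qual = float(fields[5]) if fields[5] != "." else float("-inf")
        end = pos + len(ref) - 1
        if chrom == cur_chrom and pos <= cur_end:
            # Overlaps the current cluster: extend it and keep the better record.
            cur_end = max(cur_end, end)
            if qual > best_qual:
                drop.add(best_idx)
                best_idx, best_qual = i, qual
            else:
                drop.add(i)
        else:
            # Start a new cluster with this indel.
            cur_chrom, cur_end = chrom, end
            best_idx, best_qual = i, qual
    return [line for i, line in enumerate(lines) if i not in drop]


# The two records from the post, with INFO and sample columns omitted.
demo = [
    "chr13\t3479486\t.\tAAG\tA\t1640.73\tPASS\t.",
    "chr13\t3479487\t.\tAG\tA\t1448.73\tPASS\t.",
]
for kept in filter_overlapping_indels(demo):
    print(kept)
```

On the two records above this keeps the call at 3479486, which has the higher QUAL. Chained overlaps are treated as one cluster, so only a single record survives from each run of mutually overlapping indels; another field such as QD could be swapped in as the ranking criterion if QUAL is not the preferred measure.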

GATK indels variants • 1.8k views