Drop duplicate insertions/deletions in a VCF at the same position while keeping one
8 weeks ago
curious ▴ 720

I am normalizing some GWAS summary statistics to gnomAD.

gnomAD has some entries like this that seem to be duplicated indels:

chr21   13405435        rs140129927     G       GT      .       PASS    AC=2962;AN=148224;AF=0.0199833;popmax=afr;faf95_popmax=0.0636127;AC_non_v2_XX=1118;AN_non_v2_XX=59420>
chr21   13405435        rs140129927     GT      G       .       PASS    AC=40946;AN=148190;AF=0.276307;popmax=amr;faf95_popmax=0.419202;AC_non_v2_XX=16812;AN_non_v2_XX=59400


I realize these might be two different measurements, but for my purposes I really only need one (having both is messing up my pipeline).

How can I drop duplicate indels (keeping one) at the same position and with the same REF/ALT alleles? I want to keep multiallelic SNVs untouched; the issue just seems to be the indels.

Will bcftools norm --rm-dup indels do this? Is there anything I am missing?

bcftools • 393 views

Follow-up question: how can those be the same variant with allele frequencies like that? It seems like an insertion of T and a deletion of T with G as the anchor would have mirrored frequencies, not one being 0.0199833 and the other being 0.276307.

8 weeks ago

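Unfortunately, these records are not "different measurements"; they are different types of events. The first is an insertion (G to GT) and the second is a deletion (GT to G). Both are ALT alleles measured against the same reference, so their frequencies do not have to mirror each other.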
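To make the distinction concrete, here is a minimal Python sketch that labels each of the two records by comparing REF and ALT lengths and recomputes AF from the AC and AN values in the question. The `classify` helper and the tuple layout are made up for illustration; nothing here comes from bcftools or gnomAD tooling.

```python
def classify(ref: str, alt: str) -> str:
    """Label a biallelic record by comparing REF and ALT lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "mnv_or_complex"  # same length, longer than one base


# (CHROM, POS, REF, ALT, AC, AN) copied from the two records in the question.
records = [
    ("chr21", 13405435, "G", "GT", 2962, 148224),
    ("chr21", 13405435, "GT", "G", 40946, 148190),
]

for chrom, pos, ref, alt, ac, an in records:
    # AF is simply AC / AN for that specific ALT allele.
    print(chrom, pos, ref, alt, classify(ref, alt), ac / an)
```

The two lines share a position and an rsID, but one allele adds a T after the anchor G and the other removes one, so each has its own allele count.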

As a practical matter, if your pipeline can't handle this, you have at least hundreds of thousands of other variants to work with so you can afford to just arbitrarily drop one or both of these. But it clearly is a bug in your pipeline.
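If arbitrarily dropping one of the records is acceptable, a sketch along these lines would do it in plain Python. It assumes an uncompressed, tab-delimited VCF read line by line; `is_indel` and `drop_extra_indels` are hypothetical helper names, and the two SNV lines in the demo are invented purely to show that SNVs pass through untouched. Any record with at least one ALT whose length differs from REF is treated as an indel here, which is a simplification (symbolic alleles and mixed SNV/indel records are not handled specially).

```python
from typing import Iterable, Iterator


def is_indel(ref: str, alt: str) -> bool:
    # Treat the record as an indel if any ALT allele differs in length from REF.
    return any(len(a) != len(ref) for a in alt.split(","))


def drop_extra_indels(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, keeping only the first indel record per (CHROM, POS)."""
    seen = set()
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        chrom, pos, _id, ref, alt = line.split("\t", 5)[:5]
        if is_indel(ref, alt):
            key = (chrom, pos)
            if key in seen:
                continue  # a later indel at the same site is dropped
            seen.add(key)
        yield line


demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr21\t13405435\trs140129927\tG\tGT\t.\tPASS\tAF=0.0199833\n",
    "chr21\t13405435\trs140129927\tGT\tG\t.\tPASS\tAF=0.276307\n",
    # Hypothetical SNV records, not taken from gnomAD:
    "chr21\t13405500\t.\tA\tC\t.\tPASS\tAF=0.1\n",
    "chr21\t13405500\t.\tA\tT\t.\tPASS\tAF=0.2\n",
]
kept = list(drop_extra_indels(demo))
print("".join(kept), end="")
```

Note that "keep the first" is an arbitrary choice: in this example it retains the rarer insertion and discards the more common deletion, so you may prefer a rule based on AF or on which allele your summary statistics actually report.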