Drop duplicate insertions/deletions at the same position in a VCF while keeping one
14 months ago
curious ▴ 750

I am normalizing some GWAS summary statistics to gnomAD.

gnomAD has some entries like these, which look like duplicated indels:

chr21   13405435        rs140129927     G       GT      .       PASS    AC=2962;AN=148224;AF=0.0199833;popmax=afr;faf95_popmax=0.0636127;AC_non_v2_XX=1118;AN_non_v2_XX=59420>
chr21   13405435        rs140129927     GT      G       .       PASS    AC=40946;AN=148190;AF=0.276307;popmax=amr;faf95_popmax=0.419202;AC_non_v2_XX=16812;AN_non_v2_XX=59400

I realize these might be two different measurements, but for my purposes I really only need one (having both is messing up my pipeline).

How can I drop duplicate indels (keeping one) at the same position and with the same REF/ALT alleles? I want to keep multiallelic SNVs untouched; the issue just seems to be the indels.

Will bcftools norm --rm-dup indels do this? Is there anything I am missing?

bcftools • 764 views

Follow-up question: how can those be the same variant with allele frequencies like that? It seems like an insertion of T and a deletion of T with G as the anchor would have mirrored frequencies, not one being 0.0199833 and the other being 0.276307.

14 months ago

Unfortunately, these records are not "different measurements"; they are different types of events. The first is an insertion and the second is a deletion.
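To make the distinction concrete, here is a minimal Python sketch that classifies the two records from the question by comparing REF and ALT lengths. The `classify` helper is hypothetical, written only for this illustration. The last step assumes these are the only two ALT alleles at the site and ignores the small difference in AN between the records, so the reference frequency it prints is only approximate. Under that assumption the frequencies have no reason to mirror each other: each ALT is counted against the reference allele, which carries the remainder.

```python
def classify(ref: str, alt: str) -> str:
    """Classify a biallelic record by comparing REF and ALT lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "mnv_or_complex"


# The two records from the question, as (REF, ALT, AF)
records = [("G", "GT", 0.0199833), ("GT", "G", 0.276307)]
kinds = [classify(ref, alt) for ref, alt, _ in records]

# Assumption: no other ALT alleles exist at this position, so the
# reference allele accounts for whatever frequency is left over.
ref_af = 1.0 - sum(af for _, _, af in records)

print(kinds)
print(round(ref_af, 4))
```

So there are three alleles at this position (reference, one extra T, one fewer T), not one variant written twice.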

As a practical matter, if your pipeline can't handle this, you have at least hundreds of thousands of other variants to work with, so you can afford to arbitrarily drop one or both of these records. But this clearly is a bug in your pipeline.
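If you do decide to drop one arbitrarily, a plain-Python sketch of the idea could look like the following. It keeps the first indel record seen at each (CHROM, POS) and passes every SNV record through unchanged. The function names are hypothetical, the two SNV lines in the demo are made up for illustration, and this is not claimed to reproduce what `bcftools norm --rm-dup indels` does; check the bcftools documentation for its exact matching rules.

```python
def is_indel(ref: str, alt: str) -> bool:
    # Treat the record as an indel if any ALT allele differs in length from REF
    return any(len(a) != len(ref) for a in alt.split(","))


def drop_extra_indels(lines):
    """Yield VCF lines, keeping only the first indel record per (CHROM, POS)."""
    seen = set()
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through untouched
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if is_indel(ref, alt):
            key = (chrom, pos)
            if key in seen:
                continue  # a later indel at the same site: drop it
            seen.add(key)
        yield line


demo = [
    "##fileformat=VCFv4.2",
    "\t".join(["chr21", "13405435", "rs140129927", "G", "GT", ".", "PASS", "AF=0.0199833"]),
    "\t".join(["chr21", "13405435", "rs140129927", "GT", "G", ".", "PASS", "AF=0.276307"]),
    # Hypothetical SNV records (not from gnomAD) to show they are left alone
    "\t".join(["chr21", "13405500", ".", "G", "A", ".", "PASS", "."]),
    "\t".join(["chr21", "13405500", ".", "G", "C", ".", "PASS", "."]),
]
kept = list(drop_extra_indels(demo))
for line in kept:
    print(line)
```

Which record survives depends only on file order here, so the choice really is arbitrary. If you would rather keep the more common allele, sort or filter on AF before deduplicating.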
