Both bcftools norm and vt normalization failing for the same variant?
4 months ago
jpuntomarcos ▴ 40

Hi,

I want to left-normalize (5') all genomic variants in my pipeline, but I get an unexpected result for the 1:17371287 GAGGT/- variant. If I use this VCF as input:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   17371286    1:17371287_GAGGT/-  TGAGGT  T   .   PASS

The output for both vt normalization and bcftools norm is

1   17371285    1:17371287_GAGGT/-  ATGAGG  A

That is, the variant has been moved 1 position to the left. However, if we check the reference (genomic region), there seems to be no repeat pattern to justify that shift.

It seems that the input VCF, TGAGGT / T, is ambiguous and makes both normalizers consider that the deletion is from the first T to the G (TGAGG) instead of from the G to the last T (GAGGT). Well, I tried to use a more exhaustive variant description as VCF input:

1   17371283    1:17371287_GAGGT/-  ATATGAGGTTTGTCT ATATTTGTCT

However, the result is the same, the variant is again moved to the left:

1   17371285    1:17371287_GAGGT/-  ATGAGG  A

Am I missing something? Any help would be very welcome :)

Note: Websites refer to rs786202100 indel with both coordinates: 1:17371286-17371290 and 1:17371287-17371291 (ex1, ex2), which makes all a bit more confusing.

VCF bcftools indels normalization • 243 views
4 months ago

To me, the behavior looks correct.

Let's check it manually.

This is the reference sequence we have:

CATATGAGGTTTGTC

The VCF variant description does this:

CATA TGAGGT TTGTC --> CATA T TTGTC

bcftools norm reports this:

CAT ATGAGG TTTGTC --> CAT A TTTGTC

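So the resulting sequence is the same, but the second representation starts one base further towards 5'. The shift is possible because the base just before the deleted GAGGT is a T and the deletion also ends in T, so deleting TGAGG instead gives exactly the same sequence.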
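
If you want to reproduce the manual check, here is a minimal Python sketch. It is not how bcftools or vt are implemented, just the usual trim-and-extend idea applied to the short sequence above. The names REF_SEQ, REF_START, apply_variant and left_normalize are made up for this example. REF_START assumes the leading C sits at 1:17371282 (which follows from your REF allele starting at 17371283), and the trailing T is taken from your longer REF allele.

```python
# Reference snippet from this thread plus the trailing T of the longer REF allele.
# Assumption: the first base (C) is at position 17371282.
REF_SEQ = "CATATGAGGTTTGTCT"
REF_START = 17371282


def apply_variant(pos, ref, alt):
    """Return the snippet after replacing ref with alt at 1-based position pos."""
    i = pos - REF_START
    if REF_SEQ[i:i + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference snippet")
    return REF_SEQ[:i] + alt + REF_SEQ[i + len(ref):]


def left_normalize(pos, ref, alt):
    """Left-align and trim a biallelic variant (simplified illustration)."""
    changed = True
    while changed:
        changed = False
        # Drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            pos -= 1
            i = pos - REF_START
            if i < 0:
                raise ValueError("ran off the left end of the snippet")
            ref, alt = REF_SEQ[i] + ref, REF_SEQ[i] + alt
            changed = True
    # Drop shared leftmost bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    short = (17371286, "TGAGGT", "T")
    long_form = (17371283, "ATATGAGGTTTGTCT", "ATATTTGTCT")
    print(left_normalize(*short))      # (17371285, 'ATGAGG', 'A')
    print(left_normalize(*long_form))  # (17371285, 'ATGAGG', 'A')
    # Both representations produce the same altered sequence.
    print(apply_variant(*short) == apply_variant(17371285, "ATGAGG", "A"))  # True
```

Both of your input records end up at 17371285 ATGAGG/A, and applying either the original or the normalized record to the reference gives the same sequence, which is why the normalizers treat them as the same variant.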
