What is the difference between norm --multiallelics -any versus --atomize?
10 months ago
a615ebfb ▴ 40

Hello, forgive my ignorance.

Suppose input.vcf contains a complex multiallelic site.

What is the difference between

bcftools norm --multiallelics -any -f hg38.fa input.vcf

and

bcftools norm --atomize -f hg38.fa input.vcf

I understand what --multiallelics -any does, but I am not sure what --atomize does. The documentation says it will "Decompose complex variants, e.g. split MNVs into consecutive SNVs." I do not understand what this means for a multiallelic site.

If someone has an example that would help clarify this, that would be great.

Thanks in advance.

9 months ago
Ram 44k

I don't think atomization is comparable to --multiallelics -any splitting with respect to multiallelic sites. The documentation for the --atom-overlaps option includes an example of atomization on a multiallelic site:

    # Before atomization:
    100  CC  C,GG   1/2

    # After:
    #   bcftools norm -a .
    100  C   G      ./1
    100  CC  C      1/.
    101  C   G      ./1
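To make the mechanics concrete, here is a small Python model that reproduces the -a . listing above. It is only a sketch under simplifying assumptions, not bcftools' actual implementation: the function name atomize and its rules (same-length alleles are split per changed base, indels are kept whole, a haplotype carrying a different ALT allele is set to '.') are my own illustration of the documented behaviour.

```python
# Simplified model of atomizing one VCF record; NOT bcftools' actual code.
from typing import List, Tuple

Record = Tuple[int, str, str, str]  # (POS, REF, ALT, GT)


def atomize(pos: int, ref: str, alts: List[str], gt: Tuple[int, int]) -> List[Record]:
    """Split each ALT allele into its own record and MNVs into per-base SNVs.

    Assumptions made for this sketch:
    - an ALT with the same length as REF is split into one SNV per differing base;
    - any other ALT (an indel) is kept as a single, unchanged record;
    - a haplotype carrying a *different* ALT allele gets '.', mimicking
      --atom-overlaps . ; a haplotype carrying REF gets '0'.
    """
    out: List[Record] = []
    for idx, alt in enumerate(alts, start=1):
        # genotype for every record derived from ALT allele number `idx`
        calls = ["1" if a == idx else "0" if a == 0 else "." for a in gt]
        genotype = "/".join(calls)
        if len(alt) == len(ref):
            for offset, (r, a) in enumerate(zip(ref, alt)):
                if r != a:  # only bases that actually change become SNVs
                    out.append((pos + offset, r, a, genotype))
        else:
            out.append((pos, ref, alt, genotype))
    # sort by position, then shorter REF first, to match the listing above
    return sorted(out, key=lambda rec: (rec[0], len(rec[1])))


if __name__ == "__main__":
    # the multiallelic record from the documentation: 100 CC C,GG 1/2
    for rec in atomize(100, "CC", ["C", "GG"], (1, 2)):
        print(*rec, sep="  ")
```

Running it prints the same three records as the -a . example: the CC>GG allele becomes two C>G SNVs at 100 and 101, and the CC>C deletion stays a single record.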

Splitting with --multiallelics -any (without --atomize) would just give you 2 records (I can't tell offhand what the GT field would be):

    100  CC  GG
    100  CC  C
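A rough Python sketch of this kind of splitting is below. It assumes the split alleles are simply trimmed of shared trailing and leading bases; it does not left-align indels (that needs the reference passed with -f), and the function name split_multiallelic is hypothetical, not part of bcftools.

```python
from typing import List, Tuple


def split_multiallelic(pos: int, ref: str, alts: List[str]) -> List[Tuple[int, str, str]]:
    """One (POS, REF, ALT) record per ALT allele, trimmed of shared bases.

    Sketch only: trims identical trailing then leading bases, but does NOT
    left-align indels against a reference sequence.
    """
    out = []
    for alt in alts:
        r, a, p = ref, alt, pos
        # drop identical trailing bases, keeping at least one base per allele
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        # drop identical leading bases, advancing POS for each one removed
        while len(r) > 1 and len(a) > 1 and r[0] == a[0]:
            r, a, p = r[1:], a[1:], p + 1
        out.append((p, r, a))
    return out


print(split_multiallelic(100, "CC", ["C", "GG"]))   # CC>C and CC>GG, both unchanged
print(split_multiallelic(100, "CAT", ["C", "CAG"]))  # second allele trims to T>G at 102
```

The first call shows why the MNV survives: C and G differ at both ends, so nothing is trimmed. The second is the kind of case where REF/POS do change.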

Only the ALT field is split; REF and POS are altered only in certain cases, for example when a split-off allele shares bases with REF that can be trimmed, or when an indel is left-aligned. MNVs are not split into SNVs: CC>GG remains CC>GG. With --atomize, MNVs are split, so you get two C>G records (at positions 100 and 101) instead of one CC>GG record, as in the documentation example. Note that this split would happen even if the record were not multiallelic.
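The last point can be checked on a biallelic record. Below is a minimal Python sketch, assuming simplified versions of both operations (the names split_alts and atomize_mnvs are hypothetical, and trimming is omitted because it does not affect this record): splitting leaves a single-ALT CC>GG untouched, while atomizing still breaks it into two SNVs.

```python
from typing import List, Tuple

Record = Tuple[int, str, str]  # (POS, REF, ALT)


def split_alts(pos: int, ref: str, alts: List[str]) -> List[Record]:
    # multiallelic splitting (sketch): one record per ALT, alleles untouched
    return [(pos, ref, alt) for alt in alts]


def atomize_mnvs(pos: int, ref: str, alts: List[str]) -> List[Record]:
    # atomization (sketch): same-length alleles become one SNV per changed base
    out: List[Record] = []
    for alt in alts:
        if len(alt) == len(ref):
            out.extend((pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a)
        else:
            out.append((pos, ref, alt))  # indels are left as-is in this sketch
    return out


biallelic = (100, "CC", ["GG"])  # a single ALT allele: not multiallelic
print(split_alts(*biallelic))    # one record, CC>GG unchanged
print(atomize_mnvs(*biallelic))  # two records: C>G at 100 and C>G at 101
```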

Side note: I wonder if the documentation meant bcftools norm -a --atom-overlaps . and not bcftools norm -a ., but that's not today's problem.

