Better splitting of multiallelic sites than bcftools norm -m -any
19 months ago
gernophil ▴ 80

Hey everyone,

I am trying to split the multiallelic sites in my VCF. I used bcftools norm -m -any, but the result does not seem reasonable to me. Here's an example.

Let's say I have this multiallelic site:

REF     ALT     GT1     GT2     GT3
A       C,G     1/2     0/2     0/1

After splitting I get these two:

REF     ALT     GT1     GT2     GT3
A       C       1/0     0/0     0/1
A       G       0/1     0/1     0/0

So the allele that belongs to the "unused" ALT in a given row is simply set to REF. Is there a way to change this behavior? I don't think it's reasonable to do it this way, at least for my analysis. I would like my result to look more like this:

REF     ALT     GT1     GT2     GT3          GT1     GT2     GT3
A       C       1/.     0/.     0/1    or    ./.     ./.     0/1
A       G       ./1     0/1     0/.          ./.     0/1     ./.

Or something similar. At the very least, I don't want REF where there was an ALT before.

19 months ago

There isn’t a standard way to handle half-missing genotype calls. Often, conversion of half-missing to fully-missing genotypes is the best that can be done.

As a consequence, your suggested split isn’t really information-preserving. The bcftools split operation is analytically problematic, but at least it’s reversible.
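
To make the two options from the question concrete, here is a minimal Python sketch of the split on plain GT strings. It is not part of bcftools and does not parse VCF files; the function name `split_genotype` and its `mode` parameter are made up for illustration, and it assumes GT values like `1/2` or `0|1` with integer allele indices or `.`.

```python
def split_genotype(gt, n_alt, mode="half"):
    """Split one GT string from a multiallelic site into one GT per ALT allele.

    Hypothetical helper for illustration only (not a bcftools function).
    mode="half": an allele that points at a different ALT becomes "."
    mode="full": any genotype containing "." after the split becomes fully missing
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    result = []
    for alt_index in range(1, n_alt + 1):
        new = []
        for allele in alleles:
            if allele == "." or allele == "0":
                new.append(allele)      # missing and REF stay as they are
            elif int(allele) == alt_index:
                new.append("1")         # this row's ALT is always allele 1
            else:
                new.append(".")         # another ALT: unknown for this row
        if mode == "full" and "." in new:
            new = ["."] * len(new)      # half-missing -> fully missing
        result.append(sep.join(new))
    return result


# The example site from the question: REF=A, ALT=C,G
for gt in ["1/2", "0/2", "0/1"]:
    print(gt, split_genotype(gt, 2, "half"), split_genotype(gt, 2, "full"))
```

Note that in `full` mode a genotype that was already half-missing in the input (for example `0/.`) also ends up as `./.`, and in both modes the identity of the other ALT allele is dropped from that row.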
