bcftools multiallelic split not working
7 weeks ago

I am attempting to split multiallelic sites using bcftools norm with the following command:

# Redeclare AD/ADR/ADF as Number=R in the header, then split multiallelic sites
zcat "${inputVcf}" | \
sed 's/AD,Number=./AD,Number=R/g' | \
sed 's/ADR,Number=./ADR,Number=R/g' | \
sed 's/ADF,Number=./ADF,Number=R/g' | \
bcftools norm \
  --fasta-ref "${genomeFa}" \
  --check-ref s \
  --multiallelics -any \
  --output "${outputVcf}"

The sed commands were based on the recommendation from here. However, I'm still getting FORMAT entries such as the following:

GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL
1/0:44:44:56:1,10,5:1,4,2:0,6,3:PASS:511,99,48
./.:.:.:.:.:.:.:.:.
0/1:53:53:63:0,12,6:0,4,1:0,8,5:PASS:483,210,164

These are clearly still multiallelic: AD, ADF and ADR each carry three values. Does anybody know how to fix this?

bcftools vcf • 204 views
7 weeks ago

Hi, I think you are misinterpreting what a 'multi-allelic' call is. The entry that you posted is not multi-allelic in this sense. A multi-allelic call has more than one allele in the ALT column and may look like this (REF, ALT, genotype):

A      G,T    1/2

Thus, the genotype 1/2 means that the sample carries both ALT alleles, G and T. After splitting, this would become:

A      G      0/1
A      T      0/1
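To make the effect on per-allele fields concrete, here is a toy sketch in plain awk that splits a single record by hand. It is not how bcftools norm is implemented: `split_record` is a hypothetical helper, and it assumes one sample, an unphased genotype, and a FORMAT of exactly GT:AD where AD holds one value for REF plus one per ALT allele (the Number=R layout).

```shell
# Toy illustration only: split one multiallelic record with FORMAT GT:AD by hand.
# split_record is a hypothetical helper, not part of bcftools.
split_record() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    n = split($5, alt, ",")          # ALT alleles
    split($10, smp, ":")             # sample fields: GT, AD
    m = split(smp[1], gt, "/")       # unphased genotype allele indices
    split(smp[2], ad, ",")           # AD: REF depth, then one depth per ALT
    for (i = 1; i <= n; i++) {
      g = ""
      for (j = 1; j <= m; j++) {
        # the allele kept in this record becomes 1, every other allele becomes 0
        a = (gt[j] == ".") ? "." : (gt[j] == i ? 1 : 0)
        g = g (j > 1 ? "/" : "") a
      }
      $5 = alt[i]
      $10 = g ":" ad[1] "," ad[i + 1]   # keep the REF depth and this ALT depth
      print
    }
  }'
}

printf 'chr1\t100\t.\tA\tG,T\t.\tPASS\t.\tGT:AD\t1/2:1,10,5\n' | split_record
# prints (tab-separated):
# chr1  100  .  A  G  .  PASS  .  GT:AD  1/0:1,10
# chr1  100  .  A  T  .  PASS  .  GT:AD  0/1:1,5
```

In this sketch the first record gets 1/0 rather than 0/1 because allele order within the genotype is preserved, which would be consistent with the 1/0 genotype shown in the question. The point is that AD shrinks from three values to two only if the splitter knows the field has one value per allele.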

Kevin


I think that clarifies things, and it pointed me in the right direction. What happened was that the VCF file had already been normalized in a previous step, so the ALT column was split, but fields like AD remained as they were: those fields were ignored because their declared types were still wrong at that point. Fixing the upstream bcftools norm step worked for me, and now both my ALT and AD fields are split as I expect.
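For anyone hitting the same problem, here is a minimal sketch of a header fix that could be applied before the first normalization step. It assumes the header declares AD, ADF and ADR with Number=. (as the sed commands in the question imply); `fix_header` is a hypothetical helper name, and unlike the original sed commands it only touches the matching ##FORMAT header lines.

```shell
# Sketch: redeclare AD/ADF/ADR as Number=R, touching only their ##FORMAT header lines.
# Assumes the original declarations use Number=. ; fix_header is a hypothetical name.
fix_header() {
  sed -E '/^##FORMAT=<ID=(AD|ADF|ADR),/ s/Number=\./Number=R/'
}

# Small inline header to show the effect; records and other fields pass through unchanged.
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths">' \
  '##FORMAT=<ID=DPI,Number=1,Type=Integer,Description="Read depth">' \
  | fix_header
```

On a real file this would sit where the sed commands sit in the original pipeline, i.e. between zcat and the first bcftools norm call, so that the fields are declared per-allele before any splitting happens.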
