bcftools multiallelic split not working
19 months ago

I am attempting to split multiallelic sites using bcftools norm with the following command:

zcat ${inputVcf} | \
sed 's/AD,Number=./AD,Number=R/g' | \
sed 's/ADR,Number=./ADR,Number=R/g' | \
sed 's/ADF,Number=./ADF,Number=R/g' | \
bcftools norm \
--fasta-ref ${genomeFa} \
--check-ref s \
--multiallelics -any \
--output ${outputVcf}


The sed commands were based on the recommendation from here. However, I'm still getting FORMAT entries such as the following:

GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 1/0:44:44:56:1,10,5:1,4,2:0,6,3:PASS:511,99,48 ./.:.:.:.:.:.:.:.:. 0/1:53:53:63:0,12,6:0,4,1:0,8,5:PASS:483,210,164

which are clearly multiallelic. Does anybody know how to fix this?

bcftools vcf • 2.4k views
19 months ago

Hi, I think that you are misinterpreting what a 'multi-allelic' call is. The entry that you posted is not multi-allelic in this sense. A multi-allelic call may look like this (columns are REF, ALT and GT):

A      G,T    1/2


Thus, the genotype is G/T, i.e. one copy of each ALT allele. After splitting, this would become:

A      G      0/1
A      T      0/1
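To make the recoding concrete, here is a minimal Python sketch of the idea. It is only an illustration, not bcftools' actual implementation, and the function name split_genotype is made up for this example. It assumes that, in the record for ALT allele k, allele k is recoded as 1 and every other called allele as 0. Under that assumption the first record comes out as 1/0 rather than 0/1, which is the same genotype when the call is unphased.

```python
def split_genotype(ref, alts, gt):
    """Split one multiallelic call into one biallelic call per ALT allele.

    Hypothetical helper for illustration only; not part of bcftools.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    records = []
    for k, alt in enumerate(alts, start=1):
        # Allele k becomes 1; any other called allele is recoded as 0.
        # Missing alleles (".") stay missing.
        recoded = [
            "." if a == "." else ("1" if int(a) == k else "0")
            for a in alleles
        ]
        records.append((ref, alt, sep.join(recoded)))
    return records


rows = split_genotype("A", ["G", "T"], "1/2")
for row in rows:
    print("\t".join(row))
```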


Kevin


I think that clarifies things and pointed me in the right direction. What happened was that the VCF file had already been normalized in a previous step, so the ALT column was split, but fields like AD remained as they were: those fields were ignored because their data types (the Number declarations in the header) were still wrong. Fixing the upstream bcftools norm step worked for me, and now both my ALT and AD fields are split as I expect.
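For anyone wondering why the header declaration matters, here is a rough Python sketch of the logic as I understand it. It is an illustration under assumptions, not bcftools code, and split_format_field is a made-up name. The assumption is that a field declared Number=R carries one value for REF plus one per ALT and can therefore be divided per allele, whereas a field declared Number=. has no declared length and is copied unchanged.

```python
def split_format_field(values, n_alt, number):
    """Return the value of one FORMAT field for each of the n_alt split records.

    Hypothetical helper for illustration only.
    """
    parts = values.split(",")
    if number == "R" and len(parts) == n_alt + 1:
        # Number=R: keep the REF value plus the value for ALT allele k.
        return [f"{parts[0]},{parts[k]}" for k in range(1, n_alt + 1)]
    # Number=. (or an unexpected length): nothing to go on, copy as-is.
    return [values] * n_alt


# AD from the first sample in the question, at a site with two ALT alleles
print(split_format_field("1,10,5", 2, "R"))
print(split_format_field("1,10,5", 2, "."))
```

With Number=R the three AD values are divided into a REF,ALT pair per record, while with Number=. each record keeps all three values, which matches the symptom in the original question.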


How can I achieve what you described above for a VCF file?

bcftools norm -m-any


If you want to additionally left-align indels, then supply a FASTA reference:

bcftools norm -m-any --check-ref w -f human_g1k_v37.fasta


Take a look at my Step 4, here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2


I have a VCF file that contains only SNP variants (bi- and multiallelic) after GATK VQSR. Now I want to split the multiallelic variants into biallelic variants. The command I used is:

bcftools norm -m -snps snp.2.vcf.gz -Ov -o output

It then throws an error:

Error: wrong number of fields in INFO/MLEAC at 2:10443, expected 2, found 1

How can I solve it?
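One way to see how many records are affected is to scan the file for sites where the tag carries a different number of values than there are ALT alleles. The Python sketch below does that on uncompressed VCF text. It assumes MLEAC is declared with Number=A in the header (one value per ALT allele), which is what the "expected 2, found 1" message suggests. The function name and the toy records are invented for illustration; only the position 2:10443 comes from the error message.

```python
def find_count_mismatches(vcf_lines, tag="MLEAC"):
    """Yield (chrom, pos, expected, found) where tag has the wrong value count.

    Assumes the tag is Number=A, i.e. one value per ALT allele.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt, info = fields[0], fields[1], fields[4], fields[7]
        expected = len(alt.split(","))
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == tag and value:
                found = len(value.split(","))
                if found != expected:
                    yield chrom, pos, expected, found


# Toy records; the alleles and INFO values are made up.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "2\t10443\t.\tA\tG,T\t50\tPASS\tAC=1,1;MLEAC=1",
    "2\t10500\t.\tC\tT\t60\tPASS\tAC=1;MLEAC=1",
]
mismatches = list(find_count_mismatches(example))
print(mismatches)
```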


First perform bcftools norm -m-any, then run VQSR.