Multiallelic variants when merging VCFs with GLnexus
4 months ago

I'm attempting to combine around 140 .g.vcf files into a single file using GLnexus on the DNAnexus platform. To examine multiallelic variants, I'm normalizing the files using the bcftools norm -m-any $file command. While merging the original VCF files (generated with GATK) poses no problem, the normalized VCF files result in no detected variants. Can you provide insights into why this might be happening and suggest possible solutions? Additionally, what happens to multiallelic variants when merging without normalization?

GLnexus multiallelic-variants vcf • 715 views

Can you show us some lines from the normalized files?


chr1 270839 . C <NON_REF> . . END=270845 GT:DP:GQ:MIN_DP:PL 0/0:1:3:1:0,3,35
chr1 270846 . C <NON_REF> . . END=270846 GT:DP:GQ:MIN_DP:PL 0/0:1:0:1:0,0,0
chr1 270847 . G <NON_REF> . . END=270847 GT:DP:GQ:MIN_DP:PL 0/0:2:6:2:0,6,70
chr1 270848 . C <NON_REF> . . END=270848 GT:DP:GQ:MIN_DP:PL 0/0:2:0:2:0,0,0

Here are some example lines from the normalized VCF. I checked, and there are lines where the ALT column is not <NON_REF>, but I didn't manage to copy them.


Can you find a line that changed after the bcftools step?


I found an example. In the original .g.vcf:

chr1 268559 . A C,<NON_REF> 189.64 . BaseQRankSum=0.389;DP=11;ExcessHet=0.0000;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-2.253;RAW_MQandDP=24733,11;ReadPosRankSum=1.184 GT:AD:DP:GQ:PL:SB 0/1:3,8,0:11:51:197,0,51,206,75,281:1,2,2,6

In the normalized .g.vcf it looks like this:

chr1 268559 . A C 189.64 . BaseQRankSum=0.389;DP=11;ExcessHet=0;MLEAC=1;MLEAF=0.5;MQRankSum=-2.253;RAW_MQandDP=24733,11;ReadPosRankSum=1.184 GT:AD:DP:GQ:PL:SB 0/1:3,8:11:51:197,0,51:1,2,2,6

chr1 268559 . A <NON_REF> 189.64 . BaseQRankSum=0.389;DP=11;ExcessHet=0;MLEAC=0;MLEAF=0;MQRankSum=-2.253;RAW_MQandDP=24733,11;ReadPosRankSum=1.184 GT:AD:DP:GQ:PL:SB 0/0:3,0:11:51:197,206,281:1,2,2,6
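The PL values in the two split records are consistent with plain subsetting of the original six-value PL array. Here is a minimal Python sketch of that arithmetic, assuming the standard VCF ordering for diploid genotypes (genotype j/k with j <= k sits at index k*(k+1)/2 + j). The helper names `pl_index` and `subset_pl` are hypothetical, and this only illustrates the ordering; it is not bcftools' actual code.

```python
def pl_index(j, k):
    """Position of the unordered diploid genotype j/k in a VCF PL array."""
    j, k = sorted((j, k))
    return k * (k + 1) // 2 + j


def subset_pl(pl, keep):
    """Return the PL values for genotypes built only from allele indices in `keep`.

    `keep` must be sorted and start with 0 (the REF allele).
    """
    return [
        pl[pl_index(a, b)]
        for pos, b in enumerate(keep)
        for a in keep[: pos + 1]
    ]


# PL from the original record: alleles are 0=A (REF), 1=C, 2=<NON_REF>
original_pl = [197, 0, 51, 206, 75, 281]

print(subset_pl(original_pl, [0, 1]))  # A / C record         -> [197, 0, 51]
print(subset_pl(original_pl, [0, 2]))  # A / <NON_REF> record -> [197, 206, 281]
```

Both results match the PL fields in the normalized lines above, which suggests that <NON_REF> was handled like any other ALT allele during the split.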


Oooh, that might be a bug: it's treating the <NON_REF> sentinel as if it were a real alternate allele. That line should not be split unless it is truly multiallelic (e.g. A->C, A->G), and any reference-range record should include an END.

Can you file an issue here https://github.com/samtools/bcftools/issues and include the example above and the version number of bcftools?

Good catch.
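To gauge how many records are affected before filing the issue, one option is to count ALT alleles while ignoring the <NON_REF> sentinel. The sketch below is only a rough illustration of that check; the function names are hypothetical, and it assumes uncompressed, whitespace-delimited VCF text lines like the ones pasted in this thread.

```python
NON_REF = "<NON_REF>"


def real_alt_alleles(alt_field):
    """ALT alleles with the <NON_REF> sentinel and the missing value '.' removed."""
    return [a for a in alt_field.split(",") if a not in (NON_REF, ".")]


def is_truly_multiallelic(vcf_line):
    """True only if the record carries two or more real ALT alleles."""
    # The first five columns are CHROM, POS, ID, REF, ALT; the rest is left unsplit.
    fields = vcf_line.split(None, 5)
    return len(real_alt_alleles(fields[4])) >= 2


line = "chr1 268559 . A C,<NON_REF> 189.64 . DP=11 GT:AD 0/1:3,8,0"
print(is_truly_multiallelic(line))  # False: C is the only real ALT allele
```

Under this definition, the chr1:268559 record from the original .g.vcf is biallelic, so a multiallelic split would ideally leave it unchanged.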
