Split multiallelic SNPs to biallelic from vcf
1
3
Entering edit mode
3.1 years ago
User000 ▴ 690

Dear all,

I have a particular vcf file like this,

chrX    29  .   G   A,T .   PASS    AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2

I tried various tools to split this, but I get the following results, so the FORMAT and INFO lines are identical.

chrX    29  .   G   A   .   PASS    AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP 0/1/.:4839:0.003,0.001:0.002,0:0.005,0.003:14;0,4;2
chrX    29  .   G   T   .   PASS    AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP 0/./1:4839:0.003,0.001:0.002,0:0.005,0.003:14;0,4;2

I need to get HF values split accordingly as well, like this:

chrX    29  .   G   A   .   PASS    AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003:0.002,0.0:0.005,0.003:14;0,4;2
chrX    29  .   G   T   .   PASS    AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.001:0.002,0.0:0.005,0.003:14;0,4;2

How to do this?

EDIT: example.vcf

##fileformat=VCFv4.0
##reference=chrX
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Reads covering the REF position">
##FORMAT=<ID=HF,Number=.,Type=Float,Description="Heteroplasmy Frequency of variant allele">
##FORMAT=<ID=CILOW,Number=.,Type=Float,Description="Value defining the lower limit of the confidence interval of the heteroplasmy fraction">
##FORMAT=<ID=CIUP,Number=.,Type=Float,Description="Value defining the upper limit of the confidence interval of the heteroplasmy fraction">
##FORMAT=<ID=SDP,Number=.,Type=String,Description="Strand-specific read depth of the ALT allele">
##INFO=<ID=AC,Number=1,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chrX    886 .   C   A   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4813:0.002:0.001:0.004:9;0
chrX    895 .   C   A   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4813:0.001:0.0:0.003:5;0
chrX    898 .   C   A   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4829:0.001:0.0:0.002:4;1
chrX    899 .   G   A,T .   PASS    AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2
chrX    900 .   C   T   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4838:0.001:0.0:0.003:6;0
chrX    904 .   C   A   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4860:0.004:0.003:0.006:14;6
chrX    909 .   G   T   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4973:0.002:0.001:0.004:7;4
chrX    916 .   C   A   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4998:0.001:0.0:0.002:4;1
chrX    917 .   C   A   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4962:0.001:0.0:0.002:4;1
chrX    932 .   C   T   .   PASS    AC=1;AN=2   GT:DP:HF:CILOW:CIUP:SDP 0/1:4884:0.001:0.0:0.003:4;2
chrX    933 .   G   A,T .   PASS    AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4863:0.001,0.001:0.0,0.001:0.002,0.003:5;0,3;4
python R bash vcf • 6.3k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I need to get HF values split accordingly as well, like this:

chrX 29 . G A . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003:0.002,0.0:0.005,0.003:14;0,4;2

and that would be wrong because AC should have only one value and FORMAT/GT can't be '0/1/2'....

I tried various tools to split this,

which one ? did you try bcftools norm ?

ADD REPLY
0
Entering edit mode

It is OK if the tool would give a split results for GT, what I was meaning is I dont care about GT at all in that case, i need to get HF values split. Yes, I tried bcftools norm, vt decompose, split.py (a custom python script I found here) and some others that I do not remember the names, but none worked so far.

ADD REPLY
0
Entering edit mode

can you try posting example vcf with headers?

ADD REPLY
0
Entering edit mode

I updated the question, thanks a lot

ADD REPLY
0
Entering edit mode

Thanks.

ADD REPLY
5
Entering edit mode
2.2 years ago
DareDevil ★ 4.3k

Try the following with variant normalization

bcftools norm \
-m-any \
--check-ref \
-w \
-f hg38.fasta \
input.vcf \
-o output.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 2699 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6