Error: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 5880: Duplicate allele added to VariantContext: GT
10 months ago

I am trying to index a VCF file using igvtools, but I get the following error:

Error: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 5880: Duplicate allele added to VariantContext: GT

When I go to that line, it looks like the VCF has the reference allele duplicated in the ALT column. Here is what it looks like:

1   19723050    rs9004957   GT  G,GT    .   .   RSPOS=19617712;RV;dbSNPBuildID=118;SAO=0;VC=in-del;VLD;VP=050000000005000100000200

When I edit the VCF and fix the line by removing the extra GT, I get the same error again, just thousands of lines later in the file. If this happened only a couple of times I would fix the lines by hand, but there are too many occurrences for that. Is there a way to fix them all at once?
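To see how widespread the problem is before changing anything, something like the following could list every record whose ALT column repeats the REF allele. This is only a rough sketch: the file name demo_check.vcf is a placeholder, and the first data record in the demo is made up so there is one clean line to compare against.

```shell
# Build a tiny demo VCF (placeholder file name). The first record is made up;
# the second is the malformed line quoted in the question.
printf '##fileformat=VCFv4.1\n' > demo_check.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo_check.vcf
printf '1\t19723000\trs0000001\tA\tG\t.\t.\tVC=SNV\n' >> demo_check.vcf
printf '1\t19723050\trs9004957\tGT\tG,GT\t.\t.\tRSPOS=19617712;RV;dbSNPBuildID=118;SAO=0;VC=in-del;VLD;VP=050000000005000100000200\n' >> demo_check.vcf

# For each non-header line, split ALT on commas and report the line
# if any ALT allele is identical to REF (column 4).
awk -F '\t' '!/^#/ {
    n = split($5, a, ",")
    for (i = 1; i <= n; i++)
        if (a[i] == $4) { print NR ": " $4 " -> " $5; break }
}' demo_check.vcf | tee dup_lines.txt
# prints: 4: GT -> G,GT
```

Piping the same awk command into wc -l instead of tee gives a count of affected records.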

10 months ago
awk -F '\t' '/^#/ {print;next;} {OFS="\t";R=$4;n=split($5,a,/[,]/);s="";for(i=1;i<=n;i++) {s=sprintf("%s%s%s%s",s,(i==1?"":","),a[i],a[i]==R?"AAAAAAAAA":"");} $5=s; print;}' < input.vcf

That worked like a charm. I changed it slightly to write the result to a new file. Here is what I did, for anyone else who encounters this error:

awk -F '\t' '/^#/ {print;next;} {OFS="\t";R=$4;n=split($5,a,/[,]/);s="";for(i=1;i<=n;i++) {s=sprintf("%s%s%s%s",s,(i==1?"":","),a[i],a[i]==R?"AAAAAAAAA":"");} $5=s; print;}' old.vcf > new.vcf
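Note that the command above does not remove the repeated allele. It appends AAAAAAAAA to any ALT allele equal to REF (so the GT in the example becomes GTAAAAAAAAA), which gets past the duplicate check but leaves an artificial allele in the file. If you would rather drop the repeated allele, a variant along these lines might work. It is only a sketch and assumes a sites-only VCF (8 columns, no genotype columns) like the line in the question, because removing an ALT allele would otherwise shift the allele indices used in GT fields. The file names demo.vcf and demo.fixed.vcf are placeholders.

```shell
# Build a tiny sites-only demo VCF containing the malformed line from the question.
printf '##fileformat=VCFv4.1\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf '1\t19723050\trs9004957\tGT\tG,GT\t.\t.\tRSPOS=19617712;RV;dbSNPBuildID=118;SAO=0;VC=in-del;VLD;VP=050000000005000100000200\n' >> demo.vcf

# Copy header lines unchanged; on data lines rebuild ALT without any
# allele identical to REF. If nothing is left, write "." (missing ALT).
awk -F '\t' 'BEGIN { OFS = "\t" }
/^#/ { print; next }
{
    n = split($5, a, ",")
    s = ""
    for (i = 1; i <= n; i++) {
        if (a[i] == $4) continue          # skip the allele that repeats REF
        s = (s == "" ? a[i] : s "," a[i])
    }
    $5 = (s == "" ? "." : s)
    print
}' demo.vcf > demo.fixed.vcf

# Show REF and ALT of the fixed record.
tail -n 1 demo.fixed.vcf | cut -f4,5
```

On the demo line this leaves REF as GT and ALT as G. INFO fields that carry one value per ALT allele are not adjusted by this sketch, so check whether your file has any before relying on it.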

So the only change is that you replaced < input.vcf with old.vcf > new.vcf at the end of the command.

17 days ago
Sam • 0

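If you only need the SNPs, it's easier to use vcftools. Note that this drops every indel record from the file, not just the malformed ones: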

vcftools --remove-indels --recode --recode-INFO-all --vcf old.vcf --stdout >new.snp.vcf
