10 months ago

I am trying to index a VCF file using igvtools. For some reason, I am getting the following error:

Error: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 5880: Duplicate allele added to VariantContext: GT


When I go to that line, it looks like the VCF has the reference allele duplicated in the ALT column. Here is what it looks like:

1   19723050    rs9004957   GT  G,GT    .   .   RSPOS=19617712;RV;dbSNPBuildID=118;SAO=0;VC=in-del;VLD;VP=050000000005000100000200


When I edit the VCF and remove the extra GT from this line, I get the same error again, just thousands of lines later in the file. If this happened only a couple of times I would fix the lines manually, but there are too many occurrences for that. Is there a way to fix them all at once?
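To get a sense of how many records are affected, a check along these lines could be used. This is only a sketch: the helper name find_ref_in_alt is made up for illustration, and it assumes a tab-delimited VCF with REF in column 4 and ALT in column 5.

```shell
# Hypothetical helper: for every record where an ALT allele is identical
# to REF, print the line number, REF and ALT (tab-separated).
find_ref_in_alt() {
  awk -F '\t' '
    /^#/ { next }                  # skip header lines
    {
      n = split($5, alt, ",")      # ALT alleles are comma-separated
      for (i = 1; i <= n; i++) {
        if (alt[i] == $4) {        # this ALT allele duplicates REF
          print NR "\t" $4 "\t" $5
          break                    # report each record only once
        }
      }
    }' "$1"
}

# Usage (file name is a placeholder):
#   find_ref_in_alt input.vcf | wc -l
```

Piping the output through wc -l gives the number of offending records, and the first column shows where they are in the file.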

10 months ago
awk -F '\t' '/^#/ {print;next;} {OFS="\t";R=$4;n=split($5,a,/[,]/);s="";for(i=1;i<=n;i++) {s=sprintf("%s%s%s%s",s,(i==1?"":","),a[i],a[i]==R?"AAAAAAAAA":"");} $5=s; print;}' < input.vcf

That worked like a charm. I changed it a bit to create a new file. Here is what I did, for anyone else who encounters this error:

awk -F '\t' '/^#/ {print;next;} {OFS="\t";R=$4;n=split($5,a,/[,]/);s="";for(i=1;i<=n;i++) {s=sprintf("%s%s%s%s",s,(i==1?"":","),a[i],a[i]==R?"AAAAAAAAA":"");}$5=s; print;}' old.vcf > new.vcf

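The only change is that the command now reads from old.vcf and writes the result to new.vcf (old.vcf > new.vcf) instead of reading input.vcf and printing to the terminal.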
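Note that the awk command above does not remove the duplicated allele. It appends AAAAAAAAA to any ALT allele that equals REF, so GT / G,GT becomes GT / G,GTAAAAAAAAA, which is enough to get past the duplicate-allele check. If you would rather drop the duplicated allele, a variant along these lines might work. It is a sketch, not a tested fix: the function name drop_ref_from_alt is made up, and it assumes a sites-only file like the dbSNP line shown in the question, with no genotype columns and no per-allele INFO fields that would need to be adjusted when an ALT allele disappears.

```shell
# Hypothetical helper: remove any ALT allele that is identical to REF.
# Assumes a tab-delimited, sites-only VCF (REF = column 4, ALT = column 5).
drop_ref_from_alt() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }               # pass header lines through unchanged
    {
      n = split($5, alt, ",")          # ALT alleles are comma-separated
      s = ""
      for (i = 1; i <= n; i++) {
        if (alt[i] != $4) {            # keep only alleles that differ from REF
          s = s (s == "" ? "" : ",") alt[i]
        }
      }
      $5 = (s == "" ? "." : s)         # "." when no ALT allele is left
      print
    }' "$1"
}

# Usage (file names are placeholders):
#   drop_ref_from_alt old.vcf > new.vcf
```

With the line from the question, the ALT column would go from G,GT to G.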

17 days ago
Sam • 0

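It's easier to use vcftools, as long as you only need the SNPs: --remove-indels drops every site that contains an indel, which includes the records with the duplicated allele.

vcftools --remove-indels --recode --recode-INFO-all --vcf old.vcf --stdout > new.snp.vcf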