The problem with your VCF is not just that some INFO fields contain duplicate values. The header also declares that these fields hold only one entry:
##INFO=<ID=CNT,Number=1,Type=Integer,Description="How many samples have this mutation">
The "Number" attribute defines how many values the field may hold. For more information, see the manual.
In your example there are not just duplicates. Look at this:
This entry has two different values, but only one is allowed.
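To find such entries yourself, a check along these lines may help. It is only a rough sketch in plain Python: the function names `info_numbers` and `violations` and the INFO string `CNT=2,3` are made up for illustration, and it only handles fields whose "Number" is a fixed integer.

```python
import re


def info_numbers(header_lines):
    """Map each INFO ID to its declared Number (kept as a string)."""
    numbers = {}
    for line in header_lines:
        match = re.match(r"##INFO=<ID=([^,]+),Number=([^,]+),", line)
        if match:
            numbers[match.group(1)] = match.group(2)
    return numbers


def violations(info_column, numbers):
    """Return the INFO keys that carry more values than the header declares."""
    bad = []
    for item in info_column.split(";"):
        key, sep, value = item.partition("=")
        declared = numbers.get(key, ".")
        # Only fixed integer Numbers are checked; A, R, G and "." are skipped.
        if sep and declared.isdigit() and len(value.split(",")) > int(declared):
            bad.append(key)
    return bad


header = ['##INFO=<ID=CNT,Number=1,Type=Integer,Description="How many samples have this mutation">']
numbers = info_numbers(header)
print(violations("CNT=2,3", numbers))  # hypothetical INFO column with two values
print(violations("CNT=2", numbers))    # a valid INFO column
```

The first call reports `CNT`, because two values are present while the header allows one; the second call reports nothing.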
Here's a little Python script that iterates over all records in your VCF and truncates every INFO field to the number of values given in the header.
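A minimal sketch of what such a script could look like with pysam is shown here. Treat it as an illustration under assumptions rather than a definitive implementation: the helper name `truncate_value` is made up, and how pysam hands back an over-long value for a "Number=1" field may depend on the pysam version, so the helper accepts both single values and tuples.

```python
import sys


def truncate_value(value, number):
    """Cut value down to at most `number` entries.

    Non-integer Numbers (A, R, G, .) and flags (Number=0) are returned unchanged.
    """
    if not isinstance(number, int) or number < 1:
        return value
    if not isinstance(value, (tuple, list)):
        return value  # already a single value
    kept = tuple(value)[:number]
    return kept[0] if number == 1 else kept


def main(path):
    import pysam  # imported here so truncate_value stays usable without pysam

    vcf_in = pysam.VariantFile(path)
    vcf_out = pysam.VariantFile("-", "w", header=vcf_in.header)  # "-" is stdout
    for record in vcf_in:
        # Copy the keys first, because the record is modified inside the loop.
        for key in list(record.info.keys()):
            number = vcf_in.header.info[key].number
            value = record.info[key]
            fixed = truncate_value(value, number)
            if fixed != value:
                record.info[key] = fixed
        vcf_out.write(record)
    vcf_out.close()
    vcf_in.close()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python fixDuplicates.py input.vcf > output.vcf", file=sys.stderr)
```

Fields declared with "Number=A", "R", "G" or "." are left alone, since their allowed length is not a fixed integer.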
Save the code as fixDuplicates.py and run it like this:
$ python fixDuplicates.py prueba.vcf > prueba_corrected.vcf
The script uses pysam, so you have to install that package first.
Answered 3 months ago by finswimmer (modified 3 months ago)