The reference VCF file that is available for download at http://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/ seems to have formatting issues.
I ran into this when I tried to run the GATK BaseRecalibrator command. The error says that a duplicate allele was added to a VariantContext. I started looking at the reference VCF file and found that some records have duplicate alternative alleles, and in some records the REF allele also appears in the ALT column. I downloaded vcftools to validate the file and got the following warnings:
[tmp]$ vcf-validator homo_sapiens_GRCh37.vcf
1:2886090 .. REF allele listed in the ALT field??
1:4095845 .. Could not parse the allele(s) [AG], first base does not match the reference.
1:8121167 .. The alleles not unique: CAAT
1:8121167 .. The alleles not unique: CAAT
1:9127042 .. The alleles not unique: TAA
1:11408760 .. The alleles not unique: CTATGTATG
1:13177471 .. The alleles not unique: CTT
1:13894414 .. REF allele listed in the ALT field??
1:15015689 .. REF allele listed in the ALT field??
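In case it helps anyone reproduce this without vcftools, here is a minimal Python sketch of two of the checks above (REF repeated in ALT, and duplicate ALT alleles). The names `find_allele_problems` and `report` are my own, not part of GATK or vcftools. The sketch assumes an uncompressed, tab-delimited VCF with REF and ALT in columns 4 and 5, and it compares alleles case-insensitively, which is also an assumption. It does not attempt the "first base does not match the reference" check.

```python
from collections import Counter


def find_allele_problems(lines):
    """Yield (chrom, pos, message) for records whose ALT column repeats
    the REF allele or lists the same ALT allele more than once.

    Hypothetical helper: assumes tab-delimited VCF lines with
    CHROM, POS, ID, REF, ALT as the first five columns.
    """
    for line in lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete data record
        chrom, pos, _id, ref, alt = fields[:5]
        # Assumption: alleles are compared case-insensitively.
        alts = [allele.upper() for allele in alt.split(",")]
        if ref.upper() in alts:
            yield chrom, int(pos), "REF allele listed in ALT"
        dupes = sorted(a for a, n in Counter(alts).items() if n > 1)
        if dupes:
            yield chrom, int(pos), "duplicate ALT allele: " + ",".join(dupes)


def report(path):
    """Print one line per problem found in the VCF file at `path`."""
    with open(path) as handle:
        for chrom, pos, message in find_allele_problems(handle):
            print(f"{chrom}:{pos} .. {message}")
```

Calling `report("homo_sapiens_GRCh37.vcf")` should list the affected positions, which could then be filtered out or fixed before running BaseRecalibrator.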
Is there another source where I can get a better variation set file?