Normalization such as left-aligning indels can help in simple cases, but it is not the state of the art for VCF comparison. This is a clear case where you should be using a haplotype-aware VCF comparison tool. As well as handling an indel that is placed at a different start position, these tools can also deal with the more complex cases that arise (for example, when you have SNPs and indels in close proximity).
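To make the idea concrete, here is a minimal Python sketch of what "haplotype-aware" means: instead of comparing VCF records field by field, you apply each call set to the reference and compare the resulting sequences. The reference strings, the variants, and the `apply_variants` helper are all made up for illustration, and it assumes non-overlapping variants on a single haplotype. It is not how vcfeval or hap.py are implemented, since the real tools also have to handle genotypes, phasing, and the search over which calls to pair up.

```python
def apply_variants(reference, variants):
    """Return the haplotype obtained by applying non-overlapping variants.

    Each variant is a (pos, ref, alt) tuple with a 1-based position, as in VCF.
    """
    haplotype = reference
    # Apply from right to left so the coordinates of earlier variants stay valid.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if haplotype[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at position {pos}")
        haplotype = haplotype[:start] + alt + haplotype[start + len(ref):]
    return haplotype


# Case 1: the same CA deletion in a repeat, placed at two different positions.
repeat_ref = "GCACACAT"
left_aligned = [(1, "GCA", "G")]
right_shifted = [(5, "ACA", "A")]

# Case 2: a deletion plus a nearby SNP versus one complex substitution.
complex_ref = "ATCGGA"
snp_plus_indel = [(2, "TC", "T"), (4, "G", "A")]
single_complex = [(3, "CG", "A")]

# Record-by-record comparison sees different variants in both cases,
# yet the haplotypes they produce are identical.
print(apply_variants(repeat_ref, left_aligned) == apply_variants(repeat_ref, right_shifted))
print(apply_variants(complex_ref, snp_plus_indel) == apply_variants(complex_ref, single_complex))
```

Left-aligning would reconcile the first case, but not the second, which is why replaying variants onto the reference is the more robust way to decide whether two calls agree.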
I would recommend RTG Tools vcfeval or Illumina's hap.py, depending on what kind of results you are after. Using vcfeval directly is good if you want to do VCF intersection-type operations, finding the variants that are common to both call sets or present in only one of them. hap.py is a good tool if you are more interested in performance metrics and benchmarking, stratified by region or variant type (and you can use vcfeval as the matching engine inside hap.py, for slightly better comparisons than the built-in haplotype matching gives).
(disclaimer: I work for RTG)