Your experience with those tools used for comparing variants is why I gave up using them many years ago. They rarely agree with one another, and many bugs have been found in them. 'Under the hood', they may also be doing some form of automatic filtering that is not reported, such as excluding low-quality variants.
I would just do this manually by (for each file):
1, Normalising the variants in each file (left-aligning indels and splitting multi-allelic calls), and then creating a unique key to identify each variant, setting it in the ID field for each resulting file (BCF):
#1st pipe, splits multi-allelic calls into separate variant calls
#2nd pipe, left-aligns indels and issues warnings when the REF base in your VCF does not match the base in the supplied FASTA reference genome
#3rd pipe, sets VCF ID field to CHROM:POS:REF:ALT
bcftools norm -m-any MyVariants.vcf | bcftools norm --check-ref w -f human_g1k_v37.fasta | bcftools annotate -Ob -x 'ID' -I +'%CHROM:%POS:%REF:%ALT' > MyVariants.bcf ;
bcftools index MyVariants.bcf ;
2, Extracting the variant key from each file, giving 3 separate lists, using cut, awk, sed, or grep:
bcftools view MyVariants.bcf | cut -f3 | grep -v "^##" | grep -v "^ID"
3, With your 3 lists of unique variants, use something like Venny to check the level of overlap and the variants unique to each
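
If you would rather stay on the command line than paste the lists into Venny, the comparison in step 3 can be roughly sketched with sort and grep. Treat this as a sketch under assumptions: the function names and the *.keys.txt file names are my own placeholders (they assume you redirected the output of step 2 into one text file per input, one key per line), and nothing here is produced by bcftools itself.

```shell
# Hypothetical helpers: each argument is a text file with one
# CHROM:POS:REF:ALT key per line, as written out in step 2.

# Keys present in both $1 and $2 (deduplicated).
# -F = fixed strings, -x = match the whole line, -f = read patterns from a file.
shared_keys() {
    sort -u "$1" | grep -Fxf "$2"
}

# Keys present in $1 but absent from $2 (deduplicated).
unique_keys() {
    sort -u "$1" | grep -Fxvf "$2"
}

# Example usage with placeholder file names:
#   shared_keys A.keys.txt B.keys.txt | wc -l               # shared by A and B
#   unique_keys A.keys.txt B.keys.txt | wc -l               # in A, not in B
#   shared_keys A.keys.txt B.keys.txt | grep -Fxf C.keys.txt  # shared by all 3
```

Whole-line, fixed-string matching is used so that a key such as 1:100:A:G cannot accidentally match 1:100:A:GT. With very large call sets the pattern file can use a lot of memory, in which case sorting the lists and using comm is a reasonable alternative.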