I have a VCF file containing variant data for 52 samples.
- sample 1
- sample 2
- sample 3
- etc., etc.
What I would like to do is perform pairwise comparisons, counting the number of variants (SNPs and small INDELs) that differ between every pair of samples.
- number of variants between sample 1 and sample 2
- number of variants between sample 1 and sample 3
- and so on for every pairwise comparison possible.
I'm not looking to count the number of variants across all samples, nor the number of variants between each sample and the reference assembly, as I already have these.
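To make the comparison concrete, here is a minimal Python sketch of the kind of count I mean. It rests on a few assumptions: the VCF is a plain-text, multi-sample file; GT is the first FORMAT field; two samples "differ" at a site when their unphased genotypes are not identical; and sites where either sample has a missing genotype are skipped for that pair. The function names `normalise_gt` and `pairwise_differences` are made up for illustration and do not come from any existing tool.

```python
from collections import Counter
from itertools import combinations


def normalise_gt(sample_field):
    """Return the genotype as a sorted tuple of allele codes, or None if missing.

    Assumes GT is the first colon-separated FORMAT field. Phasing is
    ignored, so 0|1 and 1/0 are treated as the same genotype.
    """
    gt = sample_field.split(":", 1)[0].replace("|", "/")
    alleles = gt.split("/")
    if "." in alleles:
        return None
    return tuple(sorted(alleles))


def pairwise_differences(lines):
    """Count, for every pair of samples, the sites where genotypes differ.

    `lines` is any iterable of VCF lines, for example an open file handle.
    Returns (sample_names, Counter keyed by (sample_a, sample_b)).
    """
    samples = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        genotypes = [normalise_gt(f) for f in fields[9:]]
        for i, j in combinations(range(len(samples)), 2):
            a, b = genotypes[i], genotypes[j]
            # Assumption: a missing call in either sample means "not comparable".
            if a is not None and b is not None and a != b:
                counts[(samples[i], samples[j])] += 1
    return samples, counts


if __name__ == "__main__":
    import sys

    # Usage: python pairwise_counts.py input.vcf
    with open(sys.argv[1]) as handle:
        names, counts = pairwise_differences(handle)
    for a, b in combinations(names, 2):
        print(a, b, counts[(a, b)], sep="\t")
```

With 52 samples this checks 1,326 pairs per site, so it would be slow on a very large file, but it shows the output I am after: one line per pair of samples with the number of differing sites.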
I had been hoping that VCFtools would have a function for this, but from checking the manual, it seems it does not. If I have missed something in VCFtools, please let me know. Otherwise, I would really appreciate links to Python, Perl or Bash scripts that can do what I need, or recommendations for other software that might help.
Many many thanks in advance.