Say I have SNP and INDEL calls for 1000 individuals. These 1000 samples were joint-called and recalibrated with GATK in 10 batches.
As a result I have 10 VCF files with SNP and INDEL calls that I would like to merge. I only have access to the VCF files, so re-calling from the BAMs is not an option.
I'm familiar with bcftools, but I'm unclear on the best way forward.
Should I split multiallelic entries into biallelic before merging?
If I'm interested in rare variants, should I omit multiallelic variants?
Should I left-align before merging, after merging, or both?
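For concreteness, here is a rough sketch of one ordering I am considering: normalize each batch first (left-align against the reference and split multiallelic records), then merge. It only prints the bcftools commands rather than running them. The file names (ref.fa, batch1.vcf.gz, merged.vcf.gz) and the plan_merge helper are placeholders I made up for illustration, and I am not sure this ordering is the right one.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# REF and all file names are hypothetical placeholders, not my real data.
REF="ref.fa"

plan_merge() {
    # Step 1: per batch, left-align indels against the reference (-f)
    # and split multiallelic records into biallelic ones (-m -any),
    # then index the result so it can be merged.
    for vcf in "$@"; do
        out="${vcf%.vcf.gz}.norm.vcf.gz"
        echo "bcftools norm -f $REF -m -any -Oz -o $out $vcf"
        echo "bcftools index -t $out"
    done

    # Step 2: merge the normalized batches; -m none asks bcftools merge
    # not to create new multiallelic records.
    printf 'bcftools merge -m none -Oz -o merged.vcf.gz'
    for vcf in "$@"; do
        printf ' %s' "${vcf%.vcf.gz}.norm.vcf.gz"
    done
    printf '\n'
}

plan_merge batch1.vcf.gz batch2.vcf.gz
```

The alternative would be to merge the raw batches first and run bcftools norm once on the merged file, which is exactly the choice I am unsure about.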
Thank you for any advice.