Variant Calling when Genetic Distance between Reference and Samples is Greater than it is for Humans: Does GATK still perform well?
I need to call variants on 50 samples. The reference is from the same species but a different population. In terms of genetic distance, the MASH distance between the samples and the reference ranges from 0.006 to 0.007, whereas the distance between samples is around 0.002. For comparison, the distance between any two humans is supposed to be around 0.001, so my samples diverge quite a bit more from the reference than human samples would. They aren't as divergent as what you might get from bacterial isolates, though. For the latter, I know that GATK does not work very well and the community uses other tools such as SNIPPY, which is based on a bwa mem/freebayes pipeline.

I was wondering whether GATK will perform well in my case or whether there are better alternatives. How divergent does the reference need to be from the samples before GATK starts underperforming other callers?

Thanks,
Robert
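
For a rough sense of scale, here is a back-of-the-envelope sketch in Python of what the distances above would mean per read. It assumes the MASH distance can be read as an approximate per-site substitution rate, and that reads are 150 bp long. The read length and the function name `expected_mismatches` are assumptions made for illustration only, and indels and heterozygosity are ignored.

```python
def expected_mismatches(mash_distance: float, read_length: int = 150) -> float:
    """Approximate number of substitutions expected in one read.

    Assumes the MASH distance is roughly the per-site substitution rate
    and that read_length (150 bp here) is only an illustrative value.
    """
    return mash_distance * read_length


# Distances quoted in the question
distances = {
    "sample vs reference (low)": 0.006,
    "sample vs reference (high)": 0.007,
    "sample vs sample": 0.002,
    "human vs human": 0.001,
}

for label, d in distances.items():
    per_read = expected_mismatches(d)
    spacing = 1 / d  # average number of bases between differences
    print(f"{label}: ~{per_read:.2f} mismatches per 150 bp read, "
          f"about one difference every {spacing:.0f} bp")
```

Under these assumptions, a read from my samples would carry roughly one mismatch against the reference on average, compared with well under one for a typical human sample.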

