How do people calculate genotype concordance between INDEL call sets?
I have an NGS-based and a BAC-based INDEL call set (both in VCF), but I get higher false positive and false negative rates than those reported in some papers that used similar data and variant-calling tools. (Those papers, of course, don't mention exactly how they calculated the genotype concordance for INDELs.)
When I inspect the discordant calls in IGV, many of them are in or close to repeat regions, and some NGS-based and BAC-based INDEL calls are very close to each other or even overlap.
So far I have used GATK GenotypeConcordance, which only counts a call as a true positive if both the position and the allele match exactly.
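To make that matching rule concrete, here is a minimal Python sketch of exact matching as I understand it. It is not GATK's code and it only compares sites, not genotypes. The function name `exact_match_counts` and the two records are made up for illustration: they describe the same 2 bp deletion in a CA repeat, written at two different positions.

```python
def exact_match_counts(truth, calls):
    """Count TP/FP/FN when a call only matches on identical (chrom, pos, ref, alt)."""
    truth_set = set(truth)
    call_set = set(calls)
    tp = len(truth_set & call_set)   # present in both sets
    fp = len(call_set - truth_set)   # called, but not in the truth set
    fn = len(truth_set - call_set)   # in the truth set, but not called
    return tp, fp, fn


# Hypothetical records on a toy reference TGCACACACATG (1-based positions).
# Both delete one "CA" unit and give the same resulting sequence.
bac_calls = [("chr1", 2, "GCA", "G")]
ngs_calls = [("chr1", 8, "ACA", "A")]

print(exact_match_counts(bac_calls, ngs_calls))  # (0, 1, 1)
```

With this rule, one real deletion is counted as one false positive and one false negative, which is the kind of discordance I see in repeat regions.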
Should I consider INDELs that match in position but have a different allele (length) as a true positive match?
Should I exclude INDELs within repeat regions from the genotype concordance calculation? Is there a way to normalize INDEL calls within repetitive regions (i.e. put them all in the same spot in the repetitive region)?
Should I exclude INDELs within the flanks of repeat regions (1 bp? 5 bp? 10 bp?) from the genotype concordance calculation?
Should I exclude INDELs within INDEL clusters (e.g. 2 INDEL calls within 10 bp?) from the genotype concordance calculation?
Should I consider INDELs that overlap by some amount (1 bp, 5 bp, 10 bp) as a true positive match?
Should I consider INDELs that are close to each other (within 1 bp, 5 bp, 10 bp) as a true positive match?
Should I only look at INDELs up to 10 bp in length?
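For the normalization question, this is the kind of thing I have in mind: a minimal Python sketch of left-aligning a single biallelic INDEL against the reference, so that equivalent calls in a repeat end up at the same position. As far as I understand, `bcftools norm` with a reference FASTA does something like this properly; the function `left_align` and the toy reference below are my own assumptions for illustration, not taken from any tool.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic variant.

    ref_seq: reference sequence of the chromosome (string)
    pos:     1-based position of the first base of `ref`
    Returns the normalized (pos, ref, alt).
    """
    ref, alt = ref.upper(), alt.upper()
    changed = True
    while changed:
        changed = False
        # Drop the last base while both alleles end in the same base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both by one reference base on the left.
        if not ref or not alt:
            if pos == 1:
                # Sketch limitation: no handling of variants at the contig start.
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = ref_seq[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat (1-based positions 3 to 10).
TOY_REF = "TGCACACACATG"

# The same 2 bp deletion written at two different positions in the repeat.
print(left_align(TOY_REF, 8, "ACA", "A"))  # (2, 'GCA', 'G')
print(left_align(TOY_REF, 4, "ACA", "A"))  # (2, 'GCA', 'G')
```

If both call sets were normalized like this before the comparison, equivalent INDELs in a repeat would at least get the same position and alleles, so an exact-match comparison would no longer count them as discordant.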
Is there a tool other than GATK GenotypeConcordance that I can use to calculate the genotype concordance between two INDEL call sets (and that maybe also takes the points above into account)?