
This is a kind of follow-up inspired by the very good question/answers I read in "How to calculate sensitivity/selectivity of an algorithm that returns locations of possible matches?"

My goal is to evaluate the Sensitivity/Specificity of an indel detection method.

I have a "gold standard" VCF file (ref.vcf) that states exactly where the insertions and deletions are in my genome. And of course, my indel detection method produces its own VCF file (let's call it test.vcf).

To calculate the True Positives, I take the intersection of test.vcf and ref.vcf (I use exact intersection for the sake of simplicity for now). The False Positives are the features in test.vcf that are not in ref.vcf. And the False Negatives are the features in ref.vcf that are not in test.vcf.
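The counting described above could be sketched roughly as follows. This is only an illustration, not a reference implementation: it assumes plain-text (uncompressed) VCF lines, treats two records as the same feature only if CHROM, POS, REF and ALT all match exactly, and the function names (`load_variants`, `confusion_counts`, `counts_from_files`) are made up for this example.

```python
def load_variants(lines):
    """Collect (CHROM, POS, REF, ALT) keys from VCF lines, skipping headers."""
    variants = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or VCF header/meta line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # A record may list several comma-separated ALT alleles; count each one.
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def confusion_counts(ref_variants, test_variants):
    """Return (TP, FP, FN) using exact set intersection; TN is not computed."""
    tp = len(test_variants & ref_variants)  # called and in the gold standard
    fp = len(test_variants - ref_variants)  # called but not in the gold standard
    fn = len(ref_variants - test_variants)  # in the gold standard but missed
    return tp, fp, fn


def counts_from_files(ref_path, test_path):
    """Convenience wrapper for two uncompressed VCF files on disk."""
    with open(ref_path) as ref_fh, open(test_path) as test_fh:
        return confusion_counts(load_variants(ref_fh), load_variants(test_fh))
```

With the files from the question, this would be called as `counts_from_files("ref.vcf", "test.vcf")`. Exact matching is strict for indels, since the same event can be written at different positions, so the counts are only as good as the normalization of both files.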

But how would you calculate the True Negatives? I can't just use the number of positions left (the number is far too big!).

Why is the number too big? From my understanding, you have a number of positions that say "nope, no indel here," which is probably the majority of them. For these positions, if there really isn't an indel there, shouldn't that be a true negative? Assuming similar data, you should have mostly true negatives.

Pascal is correct, the whole number of TN is too large (~3.3e9 for human) such that the figures will be misleading (and drown in rounding error). Therefore it is common practice not to use the standard way of defining specificity like that.
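To make the point concrete, here is a small sketch with entirely made-up counts for two hypothetical callers. It treats every one of the ~3.3e9 positions as a possible negative, which is the naive definition being discussed. Precision, TP / (TP + FP), is included as one measure that does not need TN; that choice is an assumption of this example, not something stated in the comment.

```python
GENOME_POSITIONS = 3_300_000_000  # ~3.3e9, the human figure quoted above


def specificity(tp, fp, fn, total_positions=GENOME_POSITIONS):
    """Naive specificity: every position without a call or a true indel is a TN."""
    tn = total_positions - tp - fp - fn
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """Fraction of gold-standard indels that were found."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of reported indels that are in the gold standard (no TN needed)."""
    return tp / (tp + fp)


# Hypothetical counts: both callers find the same true indels, but the
# second one reports 1000 times more false positives.
careful_caller = dict(tp=9_000, fp=100, fn=1_000)
noisy_caller = dict(tp=9_000, fp=100_000, fn=1_000)

for name, c in [("careful", careful_caller), ("noisy", noisy_caller)]:
    print(
        name,
        f"specificity={specificity(**c):.6f}",
        f"sensitivity={sensitivity(c['tp'], c['fn']):.3f}",
        f"precision={precision(c['tp'], c['fp']):.3f}",
    )
```

Both callers come out with a specificity that rounds to nearly 1, even though one of them reports far more false calls than true ones, whereas precision separates them clearly.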
