Hi Biostars community! I've been struggling with this question lately. The transition/transversion ratio (Ts/Tv, also written titv or tstv) is commonly used to check whether quality control on WES or WGS data was performed correctly. For human data the expected value is around 2.8-3.0 for WES (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4308666/) and around 2.0-2.1 for WGS.
I have been running QC on a dataset that is split by chromosome and into chunks within each chromosome. However, after QC the Ts/Tv ratios do not improve; on the contrary, they get worse.
I have been using bcftools for this, and after running some MWEs I finally understood how the ratios are calculated: at the variant level (for a global Ts/Tv) and at the sample level. At the variant level the sample genotypes are not taken into consideration, so a variant is still counted towards the ratio even if every sample is homozygous reference at that site. The sample-level Ts/Tv, in contrast, does take into account the presence of the alternative allele. For a more comprehensive explanation, see the GitHub issue.
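To make the difference concrete, here is a minimal Python sketch of the two calculations as I understand them. It is not bcftools' actual code: the sites, genotypes, sample names (`s1`, `s2`) and function names are all made up for illustration. It also assumes biallelic SNPs only, and that a sample contributes one count per site where it carries at least one ALT allele.

```python
# Toy illustration only: made-up sites and genotypes, not real data
# and not bcftools' implementation. Assumes biallelic SNPs.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (ref, alt) in TRANSITIONS


def ratio(ts, tv):
    """Ts/Tv ratio, NaN when there are no transversions."""
    return ts / tv if tv else float("nan")


def site_level_tstv(sites):
    """Count every SNP record once, ignoring the genotypes entirely."""
    ts = sum(is_transition(s["ref"], s["alt"]) for s in sites)
    tv = len(sites) - ts
    return ts, tv


def sample_level_tstv(sites, sample):
    """Count only sites where this sample carries at least one ALT allele.

    Assumption: one count per site, whether the genotype is het or hom-alt.
    """
    ts = tv = 0
    for s in sites:
        gt = s["gt"][sample]  # allele indices: 0 = REF, 1 = ALT
        if 1 not in gt:
            continue  # sample is hom-ref here, so the site is skipped
        if is_transition(s["ref"], s["alt"]):
            ts += 1
        else:
            tv += 1
    return ts, tv


sites = [
    {"ref": "A", "alt": "G", "gt": {"s1": (0, 1), "s2": (0, 0)}},  # Ts
    {"ref": "C", "alt": "T", "gt": {"s1": (1, 1), "s2": (0, 1)}},  # Ts
    {"ref": "A", "alt": "C", "gt": {"s1": (0, 0), "s2": (0, 0)}},  # Tv, no carrier
    {"ref": "G", "alt": "T", "gt": {"s1": (0, 1), "s2": (0, 1)}},  # Tv
]

print("site level:", ratio(*site_level_tstv(sites)))
for name in ("s1", "s2"):
    print(name, ratio(*sample_level_tstv(sites, name)))
```

In this toy set the third site has no carrier, so it adds a transversion at the site level (2 Ts / 2 Tv = 1.0) but is counted for neither sample (s1: 2/1 = 2.0, s2: 1/1 = 1.0).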
So my question is: when we use a Ts/Tv ratio of ~3 as a proxy for QC, which version are we referring to, the variant-level one or the sample-level one?
Which is the right way to calculate the Ts/Tv ratio in this case?