I'm trying to calculate the transition/transversion (Ts/Tv) ratio for my callset. Simple, right? However, I'm frustrated that two common VCF parsing tools (vcftools and SnpSift) give different results for this metric. Consider the following simple case:
8 variants: 4 Ts and 4 Tv. One of the Ts and two of the Tv are called homozygous for the ALT allele; the rest are called heterozygous.
vcftools ignores the genotype information. Ts count: 4. Tv count: 4. Ts/Tv ratio = 1.
SnpSift counts homozygous variants twice. Ts count: 5. Tv count: 6. Ts/Tv ratio = 0.83.
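To make the two counting schemes concrete, here is a minimal Python sketch that reproduces the numbers above. It is not the actual code of vcftools or SnpSift; it only mimics the behaviour I described. The function names, the `per_allele` flag, and the toy variants are all made up for illustration.

```python
# Base changes that count as transitions (purine<->purine, pyrimidine<->pyrimidine).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    """Return True if a biallelic SNP ref>alt is a transition."""
    return (ref.upper(), alt.upper()) in TRANSITIONS


def count_ts_tv(variants, per_allele=False):
    """Count Ts and Tv over (ref, alt, genotype) tuples.

    per_allele=False: each site counts once (genotype ignored).
    per_allele=True:  each site counts once per ALT copy in the genotype,
                      so a homozygous ALT call counts twice.
    """
    ts = tv = 0
    for ref, alt, genotype in variants:
        alleles = genotype.replace("|", "/").split("/")
        alt_copies = sum(1 for allele in alleles if allele == "1")
        weight = alt_copies if per_allele else 1
        if is_transition(ref, alt):
            ts += weight
        else:
            tv += weight
    return ts, tv


# Hypothetical toy callset matching the example: 4 Ts (1 hom), 4 Tv (2 hom).
variants = [
    ("A", "G", "1/1"), ("C", "T", "0/1"), ("G", "A", "0/1"), ("T", "C", "0/1"),
    ("A", "C", "1/1"), ("G", "T", "1/1"), ("A", "T", "0/1"), ("C", "G", "0/1"),
]

for label, per_allele in (("per site", False), ("per allele", True)):
    ts, tv = count_ts_tv(variants, per_allele)
    print(label, ts, tv, round(ts / tv, 2))
# per site 4 4 1.0
# per allele 5 6 0.83
```

The only difference between the two results is the weight given to each site, so the question comes down to which weight is the right one.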
My feeling is that the genotype information shouldn't matter, since we want to count independent mutagenic events, and the two ALT alleles in a homozygous variant likely have a single mutagenic origin (given the slow rate of SNP evolution and the rarity of multiallelic SNPs). However, I've read a dozen papers on the subject and can't find a simple formula for calculating Ts/Tv in any of them. Is there any consensus on this?