The number of SNPs and the sum of transitions and transversions do not match in the snpEff CSV output. Has anyone encountered this before? I ran snpEff on a sample called "B86097":

```
snpEff \
-Xms750m \
-csvStats B86097-effect-stats.csv \
GRCh37.75 B86097.varscan_cns.threshold_0.01.vcf \
> B86097.snpEff.vcf
```

The CSV output shows 109 SNPs:

```
# Variantss by type
Type , Count , Percent
DEL , 13 , 10.4%
INS , 3 , 2.4%
SNP , 109 , 87.2%
```

But the Ts/Tv summaries show 85 transitions and 47 transversions, which add up to 132:

```
# Ts/Tv summary
Transitions , 85
Transversions , 47
Ts_Tv_ratio , 1.808511
# Ts/Tv : All variants
Sample ,Sample1,Total
Transitions ,85,85
Transversions ,47,47
Ts/Tv ,1.809,1.809
```

Notably, the entries of the "Base changes" matrix sum to 109, matching the SNP count:

```
# Base changes
base , A , C , G , T
A , 0 , 5 , 24 , 4
C , 3 , 0 , 10 , 15
G , 18 , 5 , 0 , 4
T , 4 , 11 , 6 , 0
```
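As a cross-check, the matrix itself can be split into transitions (A<->G, C<->T) and transversions (everything else off the diagonal). The following is only a minimal sketch: the function name `tstv_from_matrix` is made up for illustration, and it assumes the matrix is passed in exactly the comma-separated layout shown above. Because the transition pairs are symmetric, it does not matter whether rows or columns are the reference base.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify every cell of a "Base changes" matrix
# as a transition or a transversion and print the totals.
tstv_from_matrix() {
  awk -F',' '
    NR == 1 {
      # Header row: remember the base that labels each column.
      for (i = 2; i <= NF; i++) {
        gsub(/ /, "", $i)
        col[i] = $i
      }
      next
    }
    {
      gsub(/ /, "", $1)
      from = $1
      for (i = 2; i <= NF; i++) {
        n = $i + 0
        pair = from col[i]
        if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
          ts += n
        else if (from != col[i])
          tv += n
      }
    }
    END { printf "transitions=%d transversions=%d total=%d\n", ts, tv, ts + tv }
  '
}

# Matrix copied from the CSV output above.
tstv_from_matrix <<'EOF'
base , A , C , G , T
A , 0 , 5 , 24 , 4
C , 3 , 0 , 10 , 15
G , 18 , 5 , 0 , 4
T , 4 , 11 , 6 , 0
EOF
```

For this matrix the split is 68 transitions and 41 transversions (68 + 41 = 109), compared with 85 and 47 in the Ts/Tv summary.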

This discrepancy appears in every sample I have examined so far. To investigate, I ran snpEff on each variant individually and found 23 variants that are each counted as either 2 transitions or 2 transversions. That accounts for the extra 23 and brings the sum of transitions and transversions to 132 (109 + 23 = 132):

```
$ grep -A 4 "# Ts/Tv summary" *.csv | grep " 2"
line_102.vcf-effects-stats.csv-Transitions , 2
line_10.vcf-effects-stats.csv-Transversions , 2
line_11.vcf-effects-stats.csv-Transversions , 2
line_12.vcf-effects-stats.csv-Transversions , 2
line_13.vcf-effects-stats.csv-Transitions , 2
line_14.vcf-effects-stats.csv-Transversions , 2
line_15.vcf-effects-stats.csv-Transitions , 2
line_18.vcf-effects-stats.csv-Transversions , 2
line_1.vcf-effects-stats.csv-Transitions , 2
line_29.vcf-effects-stats.csv-Transversions , 2
line_35.vcf-effects-stats.csv-Transitions , 2
line_37.vcf-effects-stats.csv-Transitions , 2
line_42.vcf-effects-stats.csv-Transitions , 2
line_44.vcf-effects-stats.csv-Transitions , 2
line_48.vcf-effects-stats.csv-Transitions , 2
line_59.vcf-effects-stats.csv-Transitions , 2
line_66.vcf-effects-stats.csv-Transitions , 2
line_70.vcf-effects-stats.csv-Transitions , 2
line_72.vcf-effects-stats.csv-Transitions , 2
line_7.vcf-effects-stats.csv-Transitions , 2
line_8.vcf-effects-stats.csv-Transitions , 2
line_90.vcf-effects-stats.csv-Transitions , 2
line_95.vcf-effects-stats.csv-Transitions , 2
```
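The listing above can also be tallied by class. This is a rough sketch rather than part of the original analysis: `tally_double_counts` is a hypothetical helper that reads `grep` output in the format shown above and counts how many per-variant files report 2 transitions versus 2 transversions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count per-variant stats files that report a
# Transitions or Transversions value of exactly 2.
tally_double_counts() {
  awk -F',' '
    $2 + 0 == 2 {
      # Only the Transitions/Transversions rows are counted;
      # Ts_Tv_ratio and header rows are ignored.
      if ($1 ~ /Transitions[ ]*$/) ts++
      else if ($1 ~ /Transversions[ ]*$/) tv++
    }
    END {
      printf "double_counted_transitions=%d double_counted_transversions=%d\n", ts, tv
    }
  '
}

# Small demo on three of the lines listed above.
tally_double_counts <<'EOF'
line_102.vcf-effects-stats.csv-Transitions , 2
line_10.vcf-effects-stats.csv-Transversions , 2
line_11.vcf-effects-stats.csv-Transversions , 2
EOF
```

On the real files this would be run as `grep -A 4 "# Ts/Tv summary" *.csv | tally_double_counts`. For the 23 lines listed above, the tally is 17 transitions and 6 transversions. That is exactly the gap between the Ts/Tv summary (85 and 47) and a cell-by-cell classification of the base changes matrix (68 and 41).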

I suspected that the double-counting might be caused by snpEff annotating each of these variants for multiple genes, but that does not hold for all of them. In the listing below, 13 of the 23 double-counted variants are annotated for a single gene only:

```
$ while read -r i; do echo "${i}--------------------"; grep -v '^#' "line_${i}.vcf.snpEff.vcf" | cut -f8 | tr ',' '\n' | cut -d'|' -f4 | sort | uniq -c; done < double_counted_variant_numbers.txt
102--------------------
6 ATM
1 C11orf65
10--------------------
17 TP53
11--------------------
3 SPEN
12 ZBTB17
12--------------------
25 TP53
13--------------------
25 TP53
14--------------------
17 TP53
15--------------------
6 CD79B
18--------------------
2 CTD-2369P2.2
2 DNMT1
1 S1PR2
1--------------------
16 IL4R
29--------------------
1 BCL10
2 RP11-131L23.1
35--------------------
9 FCGR2B
1 RP11-25K21.1
37--------------------
5 RP3-395M20.8
15 TNFRSF14
42--------------------
6 SETD2
1 snoU13
44--------------------
5 SETD2
48--------------------
1 RP3-395M20.7
16 TNFRSF14
59--------------------
1 SPEN
66--------------------
3 RP1-234P15.4
3 TMEM30A
70--------------------
1 SPEN
72--------------------
3 CARD11
7--------------------
4 PLCG2
8--------------------
4 PLCG2
90--------------------
1 NOTCH1
95--------------------
19 FAS
1 RP11-399O19.9
```
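To make the single-gene versus multi-gene split easier to see, the listing above can be condensed to one line per variant. This is a minimal sketch: `genes_per_variant` is a hypothetical helper, and it assumes the exact output format of the loop above (a `N--------------------` header line followed by `count gene` lines).

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each variant header ("N-----..."), print the
# variant number and how many distinct genes were listed under it.
genes_per_variant() {
  awk '
    $0 ~ /^[0-9]+-+$/ {
      # New variant header: flush the previous variant, then reset.
      if (id != "") print id, n
      id = $0
      sub(/-+$/, "", id)
      n = 0
      next
    }
    NF >= 2 { n++ }   # one "count gene" line per distinct gene
    END { if (id != "") print id, n }
  '
}

# Small demo on the first three variants from the listing above.
genes_per_variant <<'EOF'
102--------------------
6 ATM
1 C11orf65
10--------------------
17 TP53
11--------------------
3 SPEN
12 ZBTB17
EOF
```

Piping the full loop output through `genes_per_variant | awk '$2 == 1' | wc -l` counts the variants annotated for a single gene. For the listing above that is 13 of the 23, so multi-gene annotation cannot explain every double count.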

Any idea why this might be happening? Let me know if you need more information! Thanks so much.