snpEff: Single variants are double-counted in Ts/Tv summary
4.1 years ago

The number of SNPs and the sum of transitions and transversions do not match in the snpEff CSV output. Has anyone encountered this before? I ran snpEff on a sample called "B86097":

snpEff \
    -Xms750m  \
    -csvStats B86097-effect-stats.csv \
    GRCh37.75 B86097.varscan_cns.threshold_0.01.vcf \
    > B86097.snpEff.vcf

The CSV output shows 109 SNPs:

# Variantss by type 
Type , Count , Percent  
DEL , 13 , 10.4%  
INS , 3 , 2.4%  
SNP , 109 , 87.2%

But the Ts/Tv summary shows 85 transitions and 47 transversions, which add up to 132:

# Ts/Tv summary

Transitions , 85
Transversions , 47
Ts_Tv_ratio , 1.808511

# Ts/Tv : All variants

Sample ,Sample1,Total
Transitions ,85,85
Transversions ,47,47
Ts/Tv ,1.809,1.809

Notably, the entries of the "Base changes" matrix sum to 109, matching the SNP count:

# Base changes

base  , A  , C  , G  , T 
 A  , 0  , 5  , 24  , 4 
 C  , 3  , 0  , 10  , 15 
 G  , 18  , 5  , 0  , 4 
 T  , 4  , 11  , 6  , 0
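
As a cross-check, the matrix itself can be split into transitions (A<->G, C<->T) and transversions (everything else off the diagonal). This is only a minimal sketch: it assumes rows are reference bases and columns are alternate bases, which the CSV does not state, although the two totals come out the same under either orientation. The variable names are illustrative.

```shell
# Base-change matrix copied from the CSV above
# (assumed layout: first field = reference base, then counts for alt A, C, G, T).
matrix='A 0 5 24 4
C 3 0 10 15
G 18 5 0 4
T 4 11 6 0'

# A change is a transition when both bases are purines (A, G)
# or both are pyrimidines (C, T); otherwise it is a transversion.
summary=$(printf '%s\n' "$matrix" | awk '
  BEGIN { split("A C G T", base, " "); purine["A"] = purine["G"] = 1 }
  {
    ref = $1
    for (i = 1; i <= 4; i++) {
      alt = base[i]
      if (alt == ref) continue
      if ((ref in purine) == (alt in purine)) ts += $(i + 1)
      else tv += $(i + 1)
    }
  }
  END { printf "transitions=%d transversions=%d total=%d", ts, tv, ts + tv }')
echo "$summary"
# prints: transitions=68 transversions=41 total=109
```

So the matrix implies 68 transitions and 41 transversions, whereas the Ts/Tv summary reports 85 and 47.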

This discrepancy has appeared in every sample I have examined so far. I investigated by running snpEff on each variant individually and found 23 variants that are each counted as either 2 transitions or 2 transversions, which would account for the total of 132 (109 + 23 = 132):

$ grep -A 4 "# Ts/Tv summary" *.csv | grep " 2"
line_102.vcf-effects-stats.csv-Transitions , 2
line_10.vcf-effects-stats.csv-Transversions , 2
line_11.vcf-effects-stats.csv-Transversions , 2
line_12.vcf-effects-stats.csv-Transversions , 2
line_13.vcf-effects-stats.csv-Transitions , 2
line_14.vcf-effects-stats.csv-Transversions , 2
line_15.vcf-effects-stats.csv-Transitions , 2
line_18.vcf-effects-stats.csv-Transversions , 2
line_1.vcf-effects-stats.csv-Transitions , 2
line_29.vcf-effects-stats.csv-Transversions , 2
line_35.vcf-effects-stats.csv-Transitions , 2
line_37.vcf-effects-stats.csv-Transitions , 2
line_42.vcf-effects-stats.csv-Transitions , 2
line_44.vcf-effects-stats.csv-Transitions , 2
line_48.vcf-effects-stats.csv-Transitions , 2
line_59.vcf-effects-stats.csv-Transitions , 2
line_66.vcf-effects-stats.csv-Transitions , 2
line_70.vcf-effects-stats.csv-Transitions , 2
line_72.vcf-effects-stats.csv-Transitions , 2
line_7.vcf-effects-stats.csv-Transitions , 2
line_8.vcf-effects-stats.csv-Transitions , 2
line_90.vcf-effects-stats.csv-Transitions , 2
line_95.vcf-effects-stats.csv-Transitions , 2
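
The listing above can also be tallied per class to see how the 23 extra counts split between transitions and transversions. This is a rough sketch that assumes the exact `grep` output format shown here; `tally_double_counts` is a hypothetical helper name, not part of snpEff.

```shell
# Count, per class, how many single-variant runs reported a value of 2.
# Expects lines such as "line_10.vcf-effects-stats.csv-Transversions , 2" on stdin.
tally_double_counts() {
  awk -F' , ' '
    $2 + 0 == 2 {
      # The class name is the last dash-separated piece of the first field.
      n = split($1, parts, "-")
      count[parts[n]]++
    }
    END {
      printf "Transitions=%d Transversions=%d\n",
             count["Transitions"], count["Transversions"]
    }'
}

# Possible usage, reusing the command from the listing above:
#   grep -A 4 "# Ts/Tv summary" *.csv | grep " 2" | tally_double_counts
```

For the 23 lines listed above, this gives 17 transitions and 6 transversions. Subtracting those from the reported 85 and 47 leaves 68 and 41, which sum to 109, the SNP count.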

I suspected that the double-counting might come from snpEff annotating each of these variants against multiple genes, but that does not hold for all of them: several double-counted variants are annotated against a single gene (for example, variants 10, 12, and 90):

$ while read i; do echo "$i--------------------"; grep -v '^#' "line_$i.vcf.snpEff.vcf" | cut -f8 | tr ',' '\n' | cut -d\| -f4 | sort | uniq -c; done < double_counted_variant_numbers.txt
102--------------------
      6 ATM
      1 C11orf65
10--------------------
     17 TP53
11--------------------
      3 SPEN
     12 ZBTB17
12--------------------
     25 TP53
13--------------------
     25 TP53
14--------------------
     17 TP53
15--------------------
      6 CD79B
18--------------------
      2 CTD-2369P2.2
      2 DNMT1
      1 S1PR2
1--------------------
     16 IL4R
29--------------------
      1 BCL10
      2 RP11-131L23.1
35--------------------
      9 FCGR2B
      1 RP11-25K21.1
37--------------------
      5 RP3-395M20.8
     15 TNFRSF14
42--------------------
      6 SETD2
      1 snoU13
44--------------------
      5 SETD2
48--------------------
      1 RP3-395M20.7
     16 TNFRSF14
59--------------------
      1 SPEN
66--------------------
      3 RP1-234P15.4
      3 TMEM30A
70--------------------
      1 SPEN
72--------------------
      3 CARD11
7--------------------
      4 PLCG2
8--------------------
      4 PLCG2
90--------------------
      1 NOTCH1
95--------------------
     19 FAS
      1 RP11-399O19.9

Any idea why this might be happening? Let me know if you need more information! Thanks so much.

snpEff Ts/Tv ratio Ti/Tv ratio • 1.3k views

Hello, did you find an answer to this? It would be helpful to know your current thoughts. Thank you.
