Low transition/transversion ratio: alignment or caller problem?
9.8 years ago
DoubleD ▴ 130

Hello,

After running VarScan and MuTect on a set of 10 patients (tumor/normal pairs), I put the calls through a false-positive filtering pipeline. The resulting Ts/Tv ratio (whether from manual calculation, the snpEff summary file, the SnpSift tstv calculation, or GATK VariantEval) is quite low for human whole-genome sequencing data (1.3-1.6). I have read everything I can find here and in papers about the expected ratio, and about how a low ratio can indicate a large number of false positives.

I ran VarScan with relatively lax parameters for calling somatic mutations (5 reads in normal, 8 in tumor, with strand-bias filtering), but I expected MuTect to produce a confident call set. Both SNV callers end up with a low Ts/Tv. My question: can I chalk this result up to false positives (which is okay with me, since I wanted a sensitive rather than specific call set), or could it point to a problem with the BAM alignment? I suppose a poorly aligned BAM would lead to false positives too, but any insight or information would be greatly appreciated.
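For reference, the manual calculation mentioned above can be sketched roughly as follows. This is only a minimal illustration, not what any of the named tools do internally; the function name `ts_tv_ratio` and the input format (a list of REF/ALT pairs) are assumptions made for the example. It counts A<->G and C<->T changes as transitions and every other single-base change as a transversion, skipping indels and ambiguous bases.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Return (ts, tv, ratio) for an iterable of (ref, alt) allele pairs.

    Hypothetical helper: anything that is not a single-base A/C/G/T
    substitution (indels, N bases, ref == alt) is skipped.
    """
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    # Avoid a ZeroDivisionError when there are no transversions.
    ratio = ts / tv if tv else float("nan")
    return ts, tv, ratio


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("AT", "A")]
    print(ts_tv_ratio(calls))
```

Since 4 of the 12 possible single-base substitutions are transitions, uniformly random errors would give a ratio near 0.5, which is why a large false-positive fraction tends to pull the ratio down.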

somatic qc whole-genome vcf • 6.4k views

Some follow-up information after talking with a more experienced user: running the Ts/Tv calculation on the germline calls gives 2.06. Hopefully this denotes a properly aligned BAM, and the low ratio in the somatic calls comes from overly sensitive (too lax) somatic calling parameters. The ratio on the LOH calls was 2.1, although there were far fewer calls than in the germline file.
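A rough sketch of how the germline, somatic, and LOH ratios could be computed from one VCF in a single pass. It assumes the VCF carries VarScan's `SS` INFO tag with 1 = germline, 2 = somatic, 3 = LOH; if your files encode somatic status differently, the `SS_LABELS` mapping needs adjusting. The function name `ts_tv_by_status` is made up for this example, and multi-allelic records and indels are simply skipped.

```python
from collections import defaultdict

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

# Assumption: VarScan-style somatic status codes in the INFO field.
SS_LABELS = {"1": "germline", "2": "somatic", "3": "loh"}


def ts_tv_by_status(vcf_lines):
    """Return {status: (ts, tv, ratio)} for an iterable of VCF text lines."""
    counts = defaultdict(lambda: [0, 0])  # status -> [transitions, transversions]
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        ref, alt, info = fields[3].upper(), fields[4].upper(), fields[7]
        # Keep only simple single-base substitutions.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        status = SS_LABELS.get(info_map.get("SS"), "other")
        counts[status][0 if (ref, alt) in TRANSITIONS else 1] += 1
    return {
        status: (ts, tv, ts / tv if tv else float("nan"))
        for status, (ts, tv) in counts.items()
    }
```

Usage would be along the lines of `with open("calls.vcf") as fh: print(ts_tv_by_status(fh))`, where `calls.vcf` is a placeholder path.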

9.8 years ago

We should expect the Ts/Tv ratio of somatic point mutations to vary wildly across tumor types, depending on the mutagens involved and on the DNA repair mechanisms affected. I can't seem to find a publication that confirms this assumption, but this figure comes close. Here are my quick-and-dirty Ts/Tv ratios for the mutation calls grabbed from that paper, but please double-check my work.

Note: A caveat in the data below is that some cohorts are exomes while others are whole-genomes. Since there's more GC content in exomes, these Ts/Tv ratios are not perfectly comparable... but good enough for our point to hold.

Cancer Type           Ts/Tv
ALL                   0.949906
AML                   2.128909
Bladder               1.325778
Breast                0.859808
Cervix                1.265049
CLL                   1.006487
Colorectum            2.163191
Esophageal            1.38155
Glioblastoma          3.53876
Glioma Low Grade      2.244252
Head and Neck         1.172555
Kidney Chromophobe    2.545455
Kidney Clear Cell     1.165541
Kidney Papillary      1.116037
Liver                 1.222369
Lung Adeno            0.439277
Lung Small Cell       0.569885
Lung Squamous         0.635106
Lymphoma B-cell       0.971431
Medulloblastoma       1.381825
Melanoma              8.54497
Myeloma               1.303654
Neuroblastoma         0.566366
Ovary                 0.876746
Pancreas              1.021448
Pilocytic Astrocytoma 1.837178
Prostate              1.220668
Stomach               3.006267
Thyroid               2.161623
Uterus                1.632635

This is very helpful, thank you Cyriac. Using the data found at ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl/somatic_mutation_data/Liver/ I got a TsTv of 1.222 for the 850734 SNPs in that project.

For my dataset, the Ts/Tv calculated on the germline variants is 2.1, which I take to denote mutation without selection, whereas the somatic Ts/Tv of 1.3 to 1.5 would denote a selective mutation pressure.
