Question: Variant calling using IonTorrent caller correcting for deamination
7 weeks ago by graeme.thorn (London, United Kingdom), who wrote:

I'm calling variants with the IonTorrent TorrentSuite on DNA sequenced from formalin-fixed paraffin-embedded (FFPE) tissue. A major issue with this material is that, without treatment with uracil-N-glycosylase (UNG), some of the cytosines in the original DNA are deaminated to uracil, which is read as thymine. After sequencing and variant calling these show up as apparent mutations, either as C>T transitions or as G>A (from the opposite strand, due to PCR in the library prep). I have no idea how long these samples were stored without UNG before sequencing.

TVC (Torrent Variant Caller) gives a deamination metric (essentially, the number of C>T and G>A variants divided by the number of all variants called), and for our samples the highest value seen is ~0.92. Naively postprocessing the variants shows that, for these samples, C>T/T>C transitions (see https://ibb.co/KjPkkT3) overwhelm the remaining variants.
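As a rough cross-check of the metric described above, the fraction of C>T and G>A calls can be recomputed directly from the VCF records. This is only a minimal sketch that follows the description in this post; how TVC itself computes its statistic is not confirmed here, and the function name `deamination_fraction` is made up for illustration.

```python
BASES = set("ACGT")
# Substitution types expected from cytosine deamination (assumption taken from the post).
DEAMINATION_TYPES = {("C", "T"), ("G", "A")}


def deamination_fraction(vcf_lines):
    """Return the fraction of single-base substitutions that are C>T or G>A.

    `vcf_lines` is any iterable of VCF text lines (for example an open file).
    Indels and other non-SNV alleles are ignored.
    """
    total = 0
    hits = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3].upper()
        # A record may list several ALT alleles separated by commas.
        for alt in fields[4].upper().split(","):
            if ref in BASES and alt in BASES:
                total += 1
                if (ref, alt) in DEAMINATION_TYPES:
                    hits += 1
    return hits / total if total else float("nan")
```

It could be called as `deamination_fraction(open("sample.vcf"))`, where `sample.vcf` is a placeholder for an uncompressed TVC output file.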

My question is this: given the IonTorrent variant calling pipeline (sequencing > BAM file > TVC > VCF file with deamination statistic), is there

a) a way of correcting the output VCF, or b) a set of filters to use in bcftools,

to reduce this effect on the samples?

My use case is this: these are medical samples which have been inspected by a pathologist (hence the FFPE treatment), and I want to determine which variants are predictive of outcome. I therefore have two potentially contradictory goals: reducing false positives and capturing the rarer variants which may hold predictive power.

variants iontorrent • 108 views

How bad would it be to restrict the analysis to transversions?

written 6 weeks ago by Gabriel R.

That's one possible strategy, though it could throw out a lot of information that is important for the clinical outcomes. The other strategy that I'm playing with is to cluster the called variants on quality and on allele fraction. [C>T] and [G>A] transitions due to DNA damage are likely to be randomly located throughout the DNA, so the reads carrying them won't pile up at any one position as much as reads carrying true mutations, and the resulting calls should have low quality and a low allele fraction. If this is true, then [C>T] and [G>A] transitions due to deamination of the original DNA may form a cluster separate from the true positive mutations, which would enable a filter. I just need to aggregate all my data and analyse the stats to see whether this second strategy could work.

written 6 weeks ago by graeme.thorn
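One way to check the clustering idea in the reply above before committing to a filter is to tabulate QUAL and allele fraction per substitution type. The sketch below is illustrative only: it assumes the allele fraction is stored in an `AF` key of the INFO column, which should be verified against the actual TVC output, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def parse_info(info):
    """Turn an INFO string such as 'AF=0.02;DP=100' into a dict of strings."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def summarise_by_substitution(vcf_lines):
    """Return {'C>T': {'n': ..., 'median_qual': ..., 'median_af': ...}, ...}.

    Only single-ALT, single-base substitutions with a numeric QUAL and an
    AF entry (assumed INFO key) are counted.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual, info = fields[3], fields[4], fields[5], fields[7]
        if len(ref) != 1 or len(alt) != 1 or qual == ".":
            continue  # skip indels, multi-allelic records and missing QUAL
        af = parse_info(info).get("AF")
        if af is None:
            continue
        groups[f"{ref.upper()}>{alt.upper()}"].append((float(qual), float(af)))
    return {
        sub: {
            "n": len(values),
            "median_qual": median(q for q, _ in values),
            "median_af": median(a for _, a in values),
        }
        for sub, values in groups.items()
    }
```

If the C>T and G>A rows show clearly lower medians than the other substitution types, that would be consistent with a separate low-QUAL, low-AF cluster; a scatter plot of the same values would show it more directly.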

Can I ask whether you could plot the damage? Also, could you post a bit of the data somewhere and message me? I have never had my hands on IonTorrent aDNA before.

written 6 weeks ago by Gabriel R.
6 weeks ago by graeme.thorn (London, United Kingdom), who wrote:

I think, Gabriel R., that I've found a way to deal with this. Inspecting one of my samples (with an estimated deamination metric, i.e. the proportion of called SNPs consisting of C>T or G>A transitions, of 0.837) suggests a straightforward filtering procedure. Looking at the minor allele fraction (AF) and the quality score (-10 log10(p-value)) per type of SNP:

[Figure: AF v QUAL per SNP type]

it can be seen that the C>T and G>A transitions likely caused by deamination (and by PCR amplification of the deaminated DNA when making the libraries) all cluster in the bottom right of the plot, at low AF and low QUAL. Setting thresholds to the left of (higher QUAL) and above (higher AF) these groups will therefore remove them from the called variants.
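The thresholding described above could be sketched as a small VCF filter. This is a hedged illustration rather than the exact procedure used here: the cut-offs `min_qual` and `min_af` have to be read off the plot for each dataset, the `AF` INFO key is an assumption about the TVC output, and applying the thresholds only to C>T and G>A calls (rather than to every variant) is a design choice made for this example.

```python
# Substitution types attributed to deamination in this thread.
DEAMINATION_TYPES = {("C", "T"), ("G", "A")}


def info_af(info):
    """Return the AF value from an INFO string as a float, or None if absent."""
    for item in info.split(";"):
        if item.startswith("AF="):
            # Take the first value in case several are listed.
            return float(item[3:].split(",")[0])
    return None


def filter_deamination(vcf_lines, min_qual, min_af):
    """Yield VCF lines, dropping C>T / G>A calls below either threshold.

    Header lines, other substitution types, and deamination-type records
    with missing QUAL or AF are passed through unchanged.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3].upper(), fields[4].upper()
        qual, info = fields[5], fields[7]
        if (ref, alt) in DEAMINATION_TYPES and qual != ".":
            af = info_af(info)
            if af is not None and (float(qual) < min_qual or af < min_af):
                continue  # falls in the low-QUAL / low-AF cluster
        yield line
```

Writing the result back out is then a matter of `out.writelines(filter_deamination(src, min_qual, min_af))` with two open file handles. The trade-off raised in the question still applies: a genuine low-frequency C>T or G>A variant that falls inside the cluster would be removed along with the artefacts.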
