Question: Deconvolve Chimera
7.2 years ago
pagel wrote:

I don't know if I'm using the proper terminology, but I have ab1 (Sanger) sequencing chromatograms, and I was wondering whether anyone knows of software that can take two overlapped reads (e.g. overlaid indel-type mutations) and give two or more sequences as (fasta) output.

For example, suppose a read is unambiguously ACTGGCGA, and then 60% of the population has an A while in the other 40% that A is deleted, followed in both by GCGTGA. phred will likely give me ACTGGCGAAGCGTGA, but is there a way for a basecaller to give me ACTGGCGAGCGTGA as a secondary call? Or, if the mix is 50/50 and I know that one version is ACTGGCGAAGCGTGA, can I supply that as a comparison file for subtraction, leaving a residual signal of either the full ACTGGCGAGCGTGA or just GCGTGAx?
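The subtraction idea in the example above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: peak heights are already extracted per base position, normalized so the two templates sum to 1.0, and perfectly in register. The names mix_traces and subtract_known are hypothetical and do not belong to phred or any other tool.

```python
BASES = "ACGT"

def mix_traces(seq_a, seq_b, frac_a=0.5):
    """Simulate idealized per-position peak heights for two overlaid templates (toy data)."""
    length = max(len(seq_a), len(seq_b))
    peaks = []
    for i in range(length):
        heights = dict.fromkeys(BASES, 0.0)
        if i < len(seq_a):
            heights[seq_a[i]] += frac_a
        if i < len(seq_b):
            heights[seq_b[i]] += 1.0 - frac_a
        peaks.append(heights)
    return peaks

def subtract_known(peaks, known, known_frac=0.5, noise_floor=0.05):
    """Remove the known template's expected signal and call whatever is left."""
    calls = []
    for i, heights in enumerate(peaks):
        residual = dict(heights)
        if i < len(known):
            residual[known[i]] -= known_frac
        base, height = max(residual.items(), key=lambda kv: kv[1])
        # Anything at or below the noise floor is reported as N (no call).
        calls.append(base if height > noise_floor else "N")
    return "".join(calls)

peaks = mix_traces("ACTGGCGAAGCGTGA", "ACTGGCGAGCGTGA")
print(subtract_known(peaks, "ACTGGCGAAGCGTGA"))  # prints ACTGGCGAGCGTGAN
```

The trailing N marks the last position, where only the known sequence contributes signal. Real traces would first need peak detection, height normalization and alignment, none of which is handled here.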

Obviously I would have to be able to set a threshold below which I consider the result "noise" - either a fixed value of signal strength or e.g. 5% of the main peak strength - in order to filter out illegitimate base calls in non-chimeric sequence.
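A minimal sketch of such a threshold, assuming the four channel heights at each called position are available as a dict. The function names and the default values (5% relative, 0.0 absolute) are illustrative assumptions, not settings of any existing basecaller.

```python
def call_secondary(heights, rel_threshold=0.05, abs_threshold=0.0):
    """Return (primary, secondary) for one position; secondary is None if it looks like noise."""
    ranked = sorted(heights.items(), key=lambda kv: kv[1], reverse=True)
    (primary, p_height), (second, s_height) = ranked[0], ranked[1]
    # Accept the second peak only if it clears both the relative and the fixed cutoff.
    if s_height >= rel_threshold * p_height and s_height > abs_threshold:
        return primary, second
    return primary, None

def split_calls(trace, rel_threshold=0.05, abs_threshold=0.0):
    """Build primary and secondary sequences; the secondary falls back to the primary base."""
    primary_seq, secondary_seq = [], []
    for heights in trace:
        primary, secondary = call_secondary(heights, rel_threshold, abs_threshold)
        primary_seq.append(primary)
        secondary_seq.append(secondary or primary)
    return "".join(primary_seq), "".join(secondary_seq)

trace = [
    {"A": 900, "C": 20, "G": 30, "T": 10},   # G is below 5% of A: treated as noise
    {"A": 15, "C": 500, "G": 10, "T": 350},  # T is a real secondary peak
    {"A": 10, "C": 12, "G": 800, "T": 30},   # T is below 5% of G: treated as noise
]
print(split_calls(trace))  # prints ('ACG', 'ATG')
```

Raising abs_threshold gives the fixed-signal-strength variant; the relative cutoff alone can pass noise in low-signal regions.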

As a separate but related issue, is there a way to get phred (or any other program) to "filter" noise spikes in chromatograms? For example, sometimes I see spikes in pyrimidine signal strength that are way out of proportion to legitimately called regions. The peak height in an ab1 viewer like bioedit or consed (I'm not sure if the scale is linear or log) is well over double that of any nearby base, or even of any other place in the file. These spikes typically have a "width" of about 5 base calls.
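One simple way to flag such spikes, sketched here as an assumption rather than a feature of phred, is to compare each called peak height with the median of its neighbours. The function name flag_spikes and the defaults (a 15-call window on each side, a factor of 2) are hypothetical; the window only needs to be clearly wider than the roughly 5-call spike so the median is not dominated by it.

```python
from statistics import median

def flag_spikes(heights, window=15, factor=2.0):
    """Return indices whose peak height exceeds factor x the median of nearby peaks."""
    flagged = []
    for i, h in enumerate(heights):
        lo = max(0, i - window)
        hi = min(len(heights), i + window + 1)
        neighbours = heights[lo:i] + heights[i + 1:hi]
        if neighbours and h > factor * median(neighbours):
            flagged.append(i)
    return flagged

# Toy data: a flat read with a 5-call spike at four times the normal height.
heights = [100] * 20 + [400] * 5 + [100] * 20
print(flag_spikes(heights))  # prints [20, 21, 22, 23, 24]
```

Flagged positions could then be masked or given a low quality score before downstream calling. This works on called peak heights only and says nothing about whether the viewer's scale is linear or log.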

7.1 years ago
pagel wrote:

I received an answer for this elsewhere.

Mutation Surveyor and polyphred were mentioned as having this capability. I will be trying polyphred.


Thanks for adding your answer here. If you have more details, or want to share your experience after running polyphred, you could add a comment to your answer :-)

written 7.1 years ago by Leonor Palmeira

This didn't quite give me what I needed. I ended up taking the algorithm for "Multiple SeqDoC" and modifying it to automatically call secondary peaks and output the results both in-image and as a secondary text line. My calls are pretty naive: they are based on peak height above a threshold, and sometimes a base that differs from the reference is missed. After that, you can create a fake trace with mktrace, or just duplicate the original trace and edit it for the secondary call.

I'm not quite sure where, or whether, to upload my modifications. It is a large improvement (IMHO) over the referenced SeqDoC algorithm, but it needs quite a bit of development to truly be useful to many.

written 6.5 years ago by pagel
