Question: Mismatched frequencies of complementary bases after sequencing
3.5 years ago
L. A. Liggett

L. A. Liggett wrote:

I am performing amplicon sequencing of the human genome, in which selected regions of the genome are amplified and then sequenced. Everything seems to work fine, but I observe a phenomenon that I am unable to explain.

If I stratify variants by base change (i.e., bin A -> T changes, A -> G changes, etc. separately), I find that the frequencies of complementary base changes do not always match. For instance, since DNA is double stranded, I would expect that roughly every time an A -> G change is observed, the complementary T -> C change should also be observed.

This is almost always the case, but I do get repeatable, strong mismatches for certain base changes: for example, something like 1000 observed A -> G variants but only 10 observed T -> C variants.
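To make the comparison concrete, here is a minimal sketch of the binning described above. It assumes the variant calls are available as (ref, alt) pairs reported against the reference strand; the function names are hypothetical and the counts are just the illustrative 1000 vs. 10 figures, not real data.

```python
from collections import Counter

# Watson-Crick complements for the four DNA bases
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary_change(ref, alt):
    """Return the change expected on the opposite strand, e.g. A>G gives T>C."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def bin_by_base_change(variants):
    """Count single-base substitutions by (ref, alt); other records are skipped."""
    counts = Counter()
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if ref in COMPLEMENT and alt in COMPLEMENT and ref != alt:
            counts[(ref, alt)] += 1
    return counts


def complementary_pairs(counts):
    """Pair each of the 12 base changes with its complement (6 rows in total)."""
    rows, seen = [], set()
    for ref in "ACGT":
        for alt in "ACGT":
            change = (ref, alt)
            if ref == alt or change in seen:
                continue
            comp = complementary_change(ref, alt)
            seen.update({change, comp})
            rows.append((change, counts[change], comp, counts[comp]))
    return rows


# Illustrative input only: the 1000 vs. 10 example from the post (assumed layout)
variants = [("A", "G")] * 1000 + [("T", "C")] * 10
counts = bin_by_base_change(variants)
pairs = complementary_pairs(counts)

for change, n, comp, n_comp in pairs:
    print(f"{change[0]}>{change[1]}: {n}\t{comp[0]}>{comp[1]}: {n_comp}")
```

If both strands were sampled equally, the two counts in each row would be expected to be roughly equal; rows where they differ strongly are the mismatches in question.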

Biologically this does not make sense to me. Is there something that can account for this phenomenon?

3.5 years ago
WouterDeCoster wrote:

Dependent on the structure of your nucleotide base:

Depending on whether your example was random or real, this is a partial answer.

In addition, the most common mutation is 5-methylcytosine to T (spontaneous deamination).

@WouterDeCoster This explains why C -> T and its complementary change, G -> A, would be more prevalent in the data. But those two changes should be observed at more or less equal frequency. What I am seeing for some bases is that the complementary changes do not match in frequency. So for this example, it would be as if G -> A were observed far more often than C -> T. Does that make sense?

