I am performing amplicon sequencing of the human genome, which amplifies selected regions of the genome and sequences them. Everything seems to work fine, but I observe a phenomenon that I cannot explain.
If I stratify variants by base change (i.e., separately bin A -> T changes, A -> G changes, etc.), I find that the frequencies of complementary base changes do not always match. For instance, since DNA is double-stranded, I would expect that roughly every time an A -> G change is observed, the complementary T -> C change should be observed as well.
This is almost always the case, but I do get repeatable, strong mismatches for certain base changes: for example, something like 1000 observed A -> G variants but only 10 observed T -> C variants.
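To make the comparison concrete, here is a minimal sketch of the binning I describe, assuming the variants have already been reduced to single-base (ref, alt) pairs reported relative to the reference forward strand. The function names `complementary_change` and `strand_asymmetry` are made up for illustration and are not taken from any pipeline or library.

```python
from collections import Counter

# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary_change(ref, alt):
    """Return the same substitution as it would read on the opposite strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def strand_asymmetry(variants):
    """Pair each base change with its complementary change.

    `variants` is an iterable of (ref, alt) single-base tuples
    (assumed input format; indels and multi-base calls are not handled).
    Returns {(ref, alt): (count, complement_count)} for the six
    changes whose ref base is A or C; their complements cover the
    remaining six, so all twelve substitution types are represented.
    """
    counts = Counter((ref.upper(), alt.upper()) for ref, alt in variants)
    pairs = {}
    for ref in "AC":
        for alt in "ACGT":
            if alt == ref:
                continue
            comp = complementary_change(ref, alt)
            pairs[(ref, alt)] = (counts[(ref, alt)], counts[comp])
    return pairs


# Toy data mirroring the mismatch described above (not real results).
toy_variants = [("A", "G")] * 1000 + [("T", "C")] * 10
for (ref, alt), (n, n_comp) in strand_asymmetry(toy_variants).items():
    c_ref, c_alt = complementary_change(ref, alt)
    print(f"{ref}->{alt}: {n}\t{c_ref}->{c_alt}: {n_comp}")
```

With symmetric data the two counts in each pair should be roughly equal; the pairs where they are not are the ones I am asking about.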
Biologically this does not make sense to me. Is there something that can account for this phenomenon?
@WouterDeCoster This explains why C -> T and its complementary change G -> A would be more prevalent in the data. But those two changes should still be observed at more or less equal frequency. What I am seeing for some bases is that the complementary changes do not match in frequency. In this example, it would be as if G -> A were observed far more often than C -> T. Does that make sense?