Hi everyone, I have a next-generation sequencing data set containing two haplotypes. One accounts for 95% of the sequencing data and the other for only 5%. I used assembly methods to recover the dominant haplotype, and then mapped the raw reads back to the assembled contigs. Since the sequence similarity between the two haplotypes is high, I am wondering how I can tell whether a mismatch between a read and the reference comes from a sequencing error or from the minor haplotype.
Of course, the sequencing error rate and the sequence divergence between the two haplotypes are different. In addition, sequencing errors tend to be random. However, I need a probabilistic or statistical model for this problem so that I can work out a theoretically sound solution. For example, if multiple mapped reads share the same mismatch at the same position of the reference, that mismatch very likely comes from the minor haplotype. Would a hypothesis-testing method work?
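To make the hypothesis-testing idea concrete, here is a rough sketch of one possible per-position test, not a definitive method. It assumes that reads are independent, that the per-base error rate is known and uniform, and that an error is equally likely to produce any of the three other bases. The function names and the example values (1% error rate, depth 200) are hypothetical; only the 5% minor fraction comes from the data described above. Under the null hypothesis "errors only", the number of reads showing one specific mismatch at a position is Binomial(depth, error_rate / 3), so a small upper-tail probability argues against sequencing error. The log-likelihood ratio compares that null with a model in which the minor haplotype carries the variant.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_tail(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p), summed in a numerically safe way."""
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def error_only_pvalue(alt_count, depth, error_rate):
    """P-value for seeing at least alt_count reads with one specific mismatch
    if sequencing error is the only source (assumed rate: error_rate / 3)."""
    return binom_tail(alt_count, depth, error_rate / 3.0)


def log_likelihood_ratio(alt_count, depth, error_rate, minor_fraction):
    """Log-likelihood ratio of 'minor haplotype carries the variant' versus
    'errors only'. Positive values favour the minor haplotype."""
    p0 = error_rate / 3.0
    # Assumption: a read shows the variant either because it comes from the
    # minor haplotype and was read correctly, or because a read from the
    # dominant haplotype was miscalled to that base.
    p1 = minor_fraction * (1.0 - error_rate) + (1.0 - minor_fraction) * p0
    return (alt_count * math.log(p1 / p0)
            + (depth - alt_count) * math.log((1.0 - p1) / (1.0 - p0)))


if __name__ == "__main__":
    depth, error_rate, minor_fraction = 200, 0.01, 0.05  # illustrative values
    for alt_count in (1, 3, 10):
        pval = error_only_pvalue(alt_count, depth, error_rate)
        llr = log_likelihood_ratio(alt_count, depth, error_rate, minor_fraction)
        print(f"alt={alt_count:3d}  p-value={pval:.3g}  LLR={llr:.2f}")
```

Because the test is repeated at every position of the contigs, the significance threshold would need a multiple-testing correction (for example Bonferroni, dividing the threshold by the number of positions tested). Per-base quality scores could replace the single assumed error rate if they are available.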