I'm trying to estimate the sequencing error rates (mismatches and indels) from a bam file.
I can get the number and length of insertions and deletions from the cigar string by counting the number and length of the "I" and "D" values.
How do I calculate the number of mismatches without going to the reference fasta file? We can assume the MD tag is present, but I haven't figured out how to actually parse it properly.
(I'm using python with pysam, if someone has example code somewhere.)