In a BAM (or SAM) file, the MD:Z tag can be used to identify the positions of mismatches. For example, if the CIGAR string is 45M30N35M and mismatches occur at positions 5 (reference base A) and 65 (C->G), then the MD tag would be MD:Z:4A59C15, reflecting the mismatched bases at the 5th and 65th positions. Of course, it gets more complicated when there are consecutive mismatches and/or deletions followed by mismatches. If you're interested, this post explains it very well.
Given the CIGAR string and the MD:Z tag, I would like to obtain the positions of the mismatches as a vector; in this case, 5 and 65. I could implement this myself, but I am reluctant to because of time constraints, and I was wondering whether any existing R packages (GenomicRanges and the like) can return this information directly. Is anyone aware of such a package?
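To make the desired output concrete, here is a minimal sketch of the logic, written in Python purely for illustration (what I am after is an R equivalent). The function name `md_mismatch_positions` is hypothetical. It assumes the CIGAR contains only M, N and D operations, so insertions and soft clips, which would shift read coordinates, are not handled.

```python
import re

def md_mismatch_positions(md):
    """Return 1-based positions of mismatched bases encoded in an MD string.

    Positions count aligned read bases only. Deleted reference bases (^ACGT)
    do not advance the counter. Insertions and soft clips in the CIGAR are
    NOT accounted for (assumption: the CIGAR has only M/N/D operations).
    """
    if md.startswith("MD:Z:"):
        md = md[5:]
    positions = []
    pos = 0  # number of aligned read bases consumed so far
    # Each token is a run of matches (digits), a deletion (^ followed by
    # bases), or a single mismatched reference base.
    for match_run, deletion, mismatch in re.findall(
        r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])", md
    ):
        if match_run:
            pos += int(match_run)
        elif mismatch:
            pos += 1
            positions.append(pos)
        # A deletion consumes reference bases only, so pos is unchanged.
    return positions

print(md_mismatch_positions("MD:Z:4A59C15"))  # [5, 65]
```

The same loop copes with consecutive mismatches (for example `10A0C5`) and with a deletion followed by a mismatch (for example `5^AC0T3`), because the MD string always puts a number, possibly 0, between those tokens.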
Thank you very much in advance. I wish the Biostars forum a Merry Christmas and a very happy New Year!