Hi,
I'm working with some SAM/BAM files generated by minimap2. The reads have been aligned to a spike-in reference, so I'd like to use the alignment scores in the BAM file to estimate the raw error rate introduced during sequencing. My understanding was that the alignment score (AS tag) could by definition be no greater than the length of the query sequence, but I can see several cases in my BAM file where it is higher (and hence I'm getting nonsense like a 135% match).
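For the error-rate goal, one option that avoids AS altogether is the NM tag. minimap2 writes NM for base-level alignments (for example with -a), and its manual describes it as the total number of mismatches and gaps in the alignment. The sketch below divides NM by the number of alignment columns (M/=/X plus I plus D bases). That denominator, and leaving soft/hard clips out of it, are assumptions of this sketch rather than a standard definition, and `error_rate_from_nm` is a hypothetical helper.

```python
# CIGAR operation codes in the order pysam's cigartuples uses (SAM spec).
CMATCH, CINS, CDEL, CEQUAL, CDIFF = 0, 1, 2, 7, 8


def error_rate_from_nm(cigartuples, nm):
    """Estimate a per-read error rate as NM / alignment columns.

    Assumes `nm` is the NM tag (mismatches + inserted + deleted bases).
    Columns are M/=/X plus I plus D bases; soft/hard clips and N (skips)
    are ignored, so clipped sequence is not counted as error.
    """
    columns = sum(
        length
        for op, length in cigartuples
        if op in (CMATCH, CEQUAL, CDIFF, CINS, CDEL)
    )
    if columns == 0:
        raise ValueError("alignment has no aligned columns")
    return nm / columns
```

With pysam this could be called as `error_rate_from_nm(read.cigartuples, read.get_tag("NM"))` for each mapped read (for unmapped reads `cigartuples` is `None`). One minus the result gives a per-read identity.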
Is there a simple formula for the alignment score (i.e., one that could be recomputed from the CIGAR string), or does it come from the internal machinations of the minimap2 alignment algorithm? For now I'm going to fall back to summing the lengths of the M operations in the CIGAR, i.e., in pysam:
my_alignment_score = sum(length for op, length in read.cigartuples if op == 0)  # op 0 is 'M' (BAM_CMATCH)
But I'm still wondering: what is the AS tag trying to tell me?
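According to the minimap2 manual, AS is the DP alignment score computed from its scoring parameters, not a count of matching bases. By default each match adds 2 (-A), each mismatch subtracts 4 (-B), and a gap of length k costs min{O1 + k*E1, O2 + k*E2} with -O 4,24 and -E 2,1; some presets override these, so check the preset used. Because a match scores 2, a perfect alignment scores twice the query length. The sketch below is a hypothetical helper, `approx_minimap2_as`, that tries to recompute AS from the CIGAR plus NM under those default parameters. It assumes mismatches = NM minus gap bases and ignores ambiguous bases and spliced (N) alignments, so it may not reproduce AS exactly.

```python
# CIGAR operation codes in the order pysam's cigartuples uses (SAM spec).
CMATCH, CINS, CDEL, CEQUAL, CDIFF = 0, 1, 2, 7, 8


def approx_minimap2_as(cigartuples, nm, a=2, b=4, o=(4, 24), e=(2, 1)):
    """Approximate minimap2's AS from a CIGAR and the NM tag.

    Assumes minimap2's default scoring (-A2 -B4 -O4,24 -E2,1): +a per match,
    -b per mismatch, and a gap of length k costing min(o1 + k*e1, o2 + k*e2).
    Ambiguous bases and spliced (N) alignments are not handled.
    """
    aligned = 0    # M/=/X bases (matches + mismatches)
    gap_bases = 0  # I + D bases, which NM also counts
    gap_cost = 0
    for op, length in cigartuples:
        if op in (CMATCH, CEQUAL, CDIFF):
            aligned += length
        elif op in (CINS, CDEL):
            gap_bases += length
            # Dual affine gap cost: take the cheaper of the two gap models.
            gap_cost += min(o[0] + length * e[0], o[1] + length * e[1])
    mismatches = nm - gap_bases
    matches = aligned - mismatches
    return a * matches - b * mismatches - gap_cost
```

For a perfect 100 bp alignment, `approx_minimap2_as([(0, 100)], 0)` returns 200, which would explain AS/query-length ratios above 100%.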
Cheers - TIM