Can you confirm this?
One of my BAM files was aligned with an 'old' version of BWA:
@PG ID:bwa PN:bwa VN:0.6.2-r126
Some 100 bp reads are reported with a CIGAR string of 100M and an edit distance of NM:i:2, for example:
HWI-1KL149:59:C2AVTACXX:5:2209:15195:25860 147 1 12106 0 100M = 11987 -219 TGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGAGAGCATAT JHGEIK7KLKKJL7JFIHILGE@@?CCBBDBABC?BBCABBABAAB@AF?BB@BB?BB??BABB???BCCCBBAC?AA>BBBBBBGECADDG==6)'#%# X0:i:6 X1:i:1 BD:Z:NINNPONMMMNONNNMNNPOOOKKOPMMMMONPOONOMNLMNMLLKNNMONMNMMOONNONNLMIIMLDLLMLLNMMMMLDLLLNNMLKNMLMMLHMJKK MD:Z:98C0A0 PG:Z:MarkDuplicates.13 RG:Z:p55 XG:i:0 BI:Z:RNQRRRPPNQQRSQQRRQSSRSNNRSPRRPRQSRQSSPSPOSRPPPRRRRQQRNRRQSQQSQQRNOQPKPQQRRRRQRQPKPQQRRQRPRQQQRPQQPPP AM:i:0 NM:i:2 SM:i:0 XM:i:2 XO:i:0 MQ:i:0 XT:A:R
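The MD tag of the record above (MD:Z:98C0A0) already says where the two edits are. A minimal Python sketch for decoding it follows; the function name `md_mismatches` is made up for illustration, and it assumes a CIGAR without insertions or clipping (such as the plain 100M here), so MD offsets map directly onto read positions.

```python
import re

def md_mismatches(md):
    """Return (1-based read position, reference base) for each mismatch in an MD string.

    Assumption: the CIGAR contains no insertions or clipping, so the running
    offset in MD is also the offset in the read.
    """
    pos = 0
    mismatches = []
    # Tokens are: a run of matches (digits), a deletion (^BASES), or one mismatched ref base.
    for run, deletion, base in re.findall(r"(\d+)|(\^[A-Z]+)|([A-Z])", md):
        if run:
            pos += int(run)          # matching bases advance the read position
        elif deletion:
            continue                 # deleted reference bases consume no read base
        else:
            pos += 1                 # a mismatch consumes one read base
            mismatches.append((pos, base))
    return mismatches

print(md_mismatches("98C0A0"))
```

For this read the two mismatches fall on positions 99 and 100 (reference bases C and A), i.e. exactly the two bases that BLAT leaves out of its alignment.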
Furthermore, BLAT reports that the two bases at the 3' end should be soft-clipped (its alignment covers read positions 1 to 98 instead of 1 to 100):
00001 tgggccattgttcatcttctggcccctgttgtctgcatgtaacttaatac 00050
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
12106 tgggccattgttcatcttctggcccctgttgtctgcatgtaacttaatac 12155
00051 cacaaccaggcataggggaaagattggaggaaagatgagtgagagcat 00098
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
12156 cacaaccaggcataggggaaagattggaggaaagatgagtgagagcat 12203
So I would have expected the CIGAR string to be 98M2S instead of 100M.
Can you confirm this? Is it a 'feature' of bwa, or has it been fixed in the latest version?
But bwa is a local aligner, isn't it? So counting mismatches at the 5' and/or 3' end makes no sense to me.
No, it is a semi-global aligner: it tries as hard as it can to align the entire read.
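To make the difference concrete, here is a toy Python sketch that decides between aligning the whole read and soft-clipping its tail. The function `choose_cigar` and the match, mismatch and clipping scores are illustrative assumptions only, not the parameters or the algorithm of bwa 0.6.2; gaps are ignored entirely.

```python
def choose_cigar(read, ref, match=1, mismatch=4, clip_penalty=5):
    """Pick 'nM' (whole read aligned) or 'kM(n-k)S' (tail soft-clipped).

    Toy model: ungapped alignment, read and ref already placed base against base.
    All scores are hypothetical values chosen for illustration.
    """
    assert len(read) == len(ref)
    n = len(read)
    score = 0
    best_score, best_len = 0, 0
    for i, (r, g) in enumerate(zip(read, ref), start=1):
        score += match if r == g else -mismatch
        if score > best_score:
            best_score, best_len = score, i   # best-scoring prefix so far
    end_to_end = score
    # Keep the full-length alignment unless clipping wins even after its penalty.
    if best_len == n or end_to_end >= best_score - clip_penalty:
        return f"{n}M"
    return f"{best_len}M{n - best_len}S"

read = "A" * 98 + "AT"   # stand-in for a 100 bp read with two mismatching tail bases
ref = "A" * 98 + "CA"
print(choose_cigar(read, ref))                             # local-style scoring
print(choose_cigar(read, ref, clip_penalty=float("inf")))  # clipping never allowed
```

With a finite clipping penalty the toy model prefers 98M2S, as BLAT does. With clipping effectively forbidden, which is how a semi-global aligner behaves, it returns 100M and the two tail mismatches are simply counted in NM.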