Old Version of BWA and Soft Clipping
10.1 years ago

Can you confirm the following?

One of my BAM files was aligned with an 'old' version of BWA:

@PG    ID:bwa    PN:bwa    VN:0.6.2-r126

Some reads of length 100 are reported with the CIGAR string 100M and an edit distance of 2 (NM:i:2):

HWI-1KL149:59:C2AVTACXX:5:2209:15195:25860    147    1    12106    0    100M    =    11987    -219    TGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGAGAGCATAT    JHGEIK7KLKKJL7JFIHILGE@@?CCBBDBABC?BBCABBABAAB@AF?BB@BB?BB??BABB???BCCCBBAC?AA>BBBBBBGECADDG==6)'#%#    X0:i:6    X1:i:1    BD:Z:NINNPONMMMNONNNMNNPOOOKKOPMMMMONPOONOMNLMNMLLKNNMONMNMMOONNONNLMIIMLDLLMLLNMMMMLDLLLNNMLKNMLMMLHMJKK    MD:Z:98C0A0    PG:Z:MarkDuplicates.13    RG:Z:p55    XG:i:0    BI:Z:RNQRRRPPNQQRSQQRRQSSRSNNRSPRRPRQSRQSSPSPOSRPPPRRRRQQRNRRQSQQSQQRNOQPKPQQRRRRQRQPKPQQRRQRPRQQQRPQQPPP    AM:i:0    NM:i:2    SM:i:0    XM:i:2    XO:i:0    MQ:i:0    XT:A:R

Furthermore, BLAT suggests that the two bases at the 3' end should be soft-clipped (its alignment covers read positions 1 to 98 instead of 1 to 100):

00001 tgggccattgttcatcttctggcccctgttgtctgcatgtaacttaatac 00050
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
12106 tgggccattgttcatcttctggcccctgttgtctgcatgtaacttaatac 12155

00051 cacaaccaggcataggggaaagattggaggaaagatgagtgagagcat 00098
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
12156 cacaaccaggcataggggaaagattggaggaaagatgagtgagagcat 12203

So I would have expected the CIGAR string to be 98M2S instead of 100M.

Can you confirm this? Is it a 'feature' of BWA, or has it been fixed in the latest version?
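For reference, the MD tag of the read above (MD:Z:98C0A0) already says where the two mismatches are. Below is a minimal sketch that decodes an MD string into 1-based positions along the aligned part of the read. The function name `md_mismatch_positions` is made up for illustration, and the code assumes the CIGAR has no insertions or clipping (true for 100M).

```python
import re

def md_mismatch_positions(md):
    """Return 1-based positions (along the aligned read bases) of the
    mismatches encoded in a SAM MD string.

    Hypothetical helper for illustration; assumes the CIGAR contains no
    insertions or clipping, so MD offsets map directly to read offsets.
    """
    positions = []
    pos = 0  # number of aligned read bases consumed so far
    # Tokens: a run of matches, a deletion (^ACGT...), or a mismatched ref base
    for run, deletion, base in re.findall(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])", md):
        if run:
            pos += int(run)
        elif deletion:
            continue  # deleted reference bases consume no read bases
        else:
            pos += 1
            positions.append(pos)
    return positions

print(md_mismatch_positions("98C0A0"))  # [99, 100]
```

So both mismatches counted in NM:i:2 are the last two aligned bases, which is consistent with BLAT stopping at position 98.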

10.1 years ago

I would say both answers are correct. It all depends on the scoring scheme.

As long as the total penalty for the mismatches is smaller than the penalty for clipping, you'll get the first answer (100M). That said, here the mismatches fall at the very end of the read, which is exactly where allowing clipping makes more sense.
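To make the trade-off concrete, here is a small sketch that compares the score of keeping the read end-to-end against clipping the mismatching tail. The scoring values (match +1, mismatch -4, clipping penalty 5) and the function name `preferred_cigar` are assumptions chosen for illustration; they are not taken from the BAM header above, and this is not how bwa 0.6.2 decides internally.

```python
def preferred_cigar(read_len, tail_mismatches, match=1, mismatch=4, clip_penalty=5):
    """Pick between an end-to-end CIGAR and a soft-clipped one for a read
    whose only mismatches are its last `tail_mismatches` bases.

    All scoring parameters are illustrative assumptions.
    """
    matched = read_len - tail_mismatches
    # End-to-end: every base is aligned, mismatches are paid for individually
    full_score = matched * match - tail_mismatches * mismatch
    # Clipped: the tail is dropped and a single clipping penalty is paid
    clipped_score = matched * match - clip_penalty
    if tail_mismatches == 0 or full_score >= clipped_score:
        return f"{read_len}M"
    return f"{matched}M{tail_mismatches}S"

print(preferred_cigar(100, 2))                   # 98M2S  (90 vs 93)
print(preferred_cigar(100, 2, clip_penalty=10))  # 100M   (90 vs 88)
```

With a cheap clipping penalty the tail is clipped; with an expensive one (or no clipping at all) the aligner reports 100M with NM:i:2, which is what the read in the question shows.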


But BWA is a local aligner, isn't it? So scoring mismatches at the 5' and/or 3' end makes no sense to me.


No, it is a semi-global aligner: it tries as hard as it can to align the entire read.
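The difference can be seen with a toy dynamic-programming sketch. This is a textbook-style illustration, not BWA's actual algorithm: in "semiglobal" mode the whole read must be aligned while the start and end in the reference are free, whereas in "local" mode the score may restart at zero and stop anywhere. The function name, sequences and scoring values are all made up for the example.

```python
def align_score(read, ref, mode, match=1, mismatch=-4, gap=-6):
    """Best alignment score of `read` against `ref`.

    mode="semiglobal": the whole read is aligned, reference ends are free.
    mode="local":      Smith-Waterman style, any sub-part of the read.
    Illustrative only; scoring values are assumptions.
    """
    n, m = len(read), len(ref)
    prev = [0] * (m + 1)  # row 0: the alignment may start anywhere in ref
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        # In semi-global mode, read bases before the reference start cost gaps
        cur[0] = 0 if mode == "local" else i * gap
        for j in range(1, m + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            v = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if mode == "local":
                v = max(v, 0)  # a local alignment may restart at any cell
                best = max(best, v)
            cur[j] = v
        prev = cur
    # Semi-global: the read is fully consumed, the end in ref is free
    return best if mode == "local" else max(prev)

ref = "TTTTACGTACGTACTTCCCC"
read = "ACGTACGTACGG"  # 10 matching bases followed by 2 mismatching ones

print(align_score(read, ref, "local"))       # 10: the mismatching tail is left out
print(align_score(read, ref, "semiglobal"))  # 2:  10 matches minus 2 mismatches
```

The semi-global score still pays for the two trailing mismatches, which corresponds to reporting them in NM rather than soft-clipping them.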
