Question

Duplication BWA aligner

0

Entering edit mode

5.4 years ago

Mick ▴ 30

Hi guys,

I have the following two reads:

@read1
TTTTCCTTCTCTTTCTCCTTCTCTTTTTCTGGCAGAAGTTCTAACTCTGGTATTAGCTGACAGATATTTGGAGGTTCTTCTGG
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@read1-mut
TTTTTCCTTCTCTTTCTCCTTCTCTTTTTCTGGCAGAAGTTCCTTCTCTTTTTCTGGCAGAAGTTCTAACTCTGGTATTAGCTGACAGATATTTGGAGGTTCTTCTGG
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

the second one has a 24 base pair long duplication at position 17: TCCTTCTCTTTTTCTGGCAGAAGT

How can I get the BWA aligner (option bwa+samse) to align the read with the duplication properly to the human reference genome GRCh37/hg19? I fiddled around with the gap open -O and extend -E options but I couldn't seem to make it work. When I set -n to a high number like 100 I get the desired CIGAR-String: 16M24I68M but it seems error prone to me to just allow such a high number of mismatches.

What is the best way to deal with this? thanks

alignment • 722 views

ADD COMMENT • link 5.4 years ago by Mick ▴ 30

0

Entering edit mode

Try bwa mem which is recommended for that read length. It should take care of that. What kind of CIGAR are you expecting?

ADD REPLY • link 5.4 years ago by ATpoint 88k

0

Entering edit mode

Hi ATpoint, thanks for the answer. I tried bwa mem, initially it didn't perform the alignment either, giving me soft and hard clipped alignments. 40S68M as primary alignment and 42M66H as supplementary alignment.

I tried some more options and eventually I found that bwa mem -A 10 -E 1 does the trick and give me 16M24I68M. Not sure if this is the best option but I guess I'll go with it for now.

ADD REPLY • link 5.4 years ago by Mick ▴ 30

0

Entering edit mode

Yes of course it gives clipped alignments, there is something in the read that is not in the reference. Therefore the question => what would be your desired output? These clipped and supplementary alignments are the starting point that structural variant callers use in order to identify variants. You cannot force a structural variant to be perfectly matched to the reference.

ADD REPLY • link 5.4 years ago by ATpoint 88k