Entering edit mode
4.1 years ago
Mick
▴
30
Hi guys,
I have the following two reads:
@read1
TTTTCCTTCTCTTTCTCCTTCTCTTTTTCTGGCAGAAGTTCTAACTCTGGTATTAGCTGACAGATATTTGGAGGTTCTTCTGG
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@read1-mut
TTTTTCCTTCTCTTTCTCCTTCTCTTTTTCTGGCAGAAGTTCCTTCTCTTTTTCTGGCAGAAGTTCTAACTCTGGTATTAGCTGACAGATATTTGGAGGTTCTTCTGG
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
the second one has a 24 base pair long duplication at position 17: TCCTTCTCTTTTTCTGGCAGAAGT
How can I get the BWA aligner (option bwa+samse) to align the read with the duplication properly to the human reference genome GRCh37/hg19? I fiddled around with the gap open -O and extend -E options but I couldn't seem to make it work. When I set -n to a high number like 100 I get the desired CIGAR-String: 16M24I68M but it seems error prone to me to just allow such a high number of mismatches.
What is the best way to deal with this? thanks
Try
bwa mem
which is recommended for that read length. It should take care of that. What kind of CIGAR are you expecting?Hi ATpoint, thanks for the answer. I tried bwa mem, initially it didn't perform the alignment either, giving me soft and hard clipped alignments. 40S68M as primary alignment and 42M66H as supplementary alignment.
I tried some more options and eventually I found that bwa mem -A 10 -E 1 does the trick and give me 16M24I68M. Not sure if this is the best option but I guess I'll go with it for now.
Yes of course it gives clipped alignments, there is something in the read that is not in the reference. Therefore the question => what would be your desired output? These clipped and supplementary alignments are the starting point that structural variant callers use in order to identify variants. You cannot force a structural variant to be perfectly matched to the reference.