Question

Disable Soft-Clipping in bwa aln



12.4 years ago

Irsan ★ 7.8k

Is there a way to tell bwa aln not to soft-clip reads and instead try to align them fully?

We see that many soft-clipped reads result in false-positive variant calls. In the attached IGV snapshot (top track), you can see an example of a (PCR-duplicated) read that is mapped with 4 substitutions, 1 deletion and 2 insertions. We think the resulting variants are not real but arise because the read should not have been mapped there (it happens in all 20 samples!).

When we look into the BAM file, we see that the read was heavily soft-clipped (CIGAR 1S18M1D3M1I1M2I11M114S, meaning 115 of its 151 bases were soft-clipped). When I remove all reads whose CIGAR string contains S (soft clipping) from the BAM file, all the wrongly mapped reads disappear (see the IGV snapshot, bottom track).
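The 115-of-151 figure can be checked by parsing the CIGAR string. Per the SAM specification, the operations M, I, S, = and X consume read bases while D, N, H and P do not, so the read length is the sum of the query-consuming operations. This is a minimal sketch; the function names are hypothetical and not part of any library.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read (query), per the SAM spec.
QUERY_CONSUMING = set("MIS=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if cigar == "*":  # "*" means no CIGAR is available (e.g. unmapped read)
        return []
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clip_summary(cigar: str) -> tuple[int, int]:
    """Return (soft-clipped bases, total read length) for one alignment."""
    ops = parse_cigar(cigar)
    read_len = sum(n for n, op in ops if op in QUERY_CONSUMING)
    clipped = sum(n for n, op in ops if op == "S")
    return clipped, read_len


clipped, length = soft_clip_summary("1S18M1D3M1I1M2I11M114S")
print(clipped, length)  # 115 151
```

The deletion (1D) is excluded from the read length because it consumes reference bases only.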

We believe that, at least for our specific study design and questions, we do not want bwa to soft-clip reads; we would rather it try to align them fully. Is there a way to do that? Alternatively, is there a way to remove soft-clipped reads from a BAM file (which includes updating related information, such as the SAM flags of the paired mate, to the correct values)?
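The second option, removing soft-clipped reads without leaving inconsistent mate information, can be sketched in plain Python over SAM text. This minimal sketch swaps in a simpler strategy than rewriting the mate's flags: if any alignment of a template (same QNAME) contains a soft clip, every record of that template is dropped, so no surviving record refers to a removed mate. It assumes uncompressed SAM input that includes the header (for example from `samtools view -h`), it holds the whole input in memory for two passes, and the function names are hypothetical.

```python
import re
from collections.abc import Iterator

# Matches any soft-clip operation (e.g. "114S") in a CIGAR string.
SOFT_CLIP = re.compile(r"\d+S")


def drop_soft_clipped_pairs(sam_lines: list[str]) -> Iterator[str]:
    """Yield SAM lines, omitting every record whose template (QNAME)
    has at least one soft-clipped alignment. Header lines are kept."""
    # First pass: collect names of templates with any soft-clipped record.
    clipped_names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, cigar = fields[0], fields[5]  # columns 1 and 6 of SAM
        if SOFT_CLIP.search(cigar):
            clipped_names.add(qname)
    # Second pass: keep headers and records of untouched templates.
    for line in sam_lines:
        if line.startswith("@") or line.split("\t", 1)[0] not in clipped_names:
            yield line


def filter_stream(inp, out) -> None:
    """Read SAM text from a file-like object and write the filtered SAM."""
    out.writelines(drop_soft_clipped_pairs(inp.readlines()))
```

For example, `samtools view -h in.bam` could be piped into a script that calls `filter_stream(sys.stdin, sys.stdout)`. Because whole pairs are removed together, the remaining records keep flags that still agree with each other, at the cost of also discarding the unclipped mate.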

bwa • 12k views


score 3 · Answer 1 · 2013-02-01



12.4 years ago

toni ★ 2.2k

Yes, there is a -s option that tells BWA not to try to align the unmapped mate with Smith-Waterman.

This option must be given to bwa sampe, not to bwa aln.
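A hedged sketch of where the flag goes: the Python below only assembles and runs a `bwa sampe` command line with `-s` placed before the positional arguments (index prefix, the two `.sai` files from `bwa aln`, the two FASTQ files). The helper names and file names are hypothetical, it assumes `bwa` is on the PATH, and it is not a wrapper shipped with BWA.

```python
import subprocess


def sampe_command(prefix, sai1, sai2, fq1, fq2, disable_mate_sw=True):
    """Build the argument list for `bwa sampe`, optionally with -s."""
    cmd = ["bwa", "sampe"]
    if disable_mate_sw:
        cmd.append("-s")  # skip Smith-Waterman alignment of the unmapped mate
    # Positional arguments: index prefix, both .sai files, both FASTQ files.
    cmd += [prefix, sai1, sai2, fq1, fq2]
    return cmd


def run_sampe(out_sam, *args, **kwargs):
    """Run bwa sampe and write its SAM output (stdout) to out_sam."""
    with open(out_sam, "w") as fh:
        subprocess.run(sampe_command(*args, **kwargs), stdout=fh, check=True)
```

For instance, `run_sampe("pairs.sam", "ref.fa", "r1.sai", "r2.sai", "r1.fq", "r2.fq")` would write the paired alignments without the Smith-Waterman mate rescue.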




Exactly what I was looking for, thanks.

Reply by Irsan ★ 7.8k, 12.4 years ago



Disabling soft clipping is fine for genomic alignment when you have more coverage than you need, but normally you should not disable it. One thing you can do is write a script that filters out reads with a low ratio of aligned bases to total read length. You can set the threshold to 0.75 and see whether it works for you. Also, I don't think the soft-clipped region is used for variant calling at all; I assume you are doing this for visualization purposes.
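The ratio filter suggested above can be sketched as follows. This is a minimal sketch under one assumption the comment does not spell out: "aligned bases" is taken to mean every read base that is not soft-clipped (so inserted bases count as aligned). The 0.75 default comes from the comment; the function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_fraction(cigar: str) -> float:
    """Fraction of read bases that are not soft-clipped.
    Assumes 'aligned' means M, I, = or X (all query bases except S)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    read_len = sum(n for n, op in ops if op in "MIS=X")  # query-consuming ops
    clipped = sum(n for n, op in ops if op == "S")
    return (read_len - clipped) / read_len if read_len else 0.0


def keep_read(cigar: str, min_fraction: float = 0.75) -> bool:
    """True if enough of the read is aligned to pass the filter."""
    return aligned_fraction(cigar) >= min_fraction


print(round(aligned_fraction("1S18M1D3M1I1M2I11M114S"), 3))  # 0.238
print(keep_read("10S141M"))  # True
```

With this definition, the read from the question (36 of 151 bases outside soft clips) is filtered out, while a read with a short clip passes. This filters individual records, so the mates of removed reads keep their original flags.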

Reply by Ashutosh Pandey 12k, 12.4 years ago