Question: Why Is Bwa Soft-Clipping Exactly Matching Reads?
3
gravatar for ktpetrie
5.4 years ago by
ktpetrie30
ktpetrie30 wrote:

I am using bwasw to align reads from a MiSeq run to a very short reference sequence. For some reason, bwa is soft-clipping (assigning S in the cigar string) the leading and trailing bases of many or most of my reads. (All of my reads seem to get the first base soft-clipped, after that there is a distribution of soft clipping that drops off toward the center of the read).

This is occurring despite the fact that the bases which are getting soft clipped (at least the ones I checked by scanning the sam) are EXACT MATCHES to the reference. They are not consistently surrounded by mismatches (they occur even in reads with an edit distance of 0), nor are they low-quality (prior to assembly, I ran a very stringent quality filter, so every base in every one of my reads is phred 36 or higher).

Possibly pertinent: all of my reads are the same length as my reference +/- 2 nts. (This is a ribozyme population deep sequencing project).

Does anyone know why bwa might be acting this way or how to get around it? Here's my bwa command:

bwa bwasw -M -NM myreference.fasta myfiltered.fastq > myoutput.sam

Alternatively, is there a way to tell samtools to ignore soft-clipping? Or a tool to convert the cigar strings to M/I/D only? The soft-clipped bases are artificially reducing the coverage in my mpileup, and I am worried how they will affect other downstream modules I want to use.

Thanks for any insight!

cross posted at SeqAnswers

bwa • 3.9k views
ADD COMMENTlink modified 6 months ago by RamRS20k • written 5.4 years ago by ktpetrie30
1

It is an interesting problem, perhaps has to do with the unusual circumstance of reads being as long as the reference. Obviously If the read is longer than the reference then soft clipping is ok.

Can you edit your post and add your reference and a few SAM alignments that are soft clipped?

ADD REPLYlink written 5.4 years ago by Istvan Albert ♦♦ 79k

I wonder what the -M -NM options are for? I couldn't find them in the bwasw manual segment for my (possibly outdated) BWA 0.7.5... Are they a recent addition, or something undocumented?

ADD REPLYlink written 5.3 years ago by Chronos550

Might not be your case, but at least novoalign has an upper limit to the aligned read length. Last time I checked it was ~300 bp, and I think beyond that length it was soft-clipping the reads. I'm not entirely sure, though.

ADD REPLYlink written 5.3 years ago by Chronos550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2386 users visited in the last hour