I am using bwasw to align reads from a MiSeq run to a very short reference sequence. For some reason, bwa is soft-clipping (assigning S in the cigar string) the leading and trailing bases of many or most of my reads. (All of my reads seem to get the first base soft-clipped, after that there is a distribution of soft clipping that drops off toward the center of the read).
This is occurring despite the fact that the bases which are getting soft clipped (at least the ones I checked by scanning the sam) are EXACT MATCHES to the reference. They are not consistently surrounded by mismatches (they occur even in reads with an edit distance of 0), nor are they low-quality (prior to assembly, I ran a very stringent quality filter, so every base in every one of my reads is phred 36 or higher).
Possibly pertinent: all of my reads are the same length as my reference +/- 2 nts. (This is a ribozyme population deep sequencing project).
bwa bwasw -M -NM myreference.fasta myfiltered.fastq > myoutput.sam
Alternatively, is there a way to tell samtools to ignore soft-clipping? Or a tool to convert the cigar strings to M/I/D only? The soft-clipped bases are artificially reducing the coverage in my mpileup, and I am worried how they will affect other downstream modules I want to use.
Thanks for any insight!
cross posted at SeqAnswers