I just had to analyze a really bad lane. It is an old Illumina GAIIx lane with reads of 152 cycles. The last 30-40 cycles are of very poor quality on average, so I decided to use the -q 20 switch in bwa aln to trim the 3' ends of the reads based on quality prior to mapping (something I usually would not do).
To see what this trimming parameter left in the BAM file, I plotted the distribution of the length of the soft-clipped part of the reads. To do so, I took the CIGAR strings of 20 million alignments that BWA declared unique (XT:A:U tag). Here is what I got:
We can see that there is a periodic pattern after the main peak (which represents reads with no soft-clipped bases). We can also notice the slightly higher bar at position 117. Since 152 - 117 = 35, this is consistent with the fact that by default BWA will not trim reads to fewer than 35 bp.
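For anyone who wants to reproduce this kind of histogram, here is a minimal sketch of one way to do it. It is not the exact script used for the plot. It assumes SAM text lines as input (for example from samtools view), counts all S operations in the CIGAR regardless of which end they are on, and keeps only records carrying the XT:A:U tag. The function names softclip_length and softclip_histogram are made up for illustration.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def softclip_length(cigar):
    """Return the total number of soft-clipped bases (S operations) in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def softclip_histogram(sam_lines):
    """Count soft-clip lengths over SAM records tagged XT:A:U (unique according to BWA)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        # A SAM record has 11 mandatory fields; optional tags start at index 11.
        if len(fields) < 11 or "XT:A:U" not in fields[11:]:
            continue
        counts[softclip_length(fields[5])] += 1  # field 6 is the CIGAR string
    return counts
```

The resulting Counter maps each soft-clip length to the number of alignments with that length, so it can be fed directly to any plotting tool.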
Have you already noticed such a pattern when using bwa aln -q, and what in the algorithm produces it?
Looking at the -q definition in the doc, I cannot see any reason why the trimming would produce such a periodic pattern.
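For reference, here is a minimal sketch of how the -q definition in the BWA manual reads: a read of length l is trimmed down to argmax_x of the sum over i = x+1..l of (INT - q_i), if q_l < INT. This is only an illustration of that formula, not BWA's actual code, which may differ in details such as tie handling or early termination. The function name trim_length is hypothetical, and the min_len floor of 35 is taken from the observation above rather than from the manual.

```python
def trim_length(quals, threshold, min_len=35):
    """Return the read length kept after 3' quality trimming.

    quals     -- list of Phred quality scores, 5' to 3'
    threshold -- the INT given to -q
    min_len   -- assumed lower bound on the trimmed length
    """
    n = len(quals)
    # Per the documented rule, trimming applies only if the last base is below the threshold.
    if threshold < 1 or n == 0 or quals[-1] >= threshold:
        return n
    best_sum, best_len = 0, n
    running = 0
    # x is the candidate kept length; bases x..n-1 (0-based) would be trimmed.
    for x in range(n - 1, min_len - 1, -1):
        running += threshold - quals[x]
        if running > best_sum:
            best_sum, best_len = running, x
    return best_len
```

With a read whose 3' end is uniformly bad, this returns the min_len floor, which matches the extra bar seen at 117 soft-clipped bases. Nothing in this sketch obviously explains a periodic pattern, which is the point of the question.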