I have a variant calling pipeline which I use to process amplicon-sequenced fastq files; it uses cutadapt to remove the adapter sequences on either the 5' or 3' end, then performs alignment with bwa mem.
The user guide for cutadapt states that:
"And if you use BWA-MEM, the trailing (3') bases of a read that do not match the reference are soft-clipped, which covers those cases in which an adapter does occur."
The BAM files produced by bwa do show examples of soft-clipped trailing bases. I don't expect this to be an issue for the later stages of variant calling, as soft-clipped bases should be disregarded by the variant calling software, but I'm a bit confused by the existence of the soft-clipped regions in the first place. Surely if the data is amplicon-sequenced, then all reads should have adapters, and cutadapt should already have removed them, so I wouldn't expect any trailing bases left over that don't match the reference? Does this mean the adapter sequences I pass to cutadapt are incorrect? Or is this a non-issue?
Here's a link to an example BAM track; the top track shows the soft-clipped reads.
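
In case it helps with diagnosing this, below is a minimal sketch of how the soft-clipped lengths could be tallied from the CIGAR strings in the alignments. It is not part of my pipeline; the function name `soft_clips` and its `is_reverse` parameter are made up for illustration. It assumes the CIGAR string is taken from column 6 of `samtools view` output and that strand comes from the 0x10 bit of the FLAG field. Because SAM stores reverse-strand reads reverse-complemented, the left end of the CIGAR corresponds to the 3' end of the original read for those alignments, so the two values are swapped in that case.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar, is_reverse=False):
    """Return (five_prime, three_prime) soft-clipped lengths in read orientation.

    `cigar` is a SAM CIGAR string; `is_reverse` should be True when the
    0x10 FLAG bit is set. Both names are hypothetical, for illustration only.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips; ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    # CIGAR is written in reference orientation, so swap for reverse-strand reads.
    return (right, left) if is_reverse else (left, right)

if __name__ == "__main__":
    print(soft_clips("5S90M12S"))
    print(soft_clips("5S90M12S", is_reverse=True))
```

If most reads show a 3' clip of roughly the same length, that would suggest a leftover adapter or primer sequence rather than occasional low-quality read ends.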