Question: Soft-clipping of reads in Amplicon-sequenced data
3.3 years ago by
jsneaththompson wrote:

I have a variant calling pipeline which I use to process amplicon-sequenced fastq files; it uses cutadapt to remove the adapter sequences on either the 5' or 3' end, then performs alignment with bwa mem.

The user guide for cutadapt states that:

"And if you use BWA-MEM, the trailing (3’) bases of a read that do not match the reference are soft-clipped, which covers those cases in which an adapter does occur."

And the bam files produced by bwa do show examples of soft-clipped trailing bases. I don't expect this to be an issue for the later stages of variant calling as the trailing bases are soft-clipped and should be disregarded by the variant calling software, but I'm a bit confused by the existence of the soft-clipped regions in the first place. Surely if the data is amplicon-sequenced, then all reads should have adapters, so I wouldn't expect any trailing bases that don't match the reference? Does this mean the adapter sequences I pass to cutadapt are incorrect? Or is this a non-issue?

Here's a link to an example BAM track; the top track shows the soft-clipped reads.
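To check how common the soft-clipped ends are, the CIGAR column of the alignments can be inspected directly. The following is a minimal sketch, assuming the CIGAR strings have already been extracted from the BAM file (for example as text); the function name `soft_clip_lengths` and the example CIGAR strings are hypothetical and not taken from the data in this thread.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths for one CIGAR string.

    "Leading" and "trailing" refer to the orientation in which the read is
    stored in the SAM/BAM record, not to the read's original 5'/3' ends.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only sit at the very ends, outside any soft clip,
    # so dropping them exposes the soft clips.
    core = [(n, op) for n, op in ops if op != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing


# Hypothetical CIGAR strings, for illustration only.
for cigar in ["100M", "5S95M", "80M20S", "3H5S70M2D10M12S"]:
    print(cigar, soft_clip_lengths(cigar))
```

A read stored on the reverse strand has its 3' end at the start of the record, so strand has to be taken into account before interpreting a clip as "trailing" in the sequencing sense.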

3.3 years ago by
Matt Shirley (Cambridge, MA) wrote:

It depends on how you construct your sequencing library. If there are different amplicons of varying size, and your sequencing reads are longer than the amplicon, then you'll sequence through the amplicon and into the adapter at the 3' end of the read. It sounds like you might be confusing Illumina adapter sequences with the primers you used to amplify your amplicon.
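The read-through arithmetic can be sketched in a few lines. This is only an illustration, assuming the adapter starts immediately after the last base of the amplicon (primers included) and ignoring quality trimming; the function name and the amplicon sizes are hypothetical.

```python
def adapter_bases_in_read(read_length, amplicon_length):
    """Number of bases at the end of a read that come from the adapter.

    Assumes the read starts at the first base of the amplicon and that the
    adapter follows the amplicon directly.
    """
    return max(0, read_length - amplicon_length)


# Hypothetical amplicon sizes sequenced with 150 bp reads.
read_length = 150
for amplicon_length in (120, 150, 200):
    n = adapter_bases_in_read(read_length, amplicon_length)
    print(f"amplicon {amplicon_length} bp: {n} adapter bases in the read")
```

Only amplicons shorter than the read length contribute adapter bases, which would be consistent with seeing soft-clipped ends on some reads and not others.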

You're right, I was getting confused. Cutadapt is removing primer sequences, and according to others in the lab we have amplicons of varying size. Thanks for the help.

Reply written 3.3 years ago by jsneaththompson
3.3 years ago by
WouterDeCoster wrote:

Your primers match the reference exactly, so they won't give you any soft-clipped bases. But I do think you should separately mask your primer sequences, because those bases come from the synthetic primer rather than from the sample and shouldn't be used for variant calling.
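One way to picture primer masking is as an interval overlap between each aligned read and the known primer positions. The sketch below is a simplified illustration, not a replacement for a dedicated primer-clipping tool: it assumes an ungapped alignment, half-open reference coordinates, and that "masking" means setting base qualities to 0 so that a quality-aware caller can ignore those bases. The function names and coordinates are hypothetical.

```python
def primer_overlap(read_start, read_end, primers):
    """Return (left, right): bases at each end of the alignment inside a primer.

    read_start/read_end and every (start, end) in primers are half-open
    reference coordinates. Assumes the alignment has no indels.
    """
    left = right = 0
    for p_start, p_end in primers:
        # A primer that covers the leftmost aligned base.
        if p_start <= read_start < p_end:
            left = max(left, min(p_end, read_end) - read_start)
        # A primer that covers the rightmost aligned base.
        if p_start < read_end <= p_end:
            right = max(right, read_end - max(p_start, read_start))
    return left, right


def mask_qualities(quals, left, right):
    """Copy of quals with the first `left` and last `right` values set to 0."""
    masked = list(quals)
    n = len(masked)
    for i in range(min(left, n)):
        masked[i] = 0
    for i in range(max(n - right, 0), n):
        masked[i] = 0
    return masked


# Hypothetical amplicon with a forward primer at 100-120 and a reverse
# primer at 230-250 on the reference.
primers = [(100, 120), (230, 250)]
print(primer_overlap(100, 250, primers))   # read spanning the whole amplicon
print(primer_overlap(110, 200, primers))   # read starting inside the forward primer
print(mask_qualities([30] * 6, 2, 1))
```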

3.3 years ago by
christacaggiano wrote:

Also keep in mind that soft clipping can happen when a read contains an insertion or deletion and the aligner is unable to place it on the genome properly, so it clips the end of the read instead. If you're planning on calling indels later, especially with software that isn't clipping-aware, this could affect how well you call them.

See Scalpel (2015) and VarDict (2016).
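Why an indel near the end of a read tends to be soft-clipped can be illustrated with a rough scoring comparison. The sketch below is a deliberately simplified model, assuming bwa mem's documented default scores (match 1, gap open 6, gap extension 1, clipping penalty 5) and ignoring mismatches, seeding, and everything else the aligner does; the function name is hypothetical.

```python
# bwa mem documented defaults: -A 1, -O 6, -E 1, -L 5.
MATCH, GAP_OPEN, GAP_EXTEND, CLIP_PENALTY = 1, 6, 1, 5


def prefers_soft_clip(indel_length, flank_after_indel):
    """True if clipping the read end scores at least as well as gapping.

    indel_length: size of the insertion or deletion in bases.
    flank_after_indel: perfectly matching read bases between the indel and
    the end of the read.
    """
    # Score gained by opening the gap and aligning through to the read end.
    through = flank_after_indel * MATCH - (GAP_OPEN + indel_length * GAP_EXTEND)
    # Stopping before the indel gains nothing more and pays the clip penalty.
    clipped = -CLIP_PENALTY
    return through <= clipped


for indel_length, flank in [(3, 4), (3, 5), (3, 20), (10, 10)]:
    verdict = "soft-clip" if prefers_soft_clip(indel_length, flank) else "align through"
    print(f"{indel_length} bp indel, {flank} bp flank: {verdict}")
```

Under this model, an indel is only aligned through when enough matching bases follow it to pay for the gap, which is one reason indels close to read ends are easy to miss with callers that ignore soft-clipped sequence.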
