7.8 years ago
Paul
Dear all,
I am referring to a nice article from Multiplicom (HERE, page 27 down). What is the procedure for masking primer sequences in my amplicon sequencing data? Should I mask them? Should I trim them? Can anyone share their experience and some workflows (tools)?
Thanks to anyone willing to share their experience!
In the example of the picture you gave, trimming seems to be particularly important, because primer sequence overlaps genomic sequence due to multiple partially-overlapping primer sets being multiplexed together. Is that common, and is it the case in your experiment? Still, while trimming seems to be useful in the diagram, it's not exactly a panacea, when you consider the case of linked SNPs, one in a primer and one not. The point of trimming the primer is to reduce bias and non-genomic sequence, but bear in mind that it won't eliminate bias, even though the picture makes it seem like it would.
As for masking vs trimming, that depends on how your downstream analysis handles masked bases. Trimming is usually better.
Thank you for the comment. Would you recommend any workflow and tools for correctly trimming / masking those sequences? We have 80 views; does nobody have any other experience or additional information?
The BBMap suite written by Brian Bushnell has the tools you need. Look at bbduk.sh for trimming and bbmask.sh for masking. There are other tools that can align/pileup etc.
Does it make sense to trim 30 bp from each amplicon separately for the forward and reverse orientations?
If the primers are 30bp long, it makes sense to trim 30bp from the start of all reads. With BBDuk the command would be (assuming you have paired-end reads like in the diagram):
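As a rough illustration of what fixed-length trimming does to each read, here is a minimal plain-Python sketch. It is not the BBDuk command itself; the function name `trim_fastq_prefix` is hypothetical, and it assumes simple 4-line FASTQ records and primers of exactly the same length on every read.

```python
def trim_fastq_prefix(lines, n=30):
    """Yield FASTQ lines with the first n bases (and their qualities) removed.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Sequence and quality strings must be cut at the same position.
        yield from (header, seq[n:], plus, qual[n:])


# Toy record with a 4-base "primer" so the effect is easy to see.
record = ["@r1", "ACGTACGTAC", "+", "IIIIIHHHHH"]
trimmed = list(trim_fastq_prefix(record, n=4))
print(trimmed)
```

For paired-end data the same function would be applied to the R1 and R2 files separately, since each mate starts with its own primer.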
The directions in the document you attached seem to be for trimming. Since you are following an established protocol why not stick with the recommendations?
Because I do not use any of the commercial software packages like JSI or SeqNext. I would also like to avoid MSR. I think a lot of people are doing amplicon sequencing, and I would like to know how they approach this problem. Maybe someone can share their workflow.
For nested amplicon sequencing, we first align the original reads (still containing primer sequences) to the reference and then mask the primers by soft-clipping the alignments using BAMClipper (Scientific Reports 7:1567). As mentioned in another thread, primer trimming at the FASTQ level (1) is computationally expensive, (2) incorrectly handles nested PCR amplicons, and (3) makes indels harder to detect by conventional variant calling.
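To make the soft-clipping idea concrete, here is a minimal Python sketch of clipping the left end of one alignment up to a primer boundary. This is not BAMClipper's implementation; the function names `parse_cigar` and `soft_clip_left` are hypothetical, coordinates are assumed to be 0-based, and CIGAR strings with hard clips (H) or padding (P) are rejected for simplicity.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs; H and P are not supported here."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNS=X])", cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR: {cigar}")
    return ops


def soft_clip_left(pos, cigar, primer_end):
    """Soft-clip aligned bases at reference positions < primer_end.

    pos is the 0-based alignment start; primer_end is 0-based, exclusive.
    Returns the new start position and the new CIGAR string.
    """
    ops = parse_cigar(cigar)
    clipped = 0  # query bases that end up in the leading soft clip
    ref = pos    # reference coordinate of the next aligned base
    i = 0
    while i < len(ops):
        n, op = ops[i]
        if op in "SI":
            clipped += n              # consumes query only
        elif op in "DN":
            ref += n                  # consumes reference only
        else:                         # M, = or X: consumes both
            if ref >= primer_end:
                break
            take = min(n, primer_end - ref)
            clipped += take
            ref += take
            if take < n:
                ops[i] = (n - take, op)
                break
        i += 1
    kept = "".join(f"{n}{op}" for n, op in ops[i:])
    return ref, (f"{clipped}S" if clipped else "") + kept


print(soft_clip_left(100, "50M", 120))
print(soft_clip_left(100, "5M2D20M", 110))
```

The read sequence itself is left untouched; only the start position and CIGAR change, so the primer bases remain in the BAM record but are ignored by variant callers that skip soft-clipped bases. The right end of a read would be handled symmetrically.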
No, it's not. It's one of the computationally cheapest operations you can do for read preprocessing. I would further review your other claims, but for example, "(3) makes indels harder to detect by conventional variant calling." (referring to adapter-trimming) is ridiculous. Trimming adapters properly makes indels easier to detect using conventional variant-callers.
So you prefer to hard-trim primer sequences from the FASTQ files instead of soft-clipping them in the BAM?
Having data free of any extraneous sequence makes it simple to run different types of analyses (e.g. a de novo assembly).
We did exactly this kind of FASTQ hard trimming for variant calling, but almost missed a germline BRCA1 17-nt deletion in a hereditary breast cancer patient.
It's possible that your methodology was flawed. Can you describe what kind of library you were using, the preprocessing, mapping, and variant-calling steps?
Thank you for your comments. We described the case in detail in Scientific Reports 7:1567.
Thank you for the good article.
Having read this thread, I still can't choose between using cutadapt and BAMclipper for amplicon sequencing...