I'm trying to call variants from amplicon-based sequencing reads. These reads have primer adapters and barcodes on both ends. I'm looking into the GATK and Samtools/VarScan pipelines.
I was able to remove the primer sequences on both sides using cutadapt.
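To show what the primer-removal step does, here is a minimal sketch that matches primers exactly. It is a simplification and not cutadapt's algorithm, because cutadapt also tolerates mismatches and indels. It assumes the forward primer sits at the 5' end of the read and the reverse complement of the reverse primer sits at the 3' end. The primer sequences and function names are hypothetical.

```python
# Simplified primer trimming with exact matches only. Real tools such as
# cutadapt also allow mismatches/indels; this sketch does not.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_primers(read: str, fwd_primer: str, rev_primer: str) -> str:
    """Strip the forward primer from the 5' end and the reverse complement
    of the reverse primer from the 3' end, if they match exactly."""
    if read.startswith(fwd_primer):
        read = read[len(fwd_primer):]
    rc_rev = revcomp(rev_primer)
    if read.endswith(rc_rev):
        read = read[: len(read) - len(rc_rev)]
    return read


if __name__ == "__main__":
    fwd = "ACGTAC"  # hypothetical forward primer
    rev = "GGTTAA"  # hypothetical reverse primer
    read = fwd + "TTTTGGGGCCCC" + revcomp(rev)
    print(trim_primers(read, fwd, rev))  # TTTTGGGGCCCC
```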
Next, I aligned my reads with BWA-MEM and then removed duplicate reads (to get rid of PCR duplicates) with samtools markdup. However, alignment removed the barcodes on both ends, and deduplication removed most of my reads. I also looked into Picard's MarkDuplicates, but it does not seem applicable to amplicon-based reads either: it identifies duplicates by the reads' start positions, and because reads from the same amplicon all start at the same position, it would delete the majority of my reads.
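This toy example illustrates why position-based duplicate marking collapses amplicon data. It simplifies the real behavior. Picard and samtools use the unclipped 5' position of each read, plus strand and mate information for pairs. Here the key is only (chromosome, start, strand). The reads and the C>T change are made up. Identical sequences in amplicon data can also come from distinct template molecules, so counting distinct sequences is only a rough comparison.

```python
from typing import List, NamedTuple


class AlignedRead(NamedTuple):
    name: str
    chrom: str
    start: int      # aligned start position (simplified key)
    reverse: bool   # True if aligned to the reverse strand
    seq: str


def position_groups(reads: List[AlignedRead]) -> int:
    """Distinct (chrom, start, strand) keys: roughly how many reads a
    start-position-based duplicate marker leaves unmarked (one per key)."""
    return len({(r.chrom, r.start, r.reverse) for r in reads})


def sequence_groups(reads: List[AlignedRead]) -> int:
    """Distinct read sequences: how many reads exact-sequence dedup keeps."""
    return len({r.seq for r in reads})


if __name__ == "__main__":
    # Four reads from one amplicon. All of them start at the primer-defined
    # position, but two carry a hypothetical C>T change.
    amplicon = [
        AlignedRead("r1", "chr1", 1000, False, "ACGTCAGT"),
        AlignedRead("r2", "chr1", 1000, False, "ACGTCAGT"),
        AlignedRead("r3", "chr1", 1000, False, "ACGTTAGT"),
        AlignedRead("r4", "chr1", 1000, False, "ACGTTAGT"),
    ]
    print(position_groups(amplicon))  # 1
    print(sequence_groups(amplicon))  # 2
```

With a position-only key, one read out of four survives and the variant allele can disappear entirely. Grouping by sequence keeps one read per allele.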
Is there any way to remove identical sequences from amplicon-based reads? Also, I want the barcode identifiers to remain after alignment. How can I do that?
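The sketch below shows one possible approach, and it rests on several assumptions. After primer trimming, the barcodes are taken to sit at the two read ends with fixed lengths (`len5`, `len3`, hypothetical). The barcodes are moved into the read name before alignment, because aligners carry the read name into the SAM/BAM QNAME field. Exact duplicates are then dropped per (barcode pair, insert sequence). All function names are illustrative, and records are plain (name, sequence, quality) tuples rather than output from any particular FASTQ library.

```python
from typing import Iterable, Iterator, Tuple

# (read name, sequence, quality string), as parsed from a 4-line FASTQ record
Record = Tuple[str, str, str]


def split_read(seq: str, len5: int, len3: int) -> Tuple[str, str, str]:
    """Split a read into (5' barcode, insert, 3' barcode).
    Assumes fixed-length barcodes at both ends (hypothetical layout)."""
    end = len(seq) - len3
    return seq[:len5], seq[len5:end], seq[end:]


def tag_and_dedup(records: Iterable[Record], len5: int, len3: int) -> Iterator[Record]:
    """Move both barcodes into the read name and keep only the first read
    for each (barcode pair, insert sequence) combination."""
    seen = set()
    for name, seq, qual in records:
        bc5, insert, bc3 = split_read(seq, len5, len3)
        key = (bc5, bc3, insert)
        if key in seen:
            continue  # exact duplicate of a read already kept
        seen.add(key)
        qname = name.split()[0]  # aligners use only the first token as QNAME
        end = len(seq) - len3
        yield f"{qname}_{bc5}_{bc3}", insert, qual[len5:end]


if __name__ == "__main__":
    reads = [
        ("r1", "AAAA" + "CCGGTT" + "GG", "I" * 12),
        ("r2", "AAAA" + "CCGGTT" + "GG", "I" * 12),  # exact duplicate of r1
        ("r3", "TTTT" + "CCGGTT" + "GG", "I" * 12),  # same insert, other barcode
    ]
    for rec in tag_and_dedup(reads, len5=4, len3=2):
        print(rec)
    # ('r1_AAAA_GG', 'CCGGTT', 'IIIIII')
    # ('r3_TTTT_GG', 'CCGGTT', 'IIIIII')
```

Keeping the barcode in the read name means it survives alignment even if the aligner soft-clips or drops those bases. If the barcodes are UMIs rather than sample barcodes, a dedicated UMI-aware tool is probably a better fit than exact-sequence matching.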