I also wrote another pair of programs specifically for working with
primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a
primer sequence (or a set of primer sequences) against a set of
reference sequences to find the single best matching location per
reference sequence - in other words, if you have 3 primers and 100 ref
sequences, it will output a sam file with exactly 100 alignments - one
per ref sequence, using the primer sequence that matched best. Of
course you can also just run it with 1 primer sequence.
So you run msa twice - once for the left primer, and once for the
right primer - and generate 2 sam files. Then you feed those into
cutprimers.sh, which will create a new fasta file containing the
sequence between the primers, for each reference sequence. We used
these programs to synthetically cut V4 out of full-length 16S
sequences (PacBio amplicons).
These are both in the BBMap package and tolerant of indels, as required due to PacBio's error profile. If you want to only use exact matches, you might need a different approach.