I'm using BBDuk to trim adapters, as well as primers containing a number of degenerate bases, from amplicon reads. I want to confirm that BBDuk handles all of the IUPAC degenerate base codes ("R", "Y", "S", "W", "K", "M", "B", "D", "H", "V") and not just "N". The BBDuk documentation only shows an example using "N", so I want to be sure all of the primers get trimmed off properly. My settings are k=21, mink=12, hdist=1, and I'm using the included adapters reference file as well as a custom fastq reference file containing my set of tagged forward and reverse primers and the untagged primers. Any input from the community is much appreciated. Thanks.
From the BBDuk guide page:
Matching degenerate sequences such as primers:
bbduk.sh in=reads.fq out=matching.fq literal=ACGTTNNNNNGTC copyundefined k=13 mm=f
This will clone the reference sequences to represent every possibility for the degenerate bases (Ns and other non-ACGT IUPAC symbols). For example, this would create ACGTTAAAAAGTC, ACGTTAAAACGTC, ACGTTAAAAGGTC, and so forth (all 1024 possibilities). If you are interested in searching for new life by mining shotgun metagenomic reads for 16S sequences that do not quite match your primers… this (and hdist) might be a good place to start! But it’s also useful for adapters with barcodes.
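To make the expansion concrete, here is a minimal Python sketch of what "every possibility for the degenerate bases" means for the literal in the guide's example. It is only an illustration of the idea using the standard IUPAC code table, not BBDuk's actual implementation; the function name expand_degenerate is made up for this example.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand_degenerate(seq):
    """Return every concrete ACGT sequence a degenerate sequence stands for.

    Hypothetical helper for illustration only; BBDuk does its own expansion
    internally when copyundefined is set.
    """
    # One string of allowed bases per position, e.g. "N" -> "ACGT".
    choices = [IUPAC[base] for base in seq.upper()]
    # Cartesian product over positions gives every combination.
    return ["".join(combo) for combo in product(*choices)]

expanded = expand_degenerate("ACGTTNNNNNGTC")
print(len(expanded))   # five Ns, four bases each: 4**5 = 1024
print(expanded[:3])    # ACGTTAAAAAGTC, ACGTTAAAACGTC, ACGTTAAAAGGTC
```

The count multiplies per position (2 for R/Y/S/W/K/M, 3 for B/D/H/V, 4 for N), so primers with many degenerate positions expand into a large number of reference sequences.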