I have paired end illumina miseq 16S amplicon reads. There are forward and reverse reads in separate files,
_2.fastq. Barcodes and adapters have already been removed. Original authors multiplexed 16S reads with other primers, so my file contains 16S sequences, as well as a bunch of non-target sequences. I would like to:
- Subset to 16S sequences only.
- Remove 16S primers.
The SRR number for an example sample is SRR7204977, which can be found here.
I usually use bbduk to handle this type of problem, however I am getting a strange outcome. Code below:
/path/to/bbmap/bbduk.sh literal=GGGTTNCGNTCGTTG ktrim=l k=10 in1=fastq1.path in2=fastq2.path out1=output1.path out2=output2.path ordered=t
This keeps returning the error:
****** WARNING! A KMER OPERATION WAS CHOSEN BUT NO KMERS WERE LOADED. ****** ****** PLEASE ENSURE K IS LESS THAN OR EQUAL TO REF SEQUENCE LENGTHS. ****** Exception in thread "main" java.lang.AssertionError: You can bypass this assertion with the -da flag. at jgi.BBDukF.process2(BBDukF.java:1007) at jgi.BBDukF.process(BBDukF.java:924) at jgi.BBDukF.main(BBDukF.java:70)
I have verified that many sequences in fastq1.path do indeed have this primer sequence. The smallest read length in the sequence file is 35bp, largest 301.
Any suggestions on how to (1) filter to only reads with this primer present and (2) remove the primer would be greatly appreciated!