Trimming SSR (repetitive primer) sequences
10 months ago
pixie@bioinfo

Hello, my lab is working on SSR sequencing: we have designed specific SSR primers and are trying to capture the regions between consecutive SSR primers. Until now, I have been using exact matching with "seqkit locate" to find the primer+anchor sequences. I have not yet done any QC on the demultiplexed data, so this is on the raw sequences.

zcat 221027_MN01111_0087_A000H535FM.XXXX.R1.fastq.gz | seqkit locate -f pattern.fa >221027_MN01111_0087_A000H535FM_XXX_R1_locate.txt
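For reference, "seqkit locate" also has a `-m`/`--max-mismatch` option, so allowing one or two mismatches is a cheap first step beyond exact matching. To show what that tolerance means, here is a minimal Python sketch of substitution-only (Hamming distance) primer matching on the forward strand. The function name, primer and read are hypothetical examples, not data from this run. The sketch does not handle insertions/deletions or the reverse complement.

```python
def locate_with_mismatches(read, primer, max_mm=1):
    """Return (start, end, mismatches) for every window of `read` that matches
    `primer` with at most `max_mm` substitutions (0-based, end-exclusive).
    Only substitutions are counted; insertions/deletions are not handled."""
    read, primer = read.upper(), primer.upper()
    k = len(primer)
    hits = []
    for start in range(len(read) - k + 1):
        mm = 0
        for a, b in zip(read[start:start + k], primer):
            if a != b:
                mm += 1
                if mm > max_mm:
                    break  # too many mismatches: abandon this window early
        if mm <= max_mm:
            hits.append((start, start + k, mm))
    return hits


# Hypothetical (CA)n-anchored primer and a read with one substitution inside it.
primer = "CACACACACACACACAGT"
read = "CACACACACACTCACAGTTTGGATCCAAGT"
print(locate_with_mismatches(read, primer, max_mm=1))  # [(0, 18, 1)]
```

Because SSR primers are repetitive, raising `max_mm` can start to produce extra hits shifted by whole repeat units, so the tolerance should stay small relative to the primer length.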

However, I noticed that we pick up partial repeat primer sequences, or even a partial primer followed by a complete primer, at the beginning of the read (like a primer dimer). An example:

R1 of the fastq file

Here, what I thought was the ISSR (the region between two repeats) is actually another partial repeat primer from my list. How can I make the search pattern more flexible? Are there any tools I could try? Thanks
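One way to handle stacked primers like this is to trim iteratively from the 5' end: remove the longest full primer or 3' primer fragment found there, look again, and only treat what remains as the inter-SSR region. The Python sketch below does this with a simple error-rate threshold on substitutions; the function name, primer names, sequences and default thresholds are all hypothetical. As far as I know, cutadapt's regular 5' adapters (`-g`), which may occur partially at the 5' end of a read, combined with `-n` to remove more than one adapter per read, are an established (and indel-aware) implementation of the same idea.

```python
def strip_leading_primers(read, primers, min_overlap=6, max_error_rate=0.1, max_rounds=3):
    """Repeatedly remove a full primer, or a 3' fragment of one (a primer whose
    5' part is missing), from the 5' end of `read`.

    primers: dict of name -> sequence (names here are hypothetical).
    A fragment of length L may carry up to int(L * max_error_rate) substitutions.
    Returns (remaining_read, [(primer_name, bases_removed), ...]).
    """
    removed = []
    for _ in range(max_rounds):
        seq = read.upper()
        best = None  # (length, name) of the longest fragment found this round
        for name, primer in primers.items():
            p = primer.upper()
            # Longest candidate first: the whole primer, then shorter 3' suffixes.
            for length in range(min(len(p), len(seq)), min_overlap - 1, -1):
                frag = p[len(p) - length:]
                mm = sum(a != b for a, b in zip(seq[:length], frag))
                if mm <= int(length * max_error_rate):
                    if best is None or length > best[0]:
                        best = (length, name)
                    break  # longest match for this primer found
        if best is None:
            break  # nothing primer-like left at the 5' end
        length, name = best
        removed.append((name, length))
        read = read[length:]
    return read, removed


primers = {"P1": "CACACACACACACACAGT", "P2": "GAGAGAGAGAGAGAGACC"}
# Partial P1 (its last 6 bases), then a full P1, then the hypothetical insert.
read = "CACAGT" + "CACACACACACACACAGT" + "TTGGATCCAAGTTTGA"
print(strip_leading_primers(read, primers))
# ('TTGGATCCAAGTTTGA', [('P1', 6), ('P1', 18)])
```

A read that is entirely, or almost entirely, consumed by this loop is a primer-dimer candidate rather than a real ISSR product.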

Please post some actual data in text format. would be another tool to try.
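Because each primer is a repeat plus a short anchor, the search pattern itself can also be loosened: accept fewer repeat units than were designed, and a 5' end that starts part-way through a unit, which is what a truncated repeat primer looks like. A minimal Python sketch that builds such a regular expression, with IUPAC anchor codes expanded to character classes, is below; the (CA)n-RG primer and the `min_units=3` default are hypothetical. "seqkit locate" also has regular-expression (`-r`) and degenerate-base (`-d`) modes, though `seqkit locate --help` should be checked for how they combine with the other options.

```python
import re

# IUPAC nucleotide codes -> regex character classes (uppercase DNA only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def ssr_primer_regex(motif, anchor, min_units=3):
    """Compile a regex for an anchored SSR primer such as (CA)n-RG that
    tolerates fewer repeat units than the designed primer and a 5' end
    that starts part-way through a repeat unit."""
    motif = motif.upper()
    # Proper suffixes of one unit, e.g. "A" for CA, or "AT"/"T" for GAT.
    suffixes = [re.escape(motif[i:]) for i in range(1, len(motif))]
    partial = "(?:" + "|".join(suffixes) + ")?" if suffixes else ""
    anchor_re = "".join(IUPAC[base] for base in anchor.upper())
    return re.compile(partial + "(?:" + re.escape(motif) + "){%d,}" % min_units + anchor_re)


pattern = ssr_primer_regex("CA", "RG")   # hypothetical (CA)n-RG primer
print(pattern.pattern)                    # (?:A)?(?:CA){3,}[AG]G
m = pattern.match("ACACACAGGTTTGGATCC")   # truncated primer at the read start
print(m.end() if m else None)             # 9
```

`pattern.match` only looks at the read start, which is where the partial primers were seen; `pattern.search` would find the motif anywhere. A low `min_units` will also match genuine microsatellite sequence inside an insert, so it is a trade-off.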


