Trimming SSR (repetitive primer) sequences
10 months ago
pixie@bioinfo ★ 1.5k

Hello, my lab is working on SSR sequencing: we have designed specific SSR primers and are trying to capture the regions between consecutive SSR primers. Until now, I have been using "seqkit locate" to find exact matches of the primer+anchor sequences. I have not done any QC on the demultiplexed data yet, so this is on the raw reads.

zcat 221027_MN01111_0087_A000H535FM.XXXX.R1.fastq.gz | seqkit locate -f pattern.fa >221027_MN01111_0087_A000H535FM_XXX_R1_locate.txt
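
To illustrate why truncated primer copies slip through, here is a minimal Python sketch of exact pattern search. It shows the general idea of an exact-match locate, not seqkit's actual implementation. The primer "ACACACACACACGT" and the read are hypothetical sequences made up for this example: the read starts with a 6-base 3' fragment of the primer followed by a full copy, and only the full copy is reported.

```python
def locate_exact(read: str, pattern: str) -> list[int]:
    """Return 0-based start positions of every exact, possibly overlapping, match."""
    hits = []
    start = read.find(pattern)
    while start != -1:
        hits.append(start)
        # Resume one base later so overlapping matches are also reported.
        start = read.find(pattern, start + 1)
    return hits


# Hypothetical primer (repeat + anchor) and read, for illustration only.
primer = "ACACACACACACGT"
read = primer[-6:] + primer + "TTGACCTGA"  # 3' fragment, then a full copy

print(locate_exact(read, primer))  # → [6]: the leading 6-base fragment is not reported
```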

However, I noticed that we pick up partial repeat-primer sequences, or even a partial primer followed by a complete primer, at the beginning of the read (like a primer dimer). An example:

[Image: R1 of the FASTQ file]

Here, what I thought was the ISSR (the region between two repeats) is actually another partial repeat primer from my list. How can I make the search pattern more flexible? Are there any tools I could try? Thanks
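
One way to make the search more flexible is to treat a primer at the read start like a 5' adapter that may be truncated: repeatedly strip either the full primer or a 3' fragment of it from the beginning of the read. The sketch below is a minimal version under stated assumptions: exact base matching, a hypothetical minimum fragment length of 5 bases (min_overlap), and at most 3 rounds (max_rounds). The function name and the example sequences are illustrative only. cutadapt's 5' adapter trimming (-g, with --times for repeated rounds) follows a similar principle, but with error-tolerant alignment.

```python
def strip_leading_primers(read: str, primers: list[str], min_overlap: int = 5,
                          max_rounds: int = 3) -> tuple[str, list[str]]:
    """Repeatedly remove a full primer, or a 3' fragment of one, from the read start.

    Matching is exact. Fragments shorter than `min_overlap` are ignored to limit
    chance matches. Returns the trimmed read and the removed pieces, in order.
    """
    removed = []
    for _ in range(max_rounds):
        best = ""
        for p in primers:
            # Longest candidate first: the full primer, then ever shorter 3' fragments.
            for k in range(len(p), min_overlap - 1, -1):
                if read.startswith(p[-k:]):
                    if k > len(best):
                        best = p[-k:]
                    break
        if not best:
            break  # nothing primer-like left at the start of the read
        removed.append(best)
        read = read[len(best):]
    return read, removed


# Hypothetical primer and read: a primer-dimer-like start followed by insert sequence.
primer = "ACACACACACACGT"
read = primer[-6:] + primer + "TTGACCTGA"

print(strip_leading_primers(read, [primer]))
# → ('TTGACCTGA', ['ACACGT', 'ACACACACACACGT'])
```

Because SSR primers are low-complexity, a short fragment can also match genuine repeat sequence at the start of an insert, so min_overlap is a trade-off between removing primer-dimer residue and keeping real sequence.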


Please post some actual data in text format. ... would be another tool to try.
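
Sequencing errors inside a primer also defeat exact matching. Below is a minimal sketch of substitution-tolerant search (Hamming distance over a sliding window). It does not handle insertions or deletions, which need an alignment-based approach. The max_mismatch value, the function name, and the example sequences are hypothetical. seqkit locate also offers a maximum-mismatch option (-m); check "seqkit locate --help" for its exact behaviour in your version.

```python
def locate_mismatch(read: str, pattern: str, max_mismatch: int = 1) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every window within `max_mismatch` substitutions.

    Hamming distance only: insertions and deletions are not tolerated.
    """
    hits = []
    n = len(pattern)
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatch:
            hits.append((start, mismatches))
    return hits


# Hypothetical primer and a read carrying one substitution (A -> T) inside it.
primer = "ACACACACACACGT"
read = "TT" + primer[:10] + "T" + primer[11:] + "GGA"

print(locate_mismatch(read, primer, max_mismatch=0))  # → []
print(locate_mismatch(read, primer, max_mismatch=1))  # → [(2, 1)]
```

On low-complexity repeat primers, raising max_mismatch tends to produce spurious hits at positions shifted by one repeat unit, so the anchor bases do most of the work in keeping matches specific.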

