Hi everyone,
Is there an easy way to trim a FASTA/FASTQ file before and after a certain motif occurrence?
As an example, this would be my sequence: ATGAAACCTTTGGGGCCCCAGTCAGCTC
My motif of interest would be: GGGGCCCC
I want to keep the motif plus, let's say, 5 bp 5' and 3 bp 3' of the motif occurrence and trim everything else, which would give: CCTTTGGGGCCCCAGT
I searched around a bit but could not find any fitting tool. Any ideas/suggestions?
You can probably adapt this solution in awk: Split a sequence in a fastq file
Because of the unique requirement here, you are likely going to need to write something yourself. Trimming programs are generally set up to trim or discard sequence (to the left or right) once a particular k-mer motif is found in the read.
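If you do end up writing it yourself, the core logic is small. Here is a minimal Python sketch rather than a finished tool. The function name `trim_around_motif` and its parameters are made up for illustration, and it assumes you only care about the first exact occurrence of the motif on the forward strand. For FASTQ you can pass the quality string so it gets sliced the same way.

```python
def trim_around_motif(seq, motif, upstream=5, downstream=3, qual=None):
    """Keep the motif plus `upstream` bases 5' and `downstream` bases 3' of it.

    Hypothetical helper: uses only the first exact match and returns None
    when the motif is absent. If `qual` is given, it is sliced identically
    and a (sequence, quality) tuple is returned.
    """
    pos = seq.find(motif)
    if pos == -1:
        return None
    # Clamp the window so a motif near either end of the read does not
    # produce negative or out-of-range indices.
    start = max(0, pos - upstream)
    end = min(len(seq), pos + len(motif) + downstream)
    if qual is None:
        return seq[start:end]
    return seq[start:end], qual[start:end]


print(trim_around_motif("ATGAAACCTTTGGGGCCCCAGTCAGCTC", "GGGGCCCC"))
# CCTTTGGGGCCCCAGT
```

Reads where the motif sits closer than 5 bp to the start (or 3 bp to the end) come back shorter than expected, so decide whether you want to keep or drop those.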
Using bbduk.sh from the BBMap suite, you can first pull out the reads that contain the motif of interest. You can then work on that reduced dataset.
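To illustrate the idea of reducing the dataset first, here is a rough plain-Python stand-in for that filtering step. It is not bbduk.sh and does not reproduce its k-mer matching: it only does an exact substring match on the forward strand, and it assumes standard 4-line FASTQ records (no wrapped sequence lines). The name `reads_with_motif` is hypothetical.

```python
from itertools import islice


def reads_with_motif(lines, motif):
    """Yield (header, seq, plus, qual) for FASTQ records containing `motif`.

    Assumes each record is exactly four lines; `lines` can be an open
    file handle or any iterable of strings.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:  # end of input (or a truncated last record)
            break
        if motif in record[1]:
            yield tuple(record)


example = [
    "@read1", "ATGAAACCTTTGGGGCCCCAGTCAGCTC", "+", "I" * 28,
    "@read2", "ATGAAACCTTTAAAA", "+", "I" * 15,
]
for header, seq, plus, qual in reads_with_motif(example, "GGGGCCCC"):
    print(header, seq)
# @read1 ATGAAACCTTTGGGGCCCCAGTCAGCTC
```

For a real file you would pass an open handle instead of the `example` list. On large datasets a dedicated tool will be much faster than this loop.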