Question

Trim fastq after and before motif occurrence

4.6 years ago

christina.galonska ▴ 10

Hi everyone,

Is there an easy way to trim a fasta/fastq before and after a certain motif occurrence?

As an example, this would be my sequence: ATGAAACCTTTGGGGCCCCAGTCAGCTC

My motif of interest would be: GGGGCCCC

I want to keep, let's say, 5 bp 5' and 3 bp 3' of the motif occurrence and trim everything else, which would give: CCTTTGGGGCCCCAGT

I searched around a bit but could not find any fitting tool. Any ideas/suggestions?

sequencing next-gen • 1.1k views

You can probably adapt this solution in awk: Split a sequence in a fastq file

4.6 years ago by ATpoint 81k

Because of the unique requirement here, you will likely need to write something yourself. Trimming programs are generally set up to trim/discard sequence (to the left or to the right) once a particular k-mer motif is found in the read.

Using bbduk.sh from the BBMap suite, you can extract the reads that contain the motif of interest by doing:

$ bbmap/bbduk.sh literal=NNNNNGGGGCCCCNNNNN k=18 copyundefined in=tt.fq outm=stdout.fq minlen=5

You can then work on that reduced dataset.
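As a rough illustration of the "write something yourself" route, here is a minimal Python sketch that keeps the motif plus a fixed number of flanking bases and trims the quality string to match. The function names `trim_around_motif` and `trim_fastq` and their parameters are made up for this example. It assumes plain 4-line FASTQ records, looks only at the first exact, forward-strand occurrence of the motif, drops reads without the motif, and shortens a flank when the motif sits too close to a read end.

```python
def trim_around_motif(seq, qual, motif, upstream=5, downstream=3):
    """Clip a read to the first motif occurrence plus flanking bases.

    Returns (trimmed_seq, trimmed_qual), or None if the motif is absent.
    Flanks are truncated when the motif is close to either read end.
    """
    start = seq.find(motif)  # exact match, forward strand only
    if start == -1:
        return None
    left = max(0, start - upstream)
    right = min(len(seq), start + len(motif) + downstream)
    return seq[left:right], qual[left:right]


def trim_fastq(in_path, out_path, motif, upstream=5, downstream=3):
    """Apply trim_around_motif to every record of a 4-line FASTQ file.

    Reads that do not contain the motif are not written to the output.
    """
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:  # end of file
                break
            seq = fin.readline().rstrip("\n")
            plus = fin.readline().rstrip("\n")
            qual = fin.readline().rstrip("\n")
            trimmed = trim_around_motif(seq, qual, motif, upstream, downstream)
            if trimmed is None:
                continue
            fout.write(f"{header}\n{trimmed[0]}\n{plus}\n{trimmed[1]}\n")


if __name__ == "__main__":
    # Sequence and motif from the question; the quality string is a placeholder.
    read = "ATGAAACCTTTGGGGCCCCAGTCAGCTC"
    print(trim_around_motif(read, "I" * len(read), "GGGGCCCC"))
    # -> ('CCTTTGGGGCCCCAGT', 'IIIIIIIIIIIIIIII')
```

To process a whole file, something like `trim_fastq("tt.fq", "trimmed.fq", "GGGGCCCC")` would do, where the output file name is a placeholder. Reverse-complement matches and multiple occurrences per read would need extra handling.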

4.6 years ago by GenoMax 141k

score 2 · Accepted Answer · 2019-09-04

4.6 years ago

christina.galonska ▴ 10

I just tested seqkit amplicon, which does exactly that (the option is so far only available in the pre-release of v0.11.0: https://github.com/shenwei356/seqkit/releases/tag/v0.11.0-dev).

The corresponding command would be:

seqkit amplicon input.fastq -F GGGGCCCC -r -5:3 -f -o output.fastq
