Question: Trim FASTQ before and after a motif occurrence
Question by christina.galonska, 19 days ago:

Hi everyone,

Is there an easy way to trim a FASTA/FASTQ file before and after the occurrence of a certain motif?

For example, take this sequence: ATGAAACCTTTGGGGCCCCAGTCAGCTC

My motif of interest is: GGGGCCCC

I want to keep, say, 5 bp on the 5' side and 3 bp on the 3' side of the motif occurrence and trim away everything else, which would give: CCTTTGGGGCCCCAGT

I searched around a bit but could not find a suitable tool. Any ideas or suggestions?
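A minimal Python sketch of the requested trimming on a single sequence, for illustration only. It assumes an exact match on the forward strand, uses only the first occurrence of the motif, and returns None when the motif is missing or does not have enough flanking bases. The function name `trim_around_motif` is hypothetical.

```python
def trim_around_motif(seq, motif, up, down):
    """Return `up` bases 5' of the motif, the motif itself, and `down` bases 3' of it.

    Only the first exact, forward-strand match is used. Returns None if the
    motif is absent or there are not enough bases on either side of it.
    """
    start = seq.find(motif)  # 0-based index of the first match, or -1
    if start == -1:
        return None
    end = start + len(motif)  # index just past the motif
    if start - up < 0 or end + down > len(seq):
        return None  # motif too close to one end of the sequence
    return seq[start - up:end + down]


print(trim_around_motif("ATGAAACCTTTGGGGCCCCAGTCAGCTC", "GGGGCCCC", 5, 3))
# prints CCTTTGGGGCCCCAGT
```

In the example the motif starts at index 11 and ends before index 19, so the kept window is indices 6 to 21, i.e. 5 + 8 + 3 = 16 bases.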

Tags: sequencing, next-gen

You can probably adapt the awk solution from the thread "Split a sequence in a fastq file".

Reply written 19 days ago by ATpoint.

Because of the unusual requirement here, you will likely need to write something yourself. Trimming programs are generally set up to trim a sequence (to the left or right of the hit) or to discard it once a particular k-mer motif is found in it.
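If you do write something yourself, a FASTQ-level version might look like the sketch below. It is an illustration under stated assumptions, not a tested tool: unwrapped 4-line records, exact motif matches on the forward strand only (no mismatches, no reverse complement), first occurrence only, and reads with too little flanking sequence are dropped. The main point is that the quality string must be cut with the same coordinates as the sequence. The function `trim_fastq` and the script name `trim_motif.py` are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], motif: str, up: int, down: int) -> Iterator[str]:
    """Yield trimmed FASTQ lines, keeping `up` bases 5' and `down` bases 3' of the motif.

    Assumes plain 4-line FASTQ records (no line-wrapped sequences). Reads without
    an exact forward-strand match, or with too little flanking sequence, are dropped.
    An incomplete record at the end of the input is silently ignored by zip().
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
        start = seq.find(motif)  # first occurrence only
        if start == -1:
            continue
        lo, hi = start - up, start + len(motif) + down
        if lo < 0 or hi > len(seq):
            continue  # motif too close to a read end
        # Slice sequence and quality with the same coordinates so they stay paired.
        yield header
        yield seq[lo:hi] + "\n"
        yield plus
        yield qual[lo:hi] + "\n"


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical invocation: python trim_motif.py GGGGCCCC 5 3 < in.fastq > out.fastq
    motif_arg, up_arg, down_arg = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    sys.stdout.writelines(trim_fastq(sys.stdin, motif_arg, up_arg, down_arg))
```

Dropping reads whose motif sits too close to an end keeps every output read the same length; if partial flanks are acceptable, clamp `lo` and `hi` to the read boundaries instead.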

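Using bbduk.sh from the BBMap suite, you can extract the reads that contain the motif of interest (outm= writes the matching reads, and with copyundefined each N in the literal matches any base, so the motif must be flanked by at least 5 bases on both sides):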

$ bbmap/bbduk.sh literal=NNNNNGGGGCCCCNNNNN k=18 copyundefined in=tt.fq outm=stdout.fq minlen=5

You can then work on that reduced dataset.

Reply written 19 days ago by genomax.
Answer by christina.galonska, 19 days ago:

I just tested seqkit amplicon, which does exactly that (the required option is so far only available in the v0.11.0 pre-release: https://github.com/shenwei356/seqkit/releases/tag/v0.11.0-dev).

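The corresponding command is (-F gives the motif, -f makes -r a flanking region, and -r -5:3 keeps 5 bp upstream through 3 bp downstream, motif included):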

seqkit amplicon input.fastq -F GGGGCCCC -r -5:3 -f -o output.fastq

Powered by Biostar version 2.3.0