Trim fastq after and before motif occurrence
4.6 years ago

Hi everyone,

Is there an easy way to trim a fasta/fastq before and after an occurrence of a certain motif?

As an example, this would be my sequence: ATGAAACCTTTGGGGCCCCAGTCAGCTC

My motif of interest would be: GGGGCCCC

I want to keep the motif plus, let's say, 5 bp on its 5' side and 3 bp on its 3' side and trim everything else, which would give: CCTTTGGGGCCCCAGT

I searched around a bit but could not find a suitable tool. Any ideas/suggestions?


You can probably adapt this solution in awk: Split a sequence in a fastq file


Because of the unusual requirement here, you are likely going to need to write something yourself. Trimming programs are generally set up to trim/discard sequence (to the left or right) once a particular k-mer motif is found in the read.

Using bbduk.sh from the BBMap suite, you can pull out the reads that contain the motif of interest by doing:

$ bbmap/bbduk.sh literal=NNNNNGGGGCCCCNNNNN k=18 copyundefined in=tt.fq outm=stdout.fq minlen=5

You can then work on that reduced dataset.
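If you do end up writing something yourself, a minimal Python sketch of the trimming step could look like the following. The function names `trim_around_motif` and `trim_fastq` are made up for illustration, and the sketch assumes plain 4-line FASTQ records, an exact match on the forward strand only, and that only the first occurrence per read matters. Flanks are clamped at the read ends, so a motif sitting close to either end gives a shorter result.

```python
def trim_around_motif(seq, qual, motif, up=5, down=3):
    """Cut a read down to the first motif hit plus `up` bp 5' and `down` bp 3'.

    Returns (seq, qual) for the kept window, or None if the motif is absent.
    """
    pos = seq.find(motif)                          # exact match, forward strand only
    if pos == -1:
        return None
    start = max(pos - up, 0)                       # clamp at the 5' end of the read
    end = min(pos + len(motif) + down, len(seq))   # clamp at the 3' end of the read
    return seq[start:end], qual[start:end]         # same window for bases and qualities


def trim_fastq(lines, motif, up=5, down=3):
    """Yield trimmed 4-line FASTQ records; reads without the motif are dropped.

    Assumes unwrapped FASTQ (exactly 4 lines per record); an incomplete
    trailing record is silently ignored.
    """
    records = zip(*[iter(lines)] * 4)              # group the line stream in fours
    for header, seq, _plus, qual in records:
        trimmed = trim_around_motif(seq.rstrip("\n"), qual.rstrip("\n"), motif, up, down)
        if trimmed is not None:
            yield f"{header.rstrip()}\n{trimmed[0]}\n+\n{trimmed[1]}\n"
```

To run it on a file, something like `sys.stdout.writelines(trim_fastq(sys.stdin, "GGGGCCCC"))` (after `import sys`) would read FASTQ from standard input and write the trimmed records to standard output. On the sequence from the question it keeps CCTTTGGGGCCCCAGT.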

4.6 years ago

I just tested seqkit amplicon, which does exactly that (so far the option is only available in the pre-release of v0.11.0: https://github.com/shenwei356/seqkit/releases/tag/v0.11.0-dev)

The corresponding command would be:

seqkit amplicon input.fastq -F GGGGCCCC -r -5:3 -f -o output.fastq
