Trimming fastq up until a sequence
2.4 years ago
SemiQuant ▴ 70

Hi

I have a problem that is hard to find a solution for on Google. I need to trim the reads in my fastq files up to a known sequence, removing everything before that sequence but keeping the sequence itself.

This
someRandomNoise_aKnownSequence_unknownSequence
becomes
aKnownSequence_unknownSequence


All the tools I use, and all the ones I have seen, would remove both the "someRandomNoise" and the "aKnownSequence".

I could find the location of the sequence in each read and then trim the reads in a loop, but this seems very inefficient.
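For reference, the find-and-trim loop described above could be sketched in Python roughly as follows. This is only a minimal illustration, not a replacement for a dedicated trimmer. It assumes exact matches only (no mismatches or indels), a plain four-lines-per-record fastq, and that reads without the known sequence should be dropped by default. The function names and the keep_unmatched parameter are made up for this example.

```python
import gzip
import sys


def trim_to_anchor(seq, qual, anchor):
    """Drop everything before the first exact occurrence of anchor, keeping anchor.

    Returns None when anchor is absent, so the caller decides what to do
    with that read.
    """
    pos = seq.find(anchor)
    if pos == -1:
        return None
    # Slice the quality string at the same offset so it stays aligned.
    return seq[pos:], qual[pos:]


def trim_fastq(in_path, out_path, anchor, keep_unmatched=False):
    """Stream a four-lines-per-record fastq file and write the trimmed reads."""
    in_opener = gzip.open if in_path.endswith(".gz") else open
    out_opener = gzip.open if out_path.endswith(".gz") else open
    with in_opener(in_path, "rt") as fin, out_opener(out_path, "wt") as fout:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fin.readline().rstrip("\n")
            plus = fin.readline().rstrip("\n")
            qual = fin.readline().rstrip("\n")
            trimmed = trim_to_anchor(seq, qual, anchor)
            if trimmed is None:
                if not keep_unmatched:
                    continue  # assumption: discard reads lacking the anchor
                trimmed = (seq, qual)
            fout.write(f"{header}\n{trimmed[0]}\n{plus}\n{trimmed[1]}\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python trim_to_anchor.py in.fq.gz out.fq.gz ACGTACGT
    trim_fastq(sys.argv[1], sys.argv[2], sys.argv[3])
```

The loop reads one record at a time, so memory use stays constant. The cost is one substring search per read.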

trimming fastq sequencing • 1.2k views

To verify your trimming results, you might like to clone our visualisation tool Trimviz (see the example report here; it is currently in beta testing). I apologize for the shameless plug, but this is exactly the kind of non-standard trimming situation I envisaged it being useful for. Dependencies include a few common R and Python libraries, plus samtools (and ideally seqtk). In FQ mode, give it the pre-trimmed and post-trimmed fastq file names:

python path/to/trimviz.py FQ -u <untrimmed.fq.gz> -t <trimmed.fq.gz> -o <outdir>

Use -k 50000 if you don't have seqtk installed or are in a hurry. I imagine you would see a big block of vertical stripes around the 5' trimming site in the sequence heat-maps, corresponding to the desired target sequence. If it is on the RIGHT of the 5' trimming site, then that sequence has been successfully retained in your reads and everything before it has been trimmed.

2.4 years ago
GenoMax 117k

You can use bbduk.sh from the BBMap suite with the following structure:

bbduk.sh in=input.fq.gz out=output.fq.gz literal=aKnownSequence ktrim=l


A detailed guide is available here.


I can't believe I missed that in the guide (it's the first paragraph!). Thanks.

2.4 years ago
SemiQuant ▴ 70

I've found that filtlong has a trim option that also does this.