How does the sliding window work in Trimmomatic?
shl198, 7.5 years ago

I was just wondering how the sliding window in Trimmomatic works.

By definition, it scans from the 5' end of the read and removes the 3' end of the read once the average quality of a group of bases drops below a specified threshold. For example, suppose we have the sequence ATCGATCGATCG and set SLIDINGWINDOW:4:15 (window size 4, required average quality 15).

It begins with the first 4 bases in a window, ATCG, but if the average score is below 15, which base will it trim? Is it the last base in this window? And what is the next start position of the window: 2 or 5? Thanks.

cts, 7.5 years ago

The SLIDINGWINDOW trimmer cuts the read at the leftmost position of the window where the average quality drops below the threshold and removes the rest of the read. However, if the low quality is at the very beginning of the read, the read fails the minimum length test and is removed completely: the remaining 3' end is not printed, even if it is of good quality.
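A minimal Python sketch of this rule, for illustration only (the function name `sliding_window_trim` is hypothetical, and this is not Trimmomatic's Java source). It assumes integer Phred scores, a window that advances one base at a time from the 5' end, and a boundary rule I have not verified against Trimmomatic: at the first window whose mean falls below the threshold, the read is truncated just before that window's last base, and any trailing bases whose own quality is below the threshold are then trimmed back. That assumed rule reproduces the example output shown below (read @1 keeps 40 bases, read @2 is dropped); real edge cases, such as reads shorter than the window, may behave differently.

```python
from typing import List, Optional


def sliding_window_trim(quals: List[int], window: int, threshold: float) -> Optional[int]:
    """Return how many 5' bases to keep, or None if the read is dropped.

    Hypothetical sketch of SLIDINGWINDOW:<window>:<threshold>; the boundary
    rule below is an assumption, not Trimmomatic's verified behavior.
    """
    required_total = window * threshold  # compare sums instead of averages
    keep = len(quals)
    # The window slides one base at a time: start = 0, 1, 2, ...
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) < required_total:
            # Assumed rule: truncate just before the failing window's last base...
            keep = start + window - 1
            break  # scanning stops at the first failing window
    # ...then trim back trailing bases that are individually below the threshold.
    while keep > 0 and quals[keep - 1] < threshold:
        keep -= 1
    return keep if keep > 0 else None


# Good 5' part, a low-quality stretch, then a good 3' tail.
print(sliding_window_trim([30] * 6 + [2] * 4 + [30] * 2, window=4, threshold=15))  # → 6
```

Because the scan stops at the first failing window, the good 3' tail in the usage example is discarded along with the bad stretch, which is also why a read whose very first window fails ends up with nothing left.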

Consider the following test file t.fq:

@1\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGCATCG
+
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeEFCB
@2\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGCATCG
+
EFCBeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

and processing with:

$ java -jar trimmomatic.jar SE -phred64 t.fq tt.fq SLIDINGWINDOW:4:15

TrimmomaticSE: Started with arguments: -phred64 t.fq tt.fq SLIDINGWINDOW:4:15
Automatically using 16 threads
Input Reads: 2 Surviving: 1 (50.00%) Dropped: 1 (50.00%)
TrimmomaticSE: Completed successfully

The output file looks like the following:

@1\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGC
+
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

As you can see, the read with poor quality at the beginning has been removed completely.
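To see where the windows fail in this example, here is a small Python sketch (the helper names `phred64`, `window_means` and `first_failing_window` are hypothetical). It decodes the Phred+64 qualities selected by `-phred64` ('e' = 37, 'E' = 5, 'F' = 6, 'C' = 3, 'B' = 2) and computes the mean of every 4-base window, assuming the window advances one base at a time, so successive windows start at bases 1, 2, 3, ... rather than 1, 5, 9, .... For read @1 the first window averaging below 15 starts at base 40, and the output keeps exactly bases 1 to 40; for read @2 the very first window already fails.

```python
from typing import List, Optional


def phred64(qual_line: str) -> List[int]:
    """Decode a Phred+64 quality string into integer scores."""
    return [ord(c) - 64 for c in qual_line]


def window_means(quals: List[int], window: int = 4) -> List[float]:
    """Mean quality of each window; the window advances one base at a time."""
    return [sum(quals[s:s + window]) / window for s in range(len(quals) - window + 1)]


def first_failing_window(quals: List[int], window: int = 4, threshold: float = 15) -> Optional[int]:
    """1-based start of the first window whose mean is below the threshold, or None."""
    for s, mean in enumerate(window_means(quals, window)):
        if mean < threshold:
            return s + 1
    return None


# Quality lines of the two reads in t.fq.
read1 = phred64("e" * 40 + "EFCB")
read2 = phred64("EFCB" + "e" * 40)
print(first_failing_window(read1), window_means(read1)[39])  # → 40 12.75
print(first_failing_window(read2), window_means(read2)[0])   # → 1 4.0
```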

Is there any way to trim the read and keep the 3' end when the low-quality bases are at the beginning of the read?

4.7 years ago

Sliding-window averages yield non-optimal trimming. I am aware of only two programs that currently implement optimal trimming: seqtk and BBDuk. Both use the Phred algorithm, which is provably optimal. I assume the original implementation exists somewhere, presumably here, but I have never tried it.

Anyway, if you want optimal quality trimming, you should use BBDuk or seqtk. For example:

bbduk.sh in=reads.fq out=trimmed.fq qtrim=rl trimq=10

This optimally trims reads so that leading and trailing regions with an average quality below 10 are removed, which solves @Goutham Atla's problem. BBDuk also supports window-average trimming, and it does so much faster than Trimmomatic, but I don't recommend it because it is provably inferior to optimal trimming; so I won't disclose the flags, as using it promotes bad science.
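For comparison with the sliding window, here is a minimal Python sketch of the maximum-scoring-segment idea commonly described as the modified Mott (Phred) trimming algorithm. Each base scores the difference between a threshold error probability and its own error probability 10^(-Q/10), and the read is trimmed to the contiguous segment with the largest total score (found with Kadane's algorithm). The function name `optimal_trim` and the 0.1 error threshold (Q10, chosen to mirror trimq=10 above) are assumptions; this is not the actual seqtk or BBDuk code, and their scoring details may differ. Unlike SLIDINGWINDOW, it can discard a low-quality 5' end and keep the good 3' part, which is what the question above asks for.

```python
from typing import List, Tuple


def optimal_trim(quals: List[int], err_threshold: float = 0.1) -> Tuple[int, int]:
    """Return (start, end) of the kept segment, end exclusive; (0, 0) keeps nothing."""
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        # Positive for bases whose error probability is below the threshold.
        score = err_threshold - 10 ** (-q / 10)
        if run_sum <= 0:
            # A non-positive running sum cannot improve any segment: restart here.
            run_sum, run_start = score, i
        else:
            run_sum += score
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


# Read @2 from t.fq, decoded from Phred+64: bad 5' end, good 3' end.
read2 = [ord(c) - 64 for c in "EFCB" + "e" * 40]
print(optimal_trim(read2))  # → (4, 44): the 40 good 3' bases are kept
```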

There are a lot of quality-trimming programs out there today. I have tested every one I could find, and they are all (apart from seqtk and BBDuk) empirically as well as theoretically inferior to optimal trimming.

7.5 years ago

Well, what I believe it does is that once the average quality in the window drops below 15, it stops, "cuts" the sequence there, and discards the rest. I'd say it ought to cut at the leftmost coordinate of the window; after all, the entire window is bad. So there is no next window to slide over.
