Question: Understanding Trimmomatic Sliding Window Approach
2.9 years ago, nikelle.petrillo (Providence College, Providence, RI) wrote:

Hello all,

I am performing de novo transcriptome assembly. I have used Trimmomatic to quality filter my reads. I used the argument SLIDINGWINDOW:4:30. Can someone explain what this means?

My understanding is that the sliding window approach will cut the read when the average quality of each 4-nt window falls below a quality score of 30.

I guess I am confused about the "cut the read" part. If someone could clarify this, it would be much appreciated! Also, is the quality score of 30 a Phred score?

Thanks for the help! Nikelle

modified 2.9 years ago by biomaster • written 2.9 years ago by nikelle.petrillo

See if the answer here clarifies the concept. Trimmomatic scans the read from the 5' end, and once the condition being checked becomes true (average Q score < 30 for a window of 4 nt), the remaining nucleotides in the read are cut.
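
As a rough illustration of that rule, here is a minimal Python sketch. It is a simplification and not Trimmomatic's actual code: the function name `sliding_window_trim` is made up for this example, the qualities are assumed to be already decoded to integer Phred scores, and the read is simply cut at the start of the first window whose average falls below the threshold.

```python
def sliding_window_trim(quals, window=4, threshold=30):
    """Return the number of leading bases to keep (simplified illustration).

    quals: list of integer Phred scores, ordered 5' to 3'.
    Windows are scanned from the 5' end. At the first window whose
    average quality is below `threshold`, the read is cut at the start
    of that window. Reads shorter than the window are left unchanged
    in this sketch.
    """
    for start in range(0, len(quals) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < threshold:
            return start  # drop this window and everything after it
    return len(quals)  # no window failed, keep the whole read


read = "ACGTACGTAC"
quals = [38, 38, 37, 36, 35, 30, 22, 15, 10, 8]
keep = sliding_window_trim(quals, window=4, threshold=30)
print(read[:keep])  # keeps the first 4 bases
```

Note that single bases below 30 do not trigger the cut on their own; only the window average matters, which is why the window starting at the fifth base (average 25.5) is the first to fail here.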

written 2.9 years ago by genomax

Thanks Genomax, that link certainly helps. Do you know if that Q score of 30 is the same as calling it a Phred score of 30?

written 2.9 years ago by nikelle.petrillo

For more on that (for Illumina) see this document.
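
For reference, Illumina Q scores are Phred-scaled, meaning Q = -10 * log10(P), where P is the estimated probability that the base call is wrong. A small sketch of the conversion (the function names are made up for this example):

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Q30 corresponds to roughly 1 wrong base call in 1000.
print(phred_to_error_prob(30))
print(phred_to_error_prob(20))
```

So a threshold of 30 in SLIDINGWINDOW:4:30 asks for windows whose average score corresponds to about 99.9% base-call accuracy.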

written 2.9 years ago by genomax
2.9 years ago, agata88 (Poland) wrote:

If the average quality across a window of 4 bases is lower than 30, the program cuts the read at that window, removing those bases and everything after them. You can specify which encoding your files use by adding the -phred33 or -phred64 option. If your reads are encoded as phred+33 (Illumina 1.8+), then your base qualities range from 0 to 41. A cutoff of 30 is commonly used with this encoding.

And FastQC can help you to identify the encoding of your reads.
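
To make the phred+33 and phred+64 encodings mentioned above concrete, here is a minimal sketch of how a FASTQ quality string maps to scores: each character's ASCII code minus the offset gives the Phred score. The function name `decode_qualities` is made up for this example.

```python
def decode_qualities(qual_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 for phred+33 (Sanger / Illumina 1.8+),
    offset=64 for the older phred+64 encoding.
    """
    return [ord(ch) - offset for ch in qual_string]


# Under phred+33, 'I' is Q40, '?' is Q30, '5' is Q20 and '+' is Q10.
print(decode_qualities("I?5+", offset=33))

# The same Q40 is written as 'h' under phred+64.
print(decode_qualities("h", offset=64))
```

Decoding with the wrong offset shifts every score by 31, which is why the encoding needs to be right before a threshold such as 30 means anything.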

Hope it helps,

Best,

Agata

modified 2.9 years ago • written 2.9 years ago by agata88

Thanks Agata!

Do you know how I would use FastQC to determine this? Also, do I need to specify how the files were encoded or can Trimmomatic figure this out automatically?

Thank you, Nikelle

written 2.9 years ago by nikelle.petrillo

You can download FastQC from here to your computer: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Or use Galaxy for that,

https://usegalaxy.org/

The input is your R1 and R2 files if you have PE reads. After processing, the Basic Statistics section reports the encoding of your reads. Besides that, you can check the trimming results by running FastQC on the FASTQ files produced by Trimmomatic, to see whether everything looks correct.
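
If you want a quick sanity check alongside FastQC, the encoding can often be guessed from the range of characters in the quality lines. The sketch below is only a heuristic under stated assumptions, not what FastQC or Trimmomatic actually run: the function name `guess_encoding` and the cutoffs are choices made for this example (characters below ASCII 59 cannot occur in the +64 encodings, and characters above 'J', ASCII 74, are not expected in Illumina 1.8+ phred+33 data).

```python
def guess_encoding(quality_strings):
    """Guess the quality encoding from FASTQ quality lines (heuristic)."""
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return "unknown"
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Too low to be any +64 encoding.
        return "phred33"
    if highest > 74:
        # Higher than expected for Illumina 1.8+ phred+33 data.
        return "phred64"
    # Only mid-range characters seen, so both encodings are possible.
    return "ambiguous"


print(guess_encoding(["IIII5+", "??II"]))
print(guess_encoding(["hhhhdd", "^^hh"]))
```

With only a handful of high-quality reads the result can be "ambiguous", so it is best to feed it quality lines from many reads.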

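As far as I know, older Trimmomatic releases do not do this automatically; according to the Trimmomatic documentation, version 0.32 and later detect the encoding when neither -phred33 nor -phred64 is given. Best, Agata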

written 2.9 years ago by agata88

Unless your data was generated 4+ years ago, it is going to be in Sanger FASTQ (phred+33) format.

written 2.9 years ago by genomax
2.9 years ago, biomaster (San Jose) wrote:

For QC and read filtering, the tool that has given me the best experience is AfterQC (https://github.com/OpenGene/AfterQC). It does QC and filtering automatically, in a single pass, and supports paired-end FASTQ.

modified 2.9 years ago • written 2.9 years ago by biomaster