Do the QC first to check whether you really need quality trimming and adapter removal. Use e.g. FastQC and MultiQC for this job. If there is no considerable adapter contamination and all reads have good to very good quality, you might not need to trim at all. I would say trimming is mostly not required anymore, but for the purpose of assembly you might do it anyway, just to be conservative. Trimmomatic reports the number and percentage of reads (or read pairs) that survived and that were dropped.
For more details on the trimming parameters, please read the fine manual: http://www.usadellab.org/cms/index.php?page=trimmomatic
In short, assuming you have paired-end reads and want to assemble a genome, try something close to the manual's recommended command:
java -jar trimmomatic-0.35.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36
Ask your sequencing center which TruSeq protocol version they used and whether any non-standard adapters were involved, so that you pass the right adapter FASTA to ILLUMINACLIP.
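ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 packs several settings into one colon-separated string. According to the Trimmomatic manual, the numbers after the adapter FASTA are the seed mismatches, the palindrome clip threshold and the simple clip threshold. As a small illustration, here is a bash sketch that splits such a step into labelled fields; the function name describe_illuminaclip is made up and is not part of Trimmomatic.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Trimmomatic): label the fields of an ILLUMINACLIP step.
# Field order as described in the Trimmomatic manual:
#   ILLUMINACLIP:<adapter fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
describe_illuminaclip() {
  local step="$1" name fasta seed palin simple
  # Split on ':'; any extra trailing fields end up in the last variable.
  IFS=':' read -r name fasta seed palin simple <<< "$step"
  if [[ "$name" != "ILLUMINACLIP" ]]; then
    echo "not an ILLUMINACLIP step: $step" >&2
    return 1
  fi
  echo "adapter file: $fasta"
  echo "seed mismatches: $seed"
  echo "palindrome clip threshold: $palin"
  echo "simple clip threshold: $simple"
}

describe_illuminaclip "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10"
```

This only covers the basic four-field form used in the command above; optional extra fields, if present, would be lumped into the last variable.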
LEADING:3 TRAILING:3 (part of the manual's example, left out above) removes bases from the start and the end of a read as long as their quality is below 3. This is likely not needed.
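A simplified bash sketch of what LEADING and TRAILING do to a single read, working only on its Phred+33 quality string (it ignores the bases themselves, so it does not treat N specially). The function names phred33 and leading_trailing are hypothetical; this illustrates the idea and is not Trimmomatic's code.

```bash
#!/usr/bin/env bash
# Simplified illustration of LEADING:<q> and TRAILING:<q> on a single read.
# Works on the Phred+33 quality string only; prints "<start offset> <kept length>".

# Phred score of one Phred+33 quality character (ASCII code minus 33).
phred33() {
  local code
  code=$(printf '%d' "'$1")
  echo $(( code - 33 ))
}

leading_trailing() {
  local qual="$1" min_q="$2"
  local start=0 end=${#qual}
  # LEADING: drop bases from the 5' end while their quality is below min_q.
  while (( start < end )) && (( $(phred33 "${qual:start:1}") < min_q )); do
    start=$(( start + 1 ))
  done
  # TRAILING: drop bases from the 3' end while their quality is below min_q.
  while (( end > start )) && (( $(phred33 "${qual:end-1:1}") < min_q )); do
    end=$(( end - 1 ))
  done
  echo "$start $(( end - start ))"
}

leading_trailing '##IIII#' 3   # prints "2 4"
```

In Phred+33, '#' encodes Q2 and 'I' encodes Q40, so with a threshold of 3 the example drops two bases at the start and one at the end.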
SLIDINGWINDOW:4:15 sets a minimum average quality of 15 for a window of 4 bases; the read is cut where a window falls below that average.
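A rough bash sketch of the sliding-window idea: compute the average quality of each 4-base window from the 5' end and cut the read at the start of the first window whose average is below the threshold. This is a simplification under my own assumptions; Trimmomatic's actual implementation may differ in details such as exactly where inside the failing window it cuts. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified illustration of SLIDINGWINDOW:<window>:<min_avg> (not Trimmomatic's code).
# Prints how many bases from the 5' end would be kept.

# Phred score of one Phred+33 quality character (ASCII code minus 33).
phred33() {
  local code
  code=$(printf '%d' "'$1")
  echo $(( code - 33 ))
}

sliding_window_keep() {
  local qual="$1" window="$2" min_avg="$3"
  local len=${#qual} i j sum
  local -a q=()
  # Decode the whole quality string once.
  for (( i = 0; i < len; i++ )); do
    q[i]=$(phred33 "${qual:i:1}")
  done
  # Scan windows from the 5' end; cut at the first window whose average is too low.
  for (( i = 0; i + window <= len; i++ )); do
    sum=0
    for (( j = i; j < i + window; j++ )); do
      sum=$(( sum + q[j] ))
    done
    # average < min_avg  <=>  sum < min_avg * window (avoids integer division)
    if (( sum < min_avg * window )); then
      echo "$i"
      return 0
    fi
  done
  echo "$len"
}

sliding_window_keep 'IIIIIIII' 4 15   # all Q40: keeps 8
sliding_window_keep 'IIII####' 4 15   # Q40 then Q2: keeps 3
```

Comparing the window sum against min_avg * window keeps everything in integer arithmetic, which is all bash offers.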
I guess you could also raise the threshold to 30 and still keep around 90% of your reads, though that depends on your run. The threshold is on the same Phred scale that FastQC shows on the y-axis of its "Per base sequence quality" plot. I think the green zone in that plot starts at about 28.
Personally, though, I am not fully convinced that Illumina quality scores reflect the actual error rates very well.
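For orientation on what these thresholds nominally mean: a Phred score Q corresponds to an error probability p = 10^(-Q/10), and with -phred33 each score is stored as the ASCII character with code Q + 33. A small sketch that prints the nominal error rates for a few thresholds (phred_to_error is a hypothetical name; awk is used because bash has no floating-point arithmetic):

```bash
#!/usr/bin/env bash
# Nominal base-call error probability for a Phred score Q: p = 10^(-Q/10).
phred_to_error() {
  # LC_ALL=C keeps '.' as the decimal separator regardless of locale.
  LC_ALL=C awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

for q in 15 20 28 30; do
  echo "Q$q -> $(phred_to_error "$q")"
done
```

This prints 0.0316 for Q15, 0.0100 for Q20, 0.0016 for Q28 and 0.0010 for Q30, so SLIDINGWINDOW:4:15 nominally tolerates windows averaging roughly 3% error, while a threshold of 30 corresponds to about 0.1%.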
From the manual page:
Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10)
Remove leading low quality or N bases (below quality 3) (LEADING:3)
Remove trailing low quality or N bases (below quality 3) (TRAILING:3)
Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
Drop reads shorter than 36 bases (MINLEN:36)
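MINLEN also explains why the paired-end command above writes four output files: a pair where both reads survive all steps goes to the two *_paired files, a pair where only one read survives has that read written to its *_unpaired file, and a pair where neither survives is dropped. A simplified bash sketch of that routing decision, assuming (for illustration only) that MINLEN is the only step deciding survival; in Trimmomatic other steps can drop reads too. The function name route_pair is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified routing of one read pair after trimming.
# Assumption: a read survives if and only if its trimmed length >= MINLEN.
route_pair() {
  local fwd_len="$1" rev_len="$2" min_len="$3"
  local fwd_ok=0 rev_ok=0
  if (( fwd_len >= min_len )); then fwd_ok=1; fi
  if (( rev_len >= min_len )); then rev_ok=1; fi
  if (( fwd_ok && rev_ok )); then
    echo "output_forward_paired.fq.gz output_reverse_paired.fq.gz"
  elif (( fwd_ok )); then
    echo "output_forward_unpaired.fq.gz"
  elif (( rev_ok )); then
    echo "output_reverse_unpaired.fq.gz"
  else
    echo "dropped"
  fi
}

route_pair 100 80 36   # both reads long enough: paired outputs
route_pair 100 20 36   # reverse read too short: forward read goes to unpaired output
```

For assembly, the paired files are the main input; whether the unpaired survivors are worth adding as single-end reads depends on the assembler.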