Trimmomatic: How to improve FastQC report by adjusting the thresholds in running Trimmomatic?
4.7 years ago
tunl ▴ 70

I recently ran Trimmomatic PE with the following thresholds:

java -jar trimmomatic-0.36.jar PE f1 f2 f1_paired f1_unpaired f2_paired f2_unpaired ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36


I didn’t specify -phred33 or -phred64, since I saw this message from Trimmomatic:

Quality encoding detected as phred33

So I think Trimmomatic (v0.36) just uses -phred33 automatically, since it detected phred33 in our FASTQ files.

I got the following survival rates:

Input Read Pairs: 66780154 Both Surviving: 62036296 (92.90%) Forward Only Surviving: 2387288 (3.57%) Reverse Only Surviving: 826849 (1.24%) Dropped: 1529721 (2.29%)
TrimmomaticPE: Completed successfully
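
As a sanity check on these numbers, the percentages can be recomputed from the raw counts. A minimal Python sketch, with the counts copied from the summary line above; the variable names are only illustrative:

```python
# Counts copied from the Trimmomatic PE summary line above.
input_pairs = 66780154
counts = {
    "Both Surviving": 62036296,
    "Forward Only Surviving": 2387288,
    "Reverse Only Surviving": 826849,
    "Dropped": 1529721,
}

# The four categories should partition the input read pairs exactly.
assert sum(counts.values()) == input_pairs

# Percentage of input read pairs in each category, rounded as in the log.
percentages = {name: round(100 * n / input_pairs, 2) for name, n in counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

The four categories add up to the input pair count, so no read pairs are unaccounted for.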

Now when I ran FastQC on the output forward_paired and reverse_paired fastq files, I got “red-cross” on “Per base sequence content”, “Sequence Duplication Levels”, and “Kmer Content”.

So I am wondering how I should adjust the thresholds in running Trimmomatic in order to improve FastQC reports?

I am considering using LEADING:5 and TRAILING:5 (or LEADING:10 and TRAILING:10; would this be too high for phred33?). I am not sure how much changing LEADING and TRAILING would improve the quality, though. Should I also increase the SLIDINGWINDOW quality threshold from 15 to 17 (or even 20)?
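
For reference, a Phred score Q corresponds to an estimated error probability of 10^(-Q/10), and in phred33 encoding the score is stored as the ASCII character with code Q + 33. A short Python sketch of what the thresholds under consideration mean; the function names are illustrative and not part of Trimmomatic:

```python
def phred_to_error_prob(q: int) -> float:
    """Error probability implied by Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred33_char(q: int) -> str:
    """ASCII character that encodes Phred score q in phred33 (offset 33)."""
    return chr(q + 33)


# Thresholds mentioned in this post for LEADING/TRAILING and SLIDINGWINDOW.
for q in (3, 5, 10, 15, 17, 20):
    print(f"Q{q}: char {phred33_char(q)!r}, "
          f"error probability {phred_to_error_prob(q):.3f}")
```

Q10 corresponds to an estimated error probability of 10% and Q20 to 1%.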

Any suggestions and advice would be greatly appreciated.

Thank you very much!

RNA-Seq Trimmomatic FastQC • 3.6k views
4.7 years ago
GenoMax 99k

See this post I wrote the other day. Do not get bogged down by the red X's in FastQC.

19 months ago

Trimmomatic automatically detects the PHRED quality encoding of your input file based on its first 10000 reads.
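
A rough Python sketch of how such a detection could work. This is not Trimmomatic's actual code; it is an assumed heuristic that looks at the lowest quality character among the first reads, and the function name and `max_reads` parameter are hypothetical. It ignores the older Solexa encoding, and phred33 data containing only very high qualities would be misclassified:

```python
from itertools import islice


def guess_phred_offset(fastq_lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from the first max_reads records.

    Assumed heuristic, not Trimmomatic's implementation: a quality
    character with ASCII code below 64 cannot occur in phred64 data.
    """
    # A FASTQ record is 4 lines; the quality string is the 4th line.
    quality_lines = islice(fastq_lines, 3, 4 * max_reads, 4)
    lowest = min(
        (ord(c) for line in quality_lines for c in line.strip()),
        default=None,
    )
    if lowest is None:
        raise ValueError("no quality values found")
    return 33 if lowest < 64 else 64


# Toy record with a low-quality base encoded as '#' (ASCII 35).
print(guess_phred_offset(["@r1", "ACGT", "+", "II5#"]))
```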

The trimming strategy depends entirely on your pipeline. For example, in RNA-seq data analysis you have to strike a balance between acceptable quality and the number of discarded reads, whereas in a variant calling pipeline you should raise read quality as much as you can. Note that adapter contamination should be trimmed under both strategies.

I recommend 123Fastq, which combines FastQC and Trimmomatic in a highly interactive graphical user interface. It also adds some improvements to the FastQC QC modules, a k-mer-based approach to adapter removal during trimming, and many other features. Try it yourself: https://sourceforge.net/projects/project-123ngs/