Question: Trimmomatic: How to improve FastQC report by adjusting the thresholds in running Trimmomatic?
tunl wrote:

I recently ran Trimmomatic PE with the following thresholds:

java -jar trimmomatic-0.36.jar PE f1 f2 f1_paired f1_unpaired f2_paired f2_unpaired ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

I didn’t specify -phred33 or -phred64, since I saw this message from Trimmomatic:

Quality encoding detected as phred33

So I think Trimmomatic (v 0.36) just uses -phred33 automatically, since it detected phred33 in our FASTQ files.

I got the following survival rates:

Input Read Pairs: 66780154 Both Surviving: 62036296 (92.90%) Forward Only Surviving: 2387288 (3.57%) Reverse Only Surviving: 826849 (1.24%) Dropped: 1529721 (2.29%)
TrimmomaticPE: Completed successfully
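
As a sanity check on the summary above, the four outcome categories should add up to the input pair count, and each percentage is just the count divided by that total. A minimal Python sketch, assuming the summary line has exactly the format shown (the function name `parse_trimmomatic_summary` is made up for illustration and is not part of Trimmomatic):

```python
import re

# Summary line copied from the Trimmomatic PE run above.
SUMMARY = (
    "Input Read Pairs: 66780154 Both Surviving: 62036296 (92.90%) "
    "Forward Only Surviving: 2387288 (3.57%) "
    "Reverse Only Surviving: 826849 (1.24%) Dropped: 1529721 (2.29%)"
)


def parse_trimmomatic_summary(line):
    """Return {label: count} for each 'Label: 12345' field in the line."""
    pairs = re.findall(r"([A-Za-z ]+?):\s*(\d+)", line)
    return {label.strip(): int(count) for label, count in pairs}


counts = parse_trimmomatic_summary(SUMMARY)
total = counts["Input Read Pairs"]

# Everything except the input total is an outcome category.
categories = {k: v for k, v in counts.items() if k != "Input Read Pairs"}
percentages = {k: round(100 * v / total, 2) for k, v in categories.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.2f}%")
print("Categories sum to input total:", sum(categories.values()) == total)
```

The counts do add up to the input total, and only 2.29% of read pairs are dropped outright.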

When I then ran FastQC on the output forward_paired and reverse_paired FASTQ files, I got a red cross on “Per base sequence content”, “Sequence Duplication Levels”, and “Kmer Content”.

So I am wondering how I should adjust the Trimmomatic thresholds in order to improve the FastQC reports.

I am considering using LEADING:5 and TRAILING:5 (or LEADING:10 and TRAILING:10; would this be too high for phred33?). I am not sure how much changing LEADING and TRAILING would improve the quality, though. Should I also increase the SLIDINGWINDOW quality threshold from 15 to 17 (or even 20)?
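
For context on these threshold values: a Phred score Q corresponds to an error probability of 10^(-Q/10), whether the file is encoded as phred33 or phred64 (the offset only affects how Q is stored as a character). The Python sketch below shows that conversion together with a simplified sliding-window cut. It illustrates the idea only and is not Trimmomatic's code; the function names are made up, and the real tool may place the cut differently within the failing window.

```python
def phred_to_error_prob(q):
    """Error probability for Phred score q: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def sliding_window_cut(quals, window=4, min_avg=15):
    """Return how many bases to keep from the 5' end.

    Scans 5' to 3' and cuts at the start of the first window whose
    mean quality is below min_avg (simplified sketch).
    """
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            return start
    # No failing window (or read shorter than the window): keep everything.
    return len(quals)


# Error probabilities for the thresholds discussed above.
for q in (3, 5, 10, 15, 20):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")

# Hypothetical read whose quality decays towards the 3' end.
example_quals = [30, 30, 30, 30, 30, 12, 10, 8, 5, 2]
for threshold in (15, 20):
    keep = sliding_window_cut(example_quals, window=4, min_avg=threshold)
    print(f"SLIDINGWINDOW:4:{threshold} keeps {keep} bases")
```

Q3 is roughly a 50% chance of a wrong base call, Q10 is 10%, Q15 is about 3.2%, and Q20 is 1%, so LEADING:10 or TRAILING:10 removes end bases whose error probability exceeds 10%.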

Any suggestions and advice would be greatly appreciated.

Thank you very much!

fastqc rna-seq trimmomatic • 2.7k views
modified 4 months ago by genetician2016 • written 3.5 years ago by tunl
genomax wrote:

See this post I wrote the other day. Do not get bogged down by the red X's in FastQC.

written 3.5 years ago by genomax
genetician2016 wrote:

Trimmomatic automatically detects the Phred quality encoding of your input file, based on its first 10000 reads.
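
One common way such detection can work, sketched in Python below, is to look at the lowest quality character seen in the first reads: characters with ASCII codes below 59 cannot occur with a +64 offset, so they point to phred33. This is a generic heuristic offered as an assumption. It is not taken from Trimmomatic's source; the function name is made up, the 10000-read default simply mirrors the description above, and the code assumes plain four-line FASTQ records.

```python
import io
from itertools import islice


def guess_phred_offset(fastq_lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from the lowest quality character.

    Returns 33, 64, or None when the evidence is missing or ambiguous.
    """
    lowest = None
    # In four-line FASTQ, quality strings are lines 4, 8, 12, ...
    qual_lines = islice(fastq_lines, 3, None, 4)
    for qual in islice(qual_lines, max_reads):
        qual = qual.rstrip("\n")
        if qual:
            low = min(map(ord, qual))
            lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    if lowest < 59:   # below ';', impossible with a +64 offset
        return 33
    if lowest >= 64:  # consistent with +64, but could be high-quality +33 data
        return 64
    return None


# Hypothetical two-read file; '#' and '5' only make sense with offset 33.
example = io.StringIO("@read1\nACGT\n+\nII5#\n@read2\nACGT\n+\nIIII\n")
print(guess_phred_offset(example))
```

The 64 branch shows the limit of the heuristic: a phred33 file containing only very high qualities would be misclassified, which is why passing -phred33 or -phred64 explicitly remains the safer option when the encoding is known.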

The trimming strategy depends entirely on your pipeline. For example, in RNA-seq data analysis you have to strike a balance between acceptable quality and minimizing the number of discarded reads, whereas in a variant-calling pipeline you should raise read quality as much as you can. Note that adapter contamination should be trimmed in both strategies.

I recommend 123Fastq, which combines FastQC and Trimmomatic in a highly interactive graphical user interface. It also adds some improvements to the FastQC QC modules, a k-mer-based approach to adapter removal during trimming, and many other features. Try it yourself: https://sourceforge.net/projects/project-123ngs/

written 4 months ago by genetician2016