Why does the Phred encoding change after trimming FASTQ files?
7 weeks ago
Bikal • 0

When I ran FastQC on my original FASTQ files, the encoding reported under Basic Statistics was Sanger / Illumina 1.9. After I ran Trimmomatic on those files to trim low-quality bases and then ran FastQC again on the trimmed files, the encoding was reported as Illumina 1.5. Why is that? I made sure that the Trimmomatic parameters did not alter the ASCII quality characters, so I am surprised to see the encoding change. My main concern: will this affect any downstream analysis?
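For context, the encoding label is only an interpretation of the quality characters: the same character decodes to a different score depending on the ASCII offset that is assumed. A minimal sketch of that arithmetic follows (the function name `decode_quality` and the example string are made up for illustration):

```python
def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality_string]


quals = "IIHG@"  # hypothetical quality string
print(decode_quality(quals, offset=33))  # [40, 40, 39, 38, 31]
print(decode_quality(quals, offset=64))  # [9, 9, 8, 7, 0]
```

Every character in this string is ASCII 64 or above, so it is a valid string under both offsets. A tool that has to guess the offset from the data alone cannot tell the two apart, whereas a tool that is told the offset explicitly (for example with `-phred33`) decodes it the same way regardless.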

fastqc fastq phred quality trimming • 253 views

Can you show us the trimmomatic command you ran?

I used the following command for Trimmomatic. Note: I am analyzing the forward and reverse FASTQ files separately, as they are the result of amplifying a gene from several strains of the same mycoparasite collected across different regions. Each of my FASTQ files contains only one read. I use the CROP step to cut off the low-quality region from all forward FASTQ files, after looking at the MultiQC report of the original FastQC results. I do the same for the reverse FASTQ files, changing only the CROP parameter. To my surprise, FastQC on the trimmed reads reported Illumina 1.5 for some of the samples (not all of them, about 30-40%, as in the pictures below).

import os
import subprocess

# Define the base directories
forward_dir = "Oli_final_trim_phredscore/paired_F"
base_output_dir = "Oli_phred2"
paired_output_dir_F = os.path.join(base_output_dir, "paired_F")

# Ensure the directories exist
os.makedirs(paired_output_dir_F, exist_ok=True)

# Trimmomatic parameters

minlen = "50"  
trimmomatic_jar_path = "/fp/homes01/myusername/miniforge3/envs/gene_analysis/share/trimmomatic-0.39-2/trimmomatic.jar"

# Iterate over forward read files
for forward_file in os.listdir(forward_dir):
    if forward_file.endswith("_Oli_F.fastq"):
        base_name = forward_file.replace("_Oli_F.fastq", "")

        forward_input_path = os.path.join(forward_dir, forward_file)

        # Define output paths
        forward_paired_output = os.path.join(paired_output_dir_F, f"{base_name}_Oli_F.fastq")

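        # Construct the Trimmomatic command for SE mode.
        # Trimmomatic applies the steps in the order they are listed.
        trim_command = [
            "java", "-jar", trimmomatic_jar_path, "SE", "-phred33",
            forward_input_path, forward_paired_output,
            f"MINLEN:{minlen}", "HEADCROP:22", "CROP:250"
        ]

        # Execute the command and stop if Trimmomatic reports an error
        print(f"Trimming forward file: {forward_file}")
        subprocess.run(trim_command, check=True)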

print("All forward file trimming operations are complete.")

The first picture shows the FastQC report of a sample before Trimmomatic. The second picture shows the FastQC report of the same sample after Trimmomatic.

[Image: FastQC of a sample before Trimmomatic]
[Image: FastQC of the same sample after Trimmomatic]
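One way to check which trimmed files are affected is to look at the lowest quality character left in each file. The sketch below assumes plain four-line FASTQ records. It also assumes that an encoding guess is driven by the lowest quality character: a character below ASCII 64 can only be Phred+33, and once no such character remains, an offset-64 reading becomes possible. The function names are illustrative and this is not FastQC's own code.

```python
import io


def lowest_quality_char(handle):
    """Return the lowest ASCII quality character in a four-line FASTQ stream."""
    lowest = None
    for i, line in enumerate(handle):
        if i % 4 == 3:  # every fourth line holds the quality string
            qual = line.strip()
            if qual and (lowest is None or min(qual) < lowest):
                lowest = min(qual)
    return lowest


def guess_offset(lowest):
    """Heuristic (assumption): a character below ASCII 64 rules out offset 64."""
    if lowest is None:
        return None
    return 33 if ord(lowest) < 64 else 64


# Hypothetical single-read files: the second has its low-quality tail cropped.
before = io.StringIO("@read1\nACGTACG\n+\nIIIHG#5\n")
after = io.StringIO("@read1\nACGTA\n+\nIIIHG\n")

print(guess_offset(lowest_quality_char(before)))  # 33
print(guess_offset(lowest_quality_char(after)))   # 64
```

With only one read per file, cropping away the low-quality region can easily remove every character below ASCII 64, which would be consistent with only some of the samples changing label. To scan real files, pass an open file handle in place of the `io.StringIO` objects.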

