Why did my data get worse after trimming with trimmomatic?
15 months ago
sansan_96 ▴ 80

Hi, I am new to RNA-seq data analysis and am currently trimming my FASTQ files with Trimmomatic. After trimming, however, some metrics seem to have gotten worse, specifically the sequence length distribution. Am I doing something wrong? Could this cause problems in my downstream analyses?

I am attaching the command line that I am running. I will appreciate any help given.

java -jar trimmomatic-0.39.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10  SLIDINGWINDOW:4:15 MINLEN:36

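[Images: Original1 and Trimming1 (before and after trimming); original_dist1 and trimming_dist1 (sequence length distribution before and after trimming)]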


I'm confused, it clearly got better... You trimmed adapters from the ends of the reads, right?


I remember the times when base quality was expected to drop to around 22 at the ends of Illumina reads before preprocessing. What an advance in technology!


ibq.enriquepola : Please do not delete threads once they have received at least one comment or answer. They provide value to future visitors. You can accept an answer (green check mark) to provide closure to this thread.

15 months ago
GenoMax 141k

Sequence length distribution can change after trimming, especially if your data contained extraneous sequence (such as adapters), because that sequence is removed. For example, if all of your untrimmed reads were 150 bp long, the trimmed reads will span a range from (150 - length of the longest extraneous sequence removed) bp up to 150 bp (reads that had no extraneous sequence).
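
A minimal sketch of that arithmetic in Python, assuming extraneous sequence is removed only from the 3' end of each read. The function name and the example adapter lengths are hypothetical, chosen only for illustration:

```python
from collections import Counter


def trimmed_length_distribution(read_length, extraneous_lengths):
    """Count read lengths after removing extraneous sequence from each read.

    read_length: original (uniform) read length in bp.
    extraneous_lengths: bp of extraneous sequence found in each read.
    """
    return Counter(read_length - n for n in extraneous_lengths)


# Hypothetical data: five 150 bp reads carrying 0, 0, 12, 30 and 30 bp of adapter.
dist = trimmed_length_distribution(150, [0, 0, 12, 30, 30])

# Print the distribution from shortest to longest read.
for length in sorted(dist):
    print(length, dist[length])
```

With these made-up numbers the lengths range from 120 bp (150 - 30) to 150 bp, so a single spike at 150 bp before trimming turns into a spread afterwards.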

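Trimming programs also drop reads that fall below a length threshold. For paired-end data, a pair is removed from the paired output when either mate is dropped, so that the files stay in sync; Trimmomatic writes a surviving mate to the corresponding unpaired file. The lower boundary of the distribution may therefore be the program's default minimum read length, unless you change it. In the command above, MINLEN:36 sets it to 36 bp.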
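
The pairing logic can be sketched roughly as follows. This is an illustration in Python, not Trimmomatic's actual code: `filter_pairs` and the toy reads are hypothetical, and the sketch assumes a read is kept when its length is at least `min_len`. The three return values mirror the paired and unpaired output files in the command above.

```python
def filter_pairs(pairs, min_len=36):
    """Split trimmed read pairs by whether each mate meets the minimum length.

    pairs: iterable of (forward_seq, reverse_seq) tuples after trimming.
    Returns (paired, forward_unpaired, reverse_unpaired).
    """
    paired, forward_unpaired, reverse_unpaired = [], [], []
    for fwd, rev in pairs:
        fwd_ok = len(fwd) >= min_len
        rev_ok = len(rev) >= min_len
        if fwd_ok and rev_ok:
            paired.append((fwd, rev))      # both mates survive: stays paired
        elif fwd_ok:
            forward_unpaired.append(fwd)   # only the forward mate survives
        elif rev_ok:
            reverse_unpaired.append(rev)   # only the reverse mate survives
        # if neither mate is long enough, the pair is dropped entirely
    return paired, forward_unpaired, reverse_unpaired


# Toy reads: lengths (100, 100), (36, 35), (10, 50), (5, 5).
example = [("A" * 100, "C" * 100), ("A" * 36, "C" * 35),
           ("A" * 10, "C" * 50), ("A" * 5, "C" * 5)]
paired, fwd_only, rev_only = filter_pairs(example)
print(len(paired), len(fwd_only), len(rev_only))
```

Because every read in the paired output has passed the length check, the shortest length in the post-trimming distribution cannot be below the threshold.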
