Question: What is the limiting factor for trimmomatic speed and how can it be increased?
Daniel (Cardiff University) wrote, 3.3 years ago:

I'm using trimmomatic mainly to filter out adapters in the read through of my paired end illumina data.

My command is as follows, and produces expected results:

java -jar trimmomatic-0.33.jar PE 01_R1.fastq 01_R2.fastq 01_R1-trimpair.fastq 01_R1-trimunpair.fastq 01_R2-trimpair.fastq 01_R2-trimunpair.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36

However, I can't work out how to specify how many cores to use (the words "node" and "core" don't appear in the Trimmomatic manual). Edit: found it, it's the -threads option. When run, I am shown this message:

Multiple cores found: Using 16 threads

However, I have more available, as I am submitting these jobs to a large compute cluster. Whether I assign 2, 16, or 32 cores, I still get the same message.

Finally, a test on one sample completed within the 1000 min of wall time assigned to it (16 cores), so I submitted the full 16 samples to the compute queue, but each job failed at ~50% completion when it timed out at 1000 min. This makes me wonder whether it's being limited by memory: running alone, the job could take as much as it needed, but the parallel jobs may have competed for memory and slowed each other down. That's speculation, though; I don't know if it works like that. Alternatively, could Java be limiting the memory, and should I raise the heap ceiling with -Xmx?
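For reference, here is a sketch of the same command with the thread count and JVM heap set explicitly. The 4 GB heap value is illustrative, not a recommendation; -threads is Trimmomatic's own option, while -Xmx is a standard JVM flag:

```shell
# Hypothetical invocation: -threads caps Trimmomatic's worker threads,
# and -Xmx4g raises the JVM's maximum heap to 4 GB (tune to your node).
java -Xmx4g -jar trimmomatic-0.33.jar PE -threads 16 \
    01_R1.fastq 01_R2.fastq \
    01_R1-trimpair.fastq 01_R1-trimunpair.fastq \
    01_R2-trimpair.fastq 01_R2-trimunpair.fastq \
    ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36
```

Note that the JVM flags must come before -jar, while -threads belongs to Trimmomatic and comes after the PE mode keyword.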

Alternatively, I'm not tied to Trimmomatic and would happily use a different Illumina adapter trimmer if anyone can recommend one.


fastq trimmomatic • 4.5k views
modified 3.3 years ago • written 3.3 years ago by Daniel

I wonder whether the limiting factor is I/O (reading and writing the FASTQ files) rather than CPU (processing the reads) or memory. I don't know for sure, so it would be nice if someone could confirm this.

written 3.3 years ago by Carlo Yague
Brian Bushnell (Walnut Creek, USA) wrote, 3.3 years ago:

BBDuk is substantially faster than Trimmomatic (and, in my testing, more accurate for adapter-trimming). With 16 cores, it can adapter-trim over 1 million 150bp paired-end reads per second on 2.5 GHz Intel E5-2670 CPUs, using recommended parameters.

E.G.: in=/dev/shm/r#.fq reads=4m ktrim=r k=23 mink=11 hdist=1 t=16 ref=adapters_a2.fa tbo tpe out=foo.fq

BBDuk version 37.02
Set threads to 16

Memory: max=46902m, free=44944m, used=1958m

Added 7767 kmers; time:         0.225 seconds.
Memory: max=46902m, free=42497m, used=4405m

Input is being processed as paired
Processing time:                3.517 seconds.

Input:                          4000000 reads           604000000 bases.
KTrimmed:                       10626 reads (0.27%)     1176820 bases (0.19%)
Trimmed by overlap:             1658 reads (0.04%)      25632 bases (0.00%)
Total Removed:                  6422 reads (0.16%)      1202452 bases (0.20%)
Result:                         3993578 reads (99.84%)  602797548 bases (99.80%)

Time:                           3.755 seconds.
Reads Processed:       4000k    1065.30k reads/sec
Bases Processed:        604m    160.86m bases/sec
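Translated to the question's file names, a paired-end BBDuk run might look like the sketch below. The parameter values follow Brian's example above; the output file names are assumptions, and adapters.fa refers to the adapter reference file shipped in the BBTools resources directory:

```shell
# Hypothetical bbduk.sh run on the question's paired files.
# ktrim=r trims to the right of an adapter kmer match; tbo trims by
# pair overlap, and tpe trims both reads of a pair to the same length.
bbduk.sh in1=01_R1.fastq in2=01_R2.fastq \
    out1=01_R1-trim.fastq out2=01_R2-trim.fastq \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tbo tpe t=16
```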
modified 3.3 years ago • written 3.3 years ago by Brian Bushnell

Fantastic, I'll give this a go!

written 3.3 years ago by Daniel
Petr Ponomarenko (Los Angeles, USA) wrote, 3.3 years ago:

So you tried -threads with 2, 16 and 32 and got the same result? If I remember correctly, Trimmomatic analyses each read (or each read pair, with paired-end settings) independently, so already-processed reads do not affect the trimming of subsequent reads. To work around the I/O problem, you can therefore slice your file into chunks of N reads, send them to K nodes as a batch, wait for everything to be processed, and then combine the results. It does sound like Trimmomatic is not parallelizing efficiently on your cluster for some reason.
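The chunking idea can be demonstrated with plain coreutils: a FASTQ record is exactly four lines, so splitting on any multiple of four lines yields valid chunks. A toy sketch (for paired-end data, both files must be split at identical boundaries so the pairs stay in sync):

```shell
# Build a toy FASTQ of 8 records (4 lines each = 32 lines).
for i in $(seq 1 8); do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy.fastq

# 4 reads per chunk = 16 lines per chunk, giving chunk_aa and chunk_ab.
split -l 16 toy.fastq chunk_

# Each chunk is itself a valid FASTQ and can be trimmed independently;
# the trimmed chunks are then concatenated back together in order.
wc -l chunk_aa chunk_ab
```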

written 3.3 years ago by Petr Ponomarenko

I was going to suggest slicing the input fastqs as well. However, the slicing and merging make the whole process more complicated and error-prone, and I wonder if it's worth it. If time is crucial, I would also/instead consider piping the output of the trimmer straight into the aligner. I put a simple how-to here: Trim & align paired-end reads in a single pass using cutadapt and bwa mem.
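The pipe dariober describes can be sketched as follows. The file names, reference, and adapter sequences are placeholders; cutadapt's --interleaved flag writes read pairs interleaved to stdout, and bwa mem -p reads an interleaved stream from stdin:

```shell
# Hypothetical single-pass trim-and-align: no intermediate FASTQ hits disk.
# ADAPTER_FWD / ADAPTER_REV stand in for the actual adapter sequences.
cutadapt --interleaved -a ADAPTER_FWD -A ADAPTER_REV \
    01_R1.fastq 01_R2.fastq \
  | bwa mem -p ref.fa - > 01_trimmed.sam
```

This trades the ability to inspect the trimmed FASTQ for one less round trip through the filesystem, which helps when I/O is the bottleneck.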

written 3.3 years ago by dariober
Powered by Biostar version 2.3.0