Question

How to parallelize trim_galore for a single sample?

0

2.8 years ago

optimistsso4co3 ▴ 110

It takes about 15 hours to trim my paired-end human genome data. There are ways to run trim_galore in parallel across multiple samples, but could you suggest a method for a single sample?

trim_galore • 1.2k views

updated 2.8 years ago by ATpoint 81k • written 2.8 years ago by optimistsso4co3 ▴ 110

2

TrimGalore! is a wrapper around cutadapt. You could therefore use cutadapt directly, which has a multithreading option. If pigz (a multithreaded version of gzip) is in PATH, cutadapt will use it for (de)compression rather than the default gzip. Together, that should speed things up.
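For illustration, here is a rough sketch of what a direct cutadapt call could look like. The file names and thread count are made up, and the options are my attempt to approximate Trim Galore's Illumina defaults (13 bp universal adapter, Phred 20, overlap 1, minimum length 20), so treat it as a starting point rather than an exact equivalent.

```shell
#!/bin/sh
set -eu

# Hypothetical file names and thread count; adjust to your data.
r1=reads_1.fq.gz
r2=reads_2.fq.gz
threads=8

# Assumed approximation of Trim Galore's Illumina defaults as cutadapt options:
#   -j      number of CPU cores
#   -q 20   quality-trim 3' ends below Phred 20
#   -O 1    minimum adapter overlap of 1 bp
#   -m 20   drop pairs in which a read ends up shorter than 20 bp
#   -a/-A   adapter sequence for read 1 / read 2
cmd="cutadapt -j $threads -q 20 -O 1 -m 20 -a AGATCGGAAGAGC -A AGATCGGAAGAGC -o trimmed_1.fq.gz -p trimmed_2.fq.gz $r1 $r2"
echo "$cmd"

# Run only when cutadapt and both input files are actually present.
if command -v cutadapt >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
  $cmd
fi
```

The script prints the command first, so you can inspect it before anything runs.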

2.8 years ago by ATpoint 81k

0

I moved my answer to a comment, as this is not possible within a given Nextflow pipeline without modifications.

2.8 years ago by ATpoint 81k

0

You could split your input file into multiple pieces and trim those in parallel.

Or you can use a multi-threaded trimming program like bbduk.sh to speed the process up. A guide for bbduk is available.
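As a hedged example, a paired-end adapter-trimming call with bbduk.sh might look roughly like this. The file names, thread count, and adapter FASTA path are placeholders, and the quality and length settings are only chosen to resemble Trim Galore's defaults, so check them against the BBDuk guide before relying on them.

```shell
#!/bin/sh
set -eu

# Hypothetical file names, thread count, and adapter FASTA path
# (BBTools ships an adapters.fa in its resources folder); adjust to your setup.
r1=reads_1.fq.gz
r2=reads_2.fq.gz
threads=8
adapters=adapters.fa

# ktrim=r                 trim adapter matches and everything to their right (3' end)
# k=23 mink=11            k-mer size, with shorter k-mers allowed at read ends
# hdist=1                 allow one mismatch per k-mer
# tpe tbo                 trim both mates to equal length / trim by pair overlap
# qtrim=r trimq=20        quality-trim the 3' end to Phred 20 (assumed setting)
# minlength=20            discard reads shorter than 20 bp after trimming (assumed setting)
cmd="bbduk.sh in1=$r1 in2=$r2 out1=trimmed_1.fq.gz out2=trimmed_2.fq.gz ref=$adapters ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=20 minlength=20 threads=$threads"
echo "$cmd"

# Run only when bbduk.sh and both input files are actually present.
if command -v bbduk.sh >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
  $cmd
fi
```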

2.8 years ago by GenoMax 141k

score 2 · Accepted Answer · 2021-07-09

2

2.8 years ago

benformatics 3.9k

You could just split your fastq files into multiple chunks (assuming you know the total number of reads) and then run multiple trim_galore commands.

You could try this if you have seqkit:

seqkit split2 -1 reads_1.fq.gz -2 reads_2.fq.gz -p 2 -O out

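Or with standard tools (keep the line count a multiple of 4, since one FASTQ record is 4 lines; split both mate files with the same line count; and re-gzip the chunks at the end):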

zcat XXX.recal.fastq.gz | split -l 4000000 - prefix
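
To make the whole idea concrete, here is a minimal sketch of the split-then-trim approach using only standard tools. It builds a tiny toy paired-end data set so it runs as-is. The file names, chunk size, and per-chunk output directories are my own placeholders, and the trimming step is skipped if trim_galore is not installed.

```shell
#!/bin/sh
set -eu

# Toy paired-end input (hypothetical names) so the sketch runs as-is.
workdir=$(mktemp -d)
for mate in 1 2; do
  for i in 1 2 3 4 5 6; do
    printf '@read%s/%s\nACGTACGTAC\n+\nIIIIIIIIII\n' "$i" "$mate"
  done | gzip > "$workdir/reads_$mate.fq.gz"
done

# Split both mates with the same line count. It must be a multiple of 4
# (one FASTQ record is 4 lines) so that pairs stay in sync across chunks.
lines_per_chunk=8   # 2 reads per chunk for the toy data; e.g. 4000000 for real data
for mate in 1 2; do
  gzip -dc "$workdir/reads_$mate.fq.gz" \
    | split -l "$lines_per_chunk" - "$workdir/chunk_R${mate}_"
done

n_chunks=$(ls "$workdir"/chunk_R1_* | wc -l | tr -d ' ')
echo "chunks per mate: $n_chunks"

# Trim every chunk pair as a background job, then wait for all of them.
# Skipped when trim_galore is not installed.
if command -v trim_galore >/dev/null 2>&1; then
  for r1 in "$workdir"/chunk_R1_*; do
    suffix=${r1##*_}                      # split's suffix: aa, ab, ...
    r2="$workdir/chunk_R2_$suffix"
    mkdir -p "$workdir/trimmed_$suffix"   # one output directory per chunk
    trim_galore --paired --output_dir "$workdir/trimmed_$suffix" "$r1" "$r2" &
  done
  wait
fi
```

With real data, pick the chunk size so that the number of chunks does not exceed the cores you have, since every chunk pair becomes its own job here. Afterwards, concatenate the trimmed chunks in the same order for both mates and gzip the result.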