How to parallelize trim_galore for a single sample?
6 months ago

Trimming my paired-end human genome reads takes about 15 hours. There are ways to run trim_galore in parallel across multiple samples, but could you suggest a method for a single sample?

Trim Galore! is a wrapper around cutadapt, so you could use cutadapt directly instead: it has a multithreading option (-j/--cores), and if pigz is in your PATH it will use it for (de)compression rather than the default gzip. pigz is a multithreaded version of gzip, so both will speed things up.
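
A minimal sketch of calling cutadapt directly on one paired-end sample with several cores. The function name trim_pe_cutadapt and the output file names are hypothetical, and the adapter (AGATCGGAAGAGC, the Illumina adapter Trim Galore uses by default), quality cutoff 20, minimum length 20 and overlap 1 are assumptions meant to roughly approximate Trim Galore's defaults, not reproduce them exactly:

```bash
#!/usr/bin/env bash
# Hypothetical helper: trim one paired-end sample with cutadapt using several cores.
# Assumption: the settings below only approximate Trim Galore's defaults; adjust to your data.
trim_pe_cutadapt() {
    local r1=$1 r2=$2 prefix=$3 cores=${4:-8}
    # --cores runs cutadapt's workers in parallel; cutadapt can also hand
    # gzip (de)compression to pigz when pigz is on PATH.
    cutadapt \
        --cores "$cores" \
        -q 20 \
        -m 20 \
        -O 1 \
        -a AGATCGGAAGAGC \
        -A AGATCGGAAGAGC \
        -o "${prefix}_R1.trimmed.fq.gz" \
        -p "${prefix}_R2.trimmed.fq.gz" \
        "$r1" "$r2"
}
```

For example, trim_pe_cutadapt sample_R1.fq.gz sample_R2.fq.gz sample 16 would write sample_R1.trimmed.fq.gz and sample_R2.trimmed.fq.gz using 16 cores.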

I moved my answer to a comment, as this is not possible within an existing Nextflow pipeline without modifications.

You could split your input file into multiple pieces and trim those in parallel.

Alternatively, you can use a multithreaded trimming program such as bbduk.sh to speed up the process. A guide for BBDuk is available.
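
A comparable sketch with BBDuk. The adapter-trimming flags follow the settings suggested in the BBDuk guide; the function name, output names, trimq=20 and minlen=20 are assumptions, and ref=adapters relies on newer BBTools releases recognising the built-in adapter set (otherwise point ref= at the adapters.fa file shipped in the BBTools resources directory):

```bash
#!/usr/bin/env bash
# Hypothetical helper: adapter- and quality-trim one paired-end sample with BBDuk.
trim_pe_bbduk() {
    local r1=$1 r2=$2 prefix=$3 threads=${4:-8}
    # ktrim=r k=23 mink=11 hdist=1 tpe tbo: right-end adapter k-mer trimming as in the BBDuk guide.
    # qtrim=rl trimq=20: quality-trim both read ends at Q20 (assumed cutoff).
    # minlen=20: discard reads shorter than 20 bp after trimming (assumed cutoff).
    bbduk.sh \
        in1="$r1" in2="$r2" \
        out1="${prefix}_R1.bbduk.fq.gz" out2="${prefix}_R2.bbduk.fq.gz" \
        ref=adapters \
        ktrim=r k=23 mink=11 hdist=1 tpe tbo \
        qtrim=rl trimq=20 minlen=20 \
        threads="$threads"
}
```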

6 months ago
benformatics ★ 2.6k

You could just split your fastq files into multiple chunks (assuming you know the total number of reads) and then run multiple trim_galore commands.

You could try this if you have seqkit:

seqkit split2 -1 reads_1.fq.gz -2 reads_2.fq.gz -p 2 -O out
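
Once seqkit has written the parts, one way to trim them concurrently is to start one trim_galore process per part pair in the background and then concatenate the results. This sketch assumes seqkit split2 names the parts reads_1.part_001.fq.gz, reads_2.part_001.fq.gz and so on inside out/, and that Trim Galore names its paired outputs <input name>_val_1.fq.gz and <input name>_val_2.fq.gz; the function name and the merged file names are hypothetical. Concatenating gzip files with cat yields a valid gzip stream, so the merged files need no recompression:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one trim_galore job per seqkit part pair, then merge.
# Usage: trim_parts_in_parallel <seqkit output dir> [trimmed output dir]
trim_parts_in_parallel() {
    local parts=$1 out=${2:-trimmed}
    local r1 r2 pid pids=
    mkdir -p "$out" || return 1
    for r1 in "$parts"/reads_1.part_*.fq.gz; do
        [ -e "$r1" ] || { echo "no reads_1.part_* files in $parts" >&2; return 1; }
        # reads_1.part_001.fq.gz -> reads_2.part_001.fq.gz in the same directory
        r2="$parts/reads_2.${r1##*/reads_1.}"
        trim_galore --paired -o "$out" "$r1" "$r2" &
        pids="$pids $!"
    done
    # Wait for every background job and fail if any of them failed.
    for pid in $pids; do
        wait "$pid" || return 1
    done
    # Glob order (part_001, part_002, ...) is the same for both mates,
    # so the merged R1 and R2 files keep the same read order.
    cat "$out"/reads_1.part_*_val_1.fq.gz > "$out/reads_1_val_1.fq.gz"
    cat "$out"/reads_2.part_*_val_2.fq.gz > "$out/reads_2_val_2.fq.gz"
}
```

With the seqkit command above this would be called as trim_parts_in_parallel out trimmed. Every part runs as its own trim_galore process at the same time, so -p effectively sets how many cores are used.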


Or with standard tools (you will have to re-gzip the chunks at the end):

zcat XXX.recal.fastq.gz | split -l 4000000 - prefix
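
For paired-end data both mates have to be split with the same number of lines, so that chunk N of R1 still matches chunk N of R2. Because a FASTQ record is four lines, the line count per chunk must be a multiple of 4 (4,000,000 lines in the command above is 1,000,000 reads). A minimal sketch with standard tools, assuming the R1 and R2 files list the reads in the same order; the function name and the R1_chunk_/R2_chunk_ prefixes are hypothetical:

```bash
#!/usr/bin/env bash
set -o pipefail  # make a failing gzip -dc fail the whole pipeline

# Hypothetical helper: split both mates into chunks of N reads and re-gzip them.
# Usage: split_pair_plain <R1.fq.gz> <R2.fq.gz> [reads per chunk]
split_pair_plain() {
    local r1=$1 r2=$2 reads_per_chunk=${3:-1000000}
    local lines=$((reads_per_chunk * 4))  # one FASTQ record = 4 lines
    local zip=gzip f n1
    command -v pigz >/dev/null 2>&1 && zip=pigz  # use multithreaded pigz if present

    # Same line count for both mates keeps chunk aaa of R1 paired with chunk aaa of R2.
    gzip -dc "$r1" | split -l "$lines" -a 3 - R1_chunk_ || return 1
    gzip -dc "$r2" | split -l "$lines" -a 3 - R2_chunk_ || return 1

    # Both mates must yield the same number of chunks, otherwise they were out of sync.
    set -- R1_chunk_???; n1=$#
    set -- R2_chunk_???
    if [ "$n1" -ne "$#" ]; then
        echo "R1 and R2 produced different numbers of chunks" >&2
        return 1
    fi

    # Add a .fq extension, then compress every chunk (gzip/pigz replace them with .fq.gz).
    for f in R1_chunk_??? R2_chunk_???; do
        mv -- "$f" "$f.fq" || return 1
    done
    "$zip" R1_chunk_???.fq R2_chunk_???.fq
}
```

The resulting R1_chunk_aaa.fq.gz / R2_chunk_aaa.fq.gz pairs can then be trimmed in parallel, and the trimmed outputs concatenated in the same chunk order.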

I'm using nf-core/sarek, which has an option for splitting, so I will try it out.