How to parallelize trim_galore for a single sample?
17 months ago

It takes about 15 hours to trim my paired-end human genome sample. There are ways to run trim_galore in parallel across multiple samples, but could you suggest a method for a single sample?

trim_galore • 741 views

TrimGalore! is a wrapper around cutadapt. You could therefore use cutadapt directly, which has a multithreading option. If pigz (a multithreaded version of gzip) is in PATH, cutadapt will use it for (de)compression instead of the default gzip. Together, these should speed things up.
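
As a rough sketch of the suggestion above, a direct cutadapt call for paired-end reads might look like the following. `-j` is cutadapt's multi-core option. The file names, the core count, the adapter sequence (the common Illumina adapter prefix) and the quality and length cutoffs are all assumptions, so check them against your own library before use. The script only executes cutadapt if it is installed and the inputs exist; otherwise it prints the command.

```shell
#!/bin/sh
# Sketch: call cutadapt directly with several cores.
# All file names and parameter values below are assumptions for illustration.
R1=reads_1.fq.gz
R2=reads_2.fq.gz
THREADS=8                 # assumed core count; adjust to your machine
ADAPTER=AGATCGGAAGAGC     # assumed Illumina adapter prefix

# Store the command in the positional parameters so it can be printed or run.
set -- cutadapt -j "$THREADS" -q 20 -m 20 \
    -a "$ADAPTER" -A "$ADAPTER" \
    -o trimmed_1.fq.gz -p trimmed_2.fq.gz \
    "$R1" "$R2"
CMD_LINE="$*"

if command -v cutadapt >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    "$@"                          # run the trimming
else
    echo "dry run: $CMD_LINE"     # cutadapt or inputs missing: just show the command
fi
```

Recent Trim Galore releases should also accept a `--cores` option that is passed on to cutadapt, which may be the smaller change if you want to keep the wrapper; check `trim_galore --help` for your installed version.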


I moved my answer to a comment, as this is not possible within a given Nextflow pipeline without modifications.


You could split your input file into multiple pieces and trim those in parallel.

Or you can use a multi-threaded trimming program like bbduk.sh to speed the process up. A guide for bbduk is available.

17 months ago

You could just split your fastq files into multiple chunks (assuming you know the total number of reads) and then run multiple trim_galore commands.

You could try this if you have seqkit:

seqkit split2 -1 reads_1.fq.gz -2 reads_2.fq.gz -p 2 -O out

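Or with standard tools (you have to re-gzip the chunks at the end, the line count must be a multiple of 4 so that no FASTQ record is split across files, and both mate files need the same line count to keep pairs in sync):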

zcat XXX.recal.fastq.gz | split -l 4000000 - prefix
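
The split-then-trim idea could be sketched as below, using only standard tools for the split. The function names `split_pair` and `trim_chunks`, the `chunks` and `trimmed` directories, the input file names and the chunk size are hypothetical; the only trim_galore options used are `--paired` and `-o`. It starts one background job per chunk, so pick a chunk size that gives roughly as many chunks as you have cores, and start from an empty chunk directory.

```shell
#!/bin/sh
# Sketch: split a paired-end sample into matching chunks, then trim them in parallel.
# Function names, directories and file names are hypothetical.

split_pair() {
    # $1, $2 = gzipped mate files; $3 = output dir; $4 = reads per chunk
    lines=$(( $4 * 4 ))                  # FASTQ stores each read on 4 lines
    mkdir -p "$3"
    # The same line count for both mates keeps R1_aa paired with R2_aa, and so on.
    gzip -dc "$1" | split -l "$lines" - "$3/R1_"
    gzip -dc "$2" | split -l "$lines" - "$3/R2_"
    for f in "$3"/R1_* "$3"/R2_*; do
        case "$f" in *.fq) continue ;; esac
        mv "$f" "$f.fq"                  # give the chunks a FASTQ extension
    done
}

trim_chunks() {
    # $1 = chunk dir; $2 = output dir for trimmed reads
    for c1 in "$1"/R1_*.fq; do
        c2="$1/R2_${c1##*/R1_}"          # matching mate chunk
        trim_galore --paired -o "$2" "$c1" "$c2" &   # one background job per chunk
    done
    wait                                 # block until every job has finished
}

# Only run when trim_galore and the (assumed) input files are present.
if command -v trim_galore >/dev/null 2>&1 \
        && [ -f reads_1.fq.gz ] && [ -f reads_2.fq.gz ]; then
    split_pair reads_1.fq.gz reads_2.fq.gz chunks 25000000
    trim_chunks chunks trimmed
fi
```

Afterwards the trimmed chunks for each mate can be concatenated in chunk order and gzipped again, for example with `cat` piped into `gzip` or `pigz`; check the output file names your trim_galore version produces before writing that step.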

I'm using nf-core/sarek, which has an option for splitting, so I will try it out.
