Hello,
I have 600+ bacterial isolates (*.fastq.gz files) that I want to assemble.
I have a script that uses a for loop to trim adapters and run SPAdes on each isolate, but this takes a very long time. My computer should be powerful enough to process multiple isolates at once, which is what I want to do.
I have looked at GNU parallel, which seems to work well and runs faster than the for loop. As an example, I do the trimming like this:
parallel 'trimmomatic PE {}R1*.f*q.gz {}R2*.f*q.gz {}pair_R1.fq.gz {}unpair_R1.fq.gz {}pair_R2.fq.gz {}unpair_R2.fq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36' ::: $(ls *.fastq.gz | rev | cut -c 16- | rev | uniq)
My question is: when running parallel in a folder containing my 600+ file pairs, will parallel try to run the program on all 600+ isolates simultaneously, or will it limit the number of jobs running at one time based on what the computer can manage? Or is there a way to limit how many files parallel works on at a time, apart from listing the specific files?
Thank you!