Running assemblies in parallel
6.2 years ago

Hello,

I have 600+ bacterial isolates (*fastq.gz files) that I want to assemble.

I have a script that uses a for loop to trim adapters and run SPAdes on each isolate, but this takes a very long time. My computer should be powerful enough to process several isolates at once, which is what I want to do.

I have looked at GNU parallel, which seems to work well and is faster than the for loop. As an example, I do the trimming like this:

parallel 'trimmomatic PE {}R1*.f*q.gz {}R2*.f*q.gz {}pair_R1.fq.gz {}unpair_R1.fq.gz {}pair_R2.fq.gz {}unpair_R2.fq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36' ::: $(ls *.fastq.gz | rev | cut -c 16- | rev | uniq)

My question is: when running parallel in a folder containing my 600+ file pairs, will it try to run the program on all 600+ isolates simultaneously, or will it limit the number of jobs based on what the computer can handle? Or is there a way to limit how many files parallel works on at one time, other than listing specific files?

Thank you!

parallel Assembly fastq • 2.3k views
6.2 years ago
Joe 21k

Parallel will run as many jobs as it can concurrently* (i.e. one per CPU core, unless you specify otherwise with -j). There is also a "max load" flag (--load) described in the manual that you can use to manage the load on the system.

*Edit: the caveat is that you need to invoke enough commands. Obviously, if you have 32 cores but only 16 files to work on, only 16 jobs will be spawned.
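To make the -j behaviour concrete, here is a minimal sketch rather than a tested pipeline. The sample prefixes and the function name show_capped_jobs are made up for illustration, and the echo stands in for the real trimmomatic call. It uses --dry-run so parallel only prints the commands it would run, and -k to keep the output in input order.

```shell
#!/usr/bin/env bash
# Sketch: cap GNU parallel at a fixed number of simultaneous jobs.
# The prefixes below are hypothetical placeholders, not real isolates.
samples=(isolateA_ isolateB_ isolateC_)

show_capped_jobs() {
    if command -v parallel >/dev/null 2>&1; then
        # -j 2      : never run more than 2 jobs at the same time
        # -k        : print output in the same order as the inputs
        # --dry-run : print each command instead of executing it
        parallel -j 2 -k --dry-run 'echo trimming {}' ::: "${samples[@]}"
    else
        echo "GNU parallel not found on PATH" >&2
    fi
}

show_capped_jobs
```

Dropping --dry-run and swapping the echo for the real command gives the capped run. According to the manual, --load takes the same syntax as -j, so something like --load 100% should hold back new jobs while the machine is already busy.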

6.2 years ago

If you have a look at the parallel documentation, you will find the -j argument, which you can use to limit the number of jobs. There are probably other options that do similar things.
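One thing to keep in mind when picking a value for -j: tools such as SPAdes are themselves multithreaded (SPAdes takes a thread count via -t), so the number of parallel jobs times the threads per job is what actually loads the machine. The sketch below is only a rule of thumb under that assumption. The helper name jobs_for and the value of 4 threads per job are made up for illustration.

```shell
#!/usr/bin/env bash
# jobs_for CORES THREADS_PER_JOB
# Prints how many jobs fit on CORES cores if each job uses THREADS_PER_JOB
# threads, with a floor of one job.
jobs_for() {
    local cores=$1 threads=$2
    local n=$(( cores / threads ))
    if (( n < 1 )); then
        n=1
    fi
    echo "$n"
}

# Number of online cores; nproc is GNU coreutils, getconf is the fallback.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
threads_per_job=4   # assumption: the value you would pass to the tool's thread option

echo "suggested: parallel -j $(jobs_for "$cores" "$threads_per_job") ..."
```

Memory is not accounted for here. Assemblies can be memory-hungry, so a lower -j than this suggests may still be the safer choice.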
