Question: Running assemblies in parallel
marit.hetland wrote (2.5 years ago):

Hello,

I have 600+ bacterial isolates (*fastq.gz files) that I want to assemble.

I have a script that uses a for loop to trim adapters and run SPAdes on each isolate, but this takes a very long time. My computer should be powerful enough to process multiple isolates at once, which is what I want to do.

I have looked at GNU parallel, which seems to work well and is faster than the for loop. As an example, I do the trimming like this:

parallel 'trimmomatic PE {}R1*.f*q.gz {}R2*.f*q.gz {}pair_R1.fq.gz {}unpair_R1.fq.gz {}pair_R2.fq.gz {}unpair_R2.fq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36' ::: $(ls *.fastq.gz | rev | cut -c 16- | rev | uniq)

My question is: when running parallel in a folder containing my 600+ file pairs, will it try to run the program on all 600+ isolates simultaneously, or will it limit the number of files processed at one time based on what the computer can manage? Or is there a way to limit how many files parallel works on at one time, apart from listing the specific files?

Thank you!

Joe wrote (2.5 years ago):

Parallel will run as many jobs as it can concurrently* (i.e. one per CPU core you have), unless you specify otherwise with -j. There is also a "max load" flag (--load), described in the manual, that you can use to manage the load on the system.

*Edit: the caveat being that you have to invoke enough commands. Obviously, if you have 32 cores but only 16 files to work on, only 16 jobs will be spawned.
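
To make the -j behaviour concrete, here is a minimal sketch of capping the number of concurrent jobs. The sample prefixes and the cap of 2 are made-up placeholders, not values from the question, and the real trimmomatic call is replaced by an echo. --dry-run only prints the commands parallel would run, so nothing is actually executed.

```shell
#!/bin/sh
# Hypothetical sample prefixes, for illustration only
samples="isolateA_ isolateB_ isolateC_ isolateD_"

# Assumed cap on concurrent jobs; pick a value that suits your machine
max_jobs=2

# Make sure the installed "parallel" is GNU parallel before using its options
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -j limits how many jobs run at the same time;
    # --dry-run prints each command instead of running it
    parallel --dry-run -j "$max_jobs" 'echo trimming {}' ::: $samples
else
    echo "GNU parallel not found; skipping preview"
fi
```

Once the printed commands look right, dropping --dry-run and swapping the echo for the real command should run at most max_jobs isolates at a time. Keep in mind that tools which are themselves multithreaded may need a lower -j than the core count.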

WouterDeCoster wrote (2.5 years ago):

If you have a look at the parallel documentation, you will find the -j argument, which you can use to limit the number of jobs. There are probably more options that do similar things.
