sequence alignment performance
4.9 years ago
evelyn ▴ 230

I am creating sorted BAM files from multiple paired-end FASTQ files using an array together with the parallel command, without specifying the number of jobs for parallel. I expected parallel to make the job faster, but it takes longer to finish with parallel than with the array alone. Why is that? Any help is appreciated.

alignment • 678 views

I think this has already been covered here. I'm no parallel expert myself, but as I understand it, you should not combine parallel with a for loop. Either use the for loop and process the files serially, or stream your input files to parallel and process them all in parallel.
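To make the contrast concrete, here is a minimal sketch of the two approaches. The function names and the `echo` placeholder are made up for illustration; a real run would put the alignment command where `echo` is.

```shell
# Serial: a plain for loop handles one sample at a time.
serial_run() {
  for s in "$@"; do
    echo "processing $s"
  done
}

# Parallel: hand the whole list to GNU parallel instead of looping over it.
# -k keeps the output in input order; {} is replaced by each sample name.
parallel_run() {
  parallel -k echo 'processing {}' ::: "$@"
}
```

Both functions take the sample names as arguments, e.g. `serial_run sample1 sample2` or `parallel_run sample1 sample2`. Wrapping a `parallel` call inside the for loop would mix the two and is what the comment above advises against.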


At some point you simply run out of RAM or I/O capacity on your system. Parallel will start multiple jobs, but it can't overcome the limitations of the hardware you have available.

4.9 years ago
h.mon 35k

By default, GNU parallel runs as many jobs at once as there are CPU cores. To limit this, use the -j option. As genomax said, you likely consumed all RAM and started to use swap, or hit disk I/O limits, which slowed the overall run time compared to serial execution.
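As a rough sketch of capping the job count, the function below prints one alignment pipeline per sample with at most a given number running at once. The function name, the sample naming scheme (`NAME_R1.fastq.gz` / `NAME_R2.fastq.gz`), `ref.fa`, and the thread and memory values are all assumptions for illustration, not taken from the original question.

```shell
# align_all MAX_JOBS SAMPLE...
# Prints one "bwa mem | samtools sort" pipeline per sample.
# -j caps how many pipelines run simultaneously.
# --dry-run only prints the commands; remove it to actually run them.
# ref.fa and the _R1/_R2 file naming are hypothetical.
align_all() {
  max_jobs=$1
  shift
  parallel --dry-run -k -j "$max_jobs" \
    'bwa mem -t 4 ref.fa {}_R1.fastq.gz {}_R2.fastq.gz | samtools sort -@ 2 -m 1G -o {}.sorted.bam -' \
    ::: "$@"
}
```

Calling `align_all 2 sample1 sample2 sample3` prints three command lines. When choosing the value for -j, keep in mind that each job also starts its own bwa and samtools threads and sort buffers, so total CPU and RAM use grows with the job count.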

As lieven.sterck said, you don't need a for loop with GNU parallel: it has plenty of features for handling multiple inputs and simplifying the command line. You can use the --max-lines option to control how many input lines are passed to each command:

ls *.fastq.gz | parallel --max-lines=2 echo "{1} {2} - Are a pair"
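A variant of the same idea, sketched with the -N option, which sets the number of arguments per command and is the option the GNU parallel tutorial pairs with the positional replacement strings {1} and {2}. The function name is hypothetical, and the sketch assumes the files sort so that each R1 file is immediately followed by its R2 mate.

```shell
# pair_fastqs DIR
# Prints one line per pair of *.fastq.gz files found in DIR.
# The shell expands the glob in sorted order, so s1_R1 is followed by s1_R2.
# -N 2 feeds two input lines to each command as {1} and {2}.
pair_fastqs() {
  (
    cd "$1" || exit 1
    printf '%s\n' *.fastq.gz | parallel -k -N 2 echo '{1} {2} - are a pair'
  )
}
```

Replacing `echo` with the real alignment command gives one job per read pair. If the file names do not sort into adjacent pairs, the pairing would have to be built some other way.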