GNU Parallel Number of Possible Threads
4.0 years ago
nhaus ▴ 300

Hello everybody,

I think I have a fairly basic question. I just discovered the GNU parallel package and I think my workflow can really benefit from it! I am using a loop that goes through my read files and generates the desired output. The command that is executed for each read pair looks something like this:

STAR --runThreadN 8 --genomeDir star_index/ --readFilesIn R1.fq R2.fq

As you can see, I specified 8 threads, which is the number of threads my virtual machine has.

My question is the following: If I use GNU parallel with a command like this:

cat reads| parallel -j 3 STAR --runThreadN 8 --genomeDir star_index/ --readFilesIn {}_R1.fq {}_R2.fq

Can my virtual machine handle the number of threads I specified, if I execute 3 jobs in parallel?

Or do I need 24 threads (3*8 threads) to properly execute this command?

I'm sorry if this is a basic question, I am very new to the field and any help is much appreciated!

RNA-Seq STAR GNU Parallel multithreading • 5.0k views

You can't use more cores/threads than what your hardware offers. If you are using a virtual machine you are already limited by the resources assigned to it (4 cores/8 threads in this case).

Yeah, don't mix parallel workflows and multithreaded applications unless the number of jobs times the threads per job comes to less than (or equal to) your CPU thread count. It's a good idea to leave a couple of threads spare though, as the machine will still have background tasks running that you won't want competing with your job.

If you have 8 cores/16 threads, but you spawn 3 x 8 = 24 threads in total, you will end up with CPU thrashing: the machine spends more time switching between queued tasks, and the whole run will probably be slower than if you had assigned only one or two threads to each process in the first place.
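To make that arithmetic concrete, here is a minimal sketch of the budgeting rule in shell. The function name `max_jobs` and the default of 2 spare threads are my own assumptions for illustration; neither is part of GNU parallel or STAR.

```shell
#!/usr/bin/env bash
# max_jobs TOTAL_THREADS THREADS_PER_JOB [SPARE_THREADS]
# Prints how many jobs fit so that jobs * threads_per_job stays within
# the threads left over after reserving SPARE_THREADS (default 2, an
# assumed value) for background tasks. Hypothetical helper.
max_jobs() {
  local total=$1 per_job=$2 spare=${3:-2}

  if (( per_job < 1 )); then
    echo "threads per job must be at least 1" >&2
    return 1
  fi

  local usable=$(( total - spare ))
  local jobs=$(( usable / per_job ))   # integer division rounds down

  # Always allow at least one job, even if it oversubscribes slightly.
  if (( jobs < 1 )); then
    jobs=1
  fi
  echo "$jobs"
}
```

For example, `max_jobs 8 8` prints 1, so on an 8-thread VM with 8 threads per STAR run there is no room for a second job, while `max_jobs 32 2 0` prints 16. One way to use it would be `parallel -j "$(max_jobs "$(nproc)" 8)" ...`, where `nproc` (GNU coreutils) reports the number of available processing units.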

Bear in mind, it is also generally more efficient to run n instances of a single-threaded process than it is to run 1 instance of a process with n threads. This does depend enormously on the program in question and on how parallelisable the task is.

I often batch run hhpred analyses via GNU parallel, but hhpred itself can use multiple threads. Typically, I'll tell it to use ~2 threads, and let GNU parallel balance the workloads out over available cores for all my input files. To a rough approximation, on our 32 core server, I'll have 16 files being analysed concurrently.

Take particular care if you're launching a lot of processes that also need read/write access to the same file. E.g., if I ran 20 scripts at once that all needed access to a particular database file on disk, they would be competing for I/O on that file too, so it wouldn't necessarily result in much of a speed-up.

4.0 years ago
ATpoint 81k

Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them

With your command you'd need 24 threads: the number of parallel jobs (3) times the number of threads in the STAR command (8). GNU parallel is powerful if you have plenty of resources, like on a workstation or cluster node. With 8 threads I doubt you would benefit from it. Just run a for loop.
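A minimal sketch of such a loop, assuming (as in the question) a file `reads` with one sample prefix per line. The function name `star_serial` and the `DRY_RUN` switch are hypothetical helpers, and `--outFileNamePrefix` is added on the assumption that you want each sample's output kept apart, which the original command does not do.

```shell
#!/usr/bin/env bash
# star_serial [THREADS]
# Reads sample prefixes from stdin, one per line, and runs STAR on each
# sample in turn, so only THREADS threads are in use at any time.
# Set DRY_RUN to any non-empty value to print the commands instead.
star_serial() {
  local threads=${1:-8}
  local sample

  while IFS= read -r sample; do
    [ -n "$sample" ] || continue   # skip blank lines

    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set.
    # stdin is redirected so the command cannot swallow the sample list.
    ${DRY_RUN:+echo} STAR --runThreadN "$threads" \
      --genomeDir star_index/ \
      --readFilesIn "${sample}_R1.fq" "${sample}_R2.fq" \
      --outFileNamePrefix "${sample}_" < /dev/null
  done
}
```

Run it as `star_serial 8 < reads`, or as `DRY_RUN=1 star_serial 8 < reads` to check the generated commands first.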

4.0 years ago

If you have 8 threads, run STAR without GNU parallel. GNU parallel is meant to be used with commands that do not provide any parallelization of their own. STAR already has that ability.
