Question

MPI/multi-threading in salmon quasi-mapping

0

4.1 years ago

vaushev ▴ 20

As I understand it, when the -p/--threads option is not given explicitly, salmon tries to use all available hardware threads. Is there a way to find out how many threads were actually available and used by salmon?

Also, what happens when salmon is run on a cluster and is assigned to multiple different host nodes? For example, I allocated 16 cores and added -p 16 to the parameters; the run was extremely slow (more than 40 min for just one sample, compared to 4 min when I run from the login node without specifying -p). Our admin explained that this may be because the allocated cores were on different nodes (e.g., 12 cores on one node and 4 cores on another). But what really happens in this case for salmon? Does it run as a single thread, or what?

Overall, is it correct to say that salmon is multi-threaded but not MPI-capable?

salmon RNA-Seq • 1.8k views

ADD COMMENT • link updated 4.1 years ago by Devon Ryan 104k • written 4.1 years ago by vaushev ▴ 20

0

I think you are overcomplicating things for such a simple job as RNA-seq quantification. The tool is blazingly fast. Simply use a job scheduler or something like GNU parallel that launches jobs for all available fastq files, and set -p explicitly to a reasonable number like 8. The speed gain beyond that is probably small due to I/O limitations. If you have a standard node with something like 72 cores, you can easily run around 8 jobs in parallel, depending on available memory. Even large datasets will be quantified in a few hours. No need for MPI here.
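As a rough illustration of the "several jobs on one node" idea, here is a minimal Python sketch. It is not how GNU parallel or any scheduler works internally; the function names `concurrent_jobs` and `run_all` are hypothetical helpers made up for this example, and the commands it runs are harmless placeholders rather than real salmon invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def concurrent_jobs(node_cores, threads_per_job):
    """Upper bound on simultaneous jobs if each job gets its own cores."""
    return max(1, node_cores // threads_per_job)


def run_all(commands, max_jobs):
    """Run each command (an argv list) with at most max_jobs at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run(cmd):
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Placeholder commands standing in for one quantification job per sample.
    demo = [[sys.executable, "-c", "pass"] for _ in range(4)]
    print(run_all(demo, concurrent_jobs(72, 8)))
```

With 72 cores and 8 threads per job the core count alone would allow 9 simultaneous jobs; running 8, as suggested above, leaves some headroom, and available memory may lower the number further.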

ADD REPLY • link 4.1 years ago by ATpoint 81k

0

Thanks for the input, Alexander! But honestly I can't agree that this isn't worth discussing seriously: as I said, in my example quantification of just 1 sample took 40 minutes, and I have a hundred samples. Simply running it with -p 8 would not be enough because, depending on node availability, those 8 cores could still end up on separate nodes. I had to explicitly request that core allocation be limited to a single node, and this helped, but I wanted to understand what's going on there; that's why I posted this question.

ADD REPLY • link 4.1 years ago by vaushev ▴ 20

0

I did not say it was not serious. I said that imho you are overcomplicating things by requesting cores from different nodes. Personally I like to keep things as simple as possible. Book a single node and use all available cores, split over multiple jobs with GNU parallel or job arrays; that is very simple yet effective. The slowdown you experience is probably (or most likely, as you already said) because salmon is not MPI optimized.

ADD REPLY • link 4.1 years ago by ATpoint 81k

0

well initially I was not requesting different nodes, it's the scheduler which assigns my task that way by default: if I want to prevent this behavior and keep my task on a single node, I have to indicate it explicitly by adding a separate parameter (in my case, it's a separate line span[hosts=1] in a task file).

ADD REPLY • link 4.1 years ago by vaushev ▴ 20

score 1 · Accepted Answer · 2020-02-29

1

4.1 years ago

Devon Ryan 104k

The number of "available" threads is whatever your system returns from the following C++ code: https://github.com/COMBINE-lab/salmon/blob/7c5e8642d3fb86b0b0044201f2bb469f1392d7a7/src/SalmonQuantify.cpp#L2213
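To see what the operating system reports on a given node, a small Python sketch like the one below can be run inside the job. This is an assumption-laden stand-in, not salmon's own code: I am assuming the linked line queries the hardware thread count in the manner of `std::thread::hardware_concurrency()`, for which `os.cpu_count()` is the closest Python analogue. The helper names are hypothetical. Note that a process may be allowed to use fewer CPUs than the machine has, which `os.sched_getaffinity` can reveal on Linux.

```python
import os


def hardware_threads():
    """Logical CPUs the OS reports for the whole machine."""
    return os.cpu_count() or 1


def usable_threads():
    """Logical CPUs this process is actually allowed to run on.

    os.sched_getaffinity is only available on some platforms (e.g. Linux),
    so fall back to the machine-wide count elsewhere.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return hardware_threads()


if __name__ == "__main__":
    print("hardware threads:", hardware_threads())
    print("usable threads:  ", usable_threads())
```

If the two numbers differ inside a cluster job, the scheduler is probably restricting the job to a subset of the node's CPUs, and a tool that sizes its thread pool from the hardware count would oversubscribe them.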

Note that if you're using alevin for scRNA-seq quantification, it will default to 25% of the threads and print that to the screen: https://github.com/COMBINE-lab/salmon/blob/c0218869de3f3ede723e0e0e59617b766fcd8035/src/AlevinUtils.cpp#L542-L543

All programs not explicitly written with MPI support (basically, almost all programs you will ever use) will not benefit from being run on multiple nodes. What you'll end up doing is running the same thing on multiple computers and the various processes will just overwrite each other's output.

When using a cluster, make sure that each command is sent to a single node and that you (A) tell the command how many cores it can use and (B) tell the cluster how many cores the tool will be using. Your cluster administrator can help you determine how to do (B).
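One way to keep (A) and (B) in sync is to derive the -p value from whatever the scheduler allocated, instead of hard-coding it. The Python sketch below assumes the scheduler exports the core count in an environment variable; `LSB_DJOB_NUMPROC` (LSF), `SLURM_CPUS_PER_TASK` (Slurm) and `NSLOTS` (Grid Engine) are common candidates, but which one is set on your cluster is an assumption to verify locally. The helper names and file paths are hypothetical, and the function only builds the command, it does not run salmon.

```python
import os

# Variables that schedulers commonly set to the number of allocated cores.
# Assumption: check your own cluster's documentation for the right one.
CORE_ENV_VARS = ("LSB_DJOB_NUMPROC", "SLURM_CPUS_PER_TASK", "NSLOTS")


def allocated_cores(env=os.environ, default=1):
    """Return the core count granted by the scheduler, or a safe default."""
    for name in CORE_ENV_VARS:
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    return default


def salmon_quant_command(index, reads_1, reads_2, out_dir, threads):
    """Build a paired-end `salmon quant` argv list with an explicit -p."""
    return [
        "salmon", "quant",
        "-i", index,          # salmon index directory
        "-l", "A",            # let salmon infer the library type
        "-1", reads_1,
        "-2", reads_2,
        "-p", str(threads),   # match the scheduler's allocation
        "-o", out_dir,
    ]


if __name__ == "__main__":
    cmd = salmon_quant_command("idx", "s1_1.fq.gz", "s1_2.fq.gz",
                               "quant/s1", allocated_cores())
    print(" ".join(cmd))
```

This only helps if the allocated cores all sit on one node, so the single-node request (for example the span[hosts=1] setting mentioned earlier in the thread) is still needed.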

ADD COMMENT • link 4.1 years ago by Devon Ryan 104k

0

Thank you Devon! I know how to set the scheduler to restrict the execution to a single node, I just didn't care about it because I didn't realize until now that most of the commonly used bioinformatic tools (say, salmon, STAR etc. Are there any notable exceptions at all?) are not written with MPI support.

ADD REPLY • link 4.1 years ago by vaushev ▴ 20

1

There aren't really any notable exceptions to the no-MPI rule. Applications that require MPI at compile time are usually a real pain for users to install, since they need to tweak settings for their particular MPI implementation. I learned this when supporting Bison, a WGBS aligner I wrote that uses MPI; it is not exactly commonly used, partly because it is such a pain to install.

ADD REPLY • link 4.1 years ago by Devon Ryan 104k