Question: multithreading in psiblast
arronslacey (United Kingdom) wrote, 4.7 years ago:

Hi - I am exploring how many threads I can use for psiblast. The default value is 1 thread, and I want to know how high I can set this. Using

 

$ cat /proc/sys/kernel/threads-max

126335

I obtain the maximum number of threads the kernel allows across the whole system. I somehow doubt that this is the number I should pass to the psiblast "num_threads" flag. Can anyone shed any light on this? Thanks

karl.stamm (United States) wrote, 4.7 years ago:

Don't go near 126 thousand; that maximum is for the whole operating system, so it also has to cover the threads that drive the hard drives, networking, and everything else. See "ps aux" for a list of how many things are already active!
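To put the 126 thousand in perspective, here is a rough sketch that compares the system-wide limit with the number of threads that already exist. It assumes a Linux /proc filesystem, and the variable names are just for illustration.

```shell
#!/bin/sh
# System-wide ceiling on threads (the value quoted in the question).
threads_max=$(cat /proc/sys/kernel/threads-max)

# Count the threads that exist right now: each one has a directory
# under /proc/<pid>/task/. Errors from processes that exit while we
# are listing them are discarded.
threads_now=$(ls -d /proc/[0-9]*/task/[0-9]* 2>/dev/null | wc -l)

echo "threads in use: $threads_now of $threads_max allowed"
```

If procps is installed, "ps -eLf" lists the same threads one per line. Either way the limit describes what the kernel will tolerate, not what is useful for one program.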

Choosing a threading level for a program is based on a dozen factors. Usually you want more threads if you have CPU cores sitting idle while the single-threaded version of psiblast is running at 100% of one core. Try two and see if you can get two cores to 100%. At most you'd benefit from number of threads = number of CPU cores, unless there are other non-CPU constraints.
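A minimal sketch of the "threads = cores" starting point, assuming GNU coreutils (for nproc) and BLAST+ are installed. The file name query.fasta and the database name nr are placeholders I made up, not something from the question.

```shell
#!/bin/sh
# Logical CPUs visible to this process (nproc is part of GNU coreutils).
cores=$(nproc)
echo "logical CPUs available: $cores"

# Hypothetical input names; replace with your own query and database.
query=query.fasta
db=nr

# Only attempt the run if psiblast is installed and the query file exists.
if command -v psiblast >/dev/null 2>&1 && [ -f "$query" ]; then
    psiblast -query "$query" -db "$db" -num_threads "$cores" -out psiblast.out
fi
```

Treat the core count as an upper bound to test against, not as a guaranteed optimum.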

It depends on the system architecture and the software architecture. In my experience, a dual-processor, quad-core machine (for 8 threads max) can do something like compute pi in eight threads efficiently. If instead of computing pi you were reading big files from disk, and have only the one disk, you'll see any more than 1-2 threads slowing each other down as they wait for data. Running six threads accessing the hard drive will be slower than three. This ratio depends on how much of each resource is needed by each thread.

Something like sequence alignment (BWA or Bowtie etc.) needs to read a little data, then crunch a lot of CPU, so spinning up all 8 threads is fine. They'll wait their turn for data, then drift out of sync with each other and end up with 100% disk utilization and almost 8x100% CPU.

If your processes are, for example, requesting data from a web server with some unknown delays, then you could run a dozen or a hundred threads and they'll wait and go when they can.

It also depends on the motherboard architecture. You probably only have 2 or 4 channels to access RAM, so running more than 4 threads that need high volume access to RAM will also slow each other down.

The key word is contention. The usual solution is trial and error; you will have to measure the speed for various thread settings and choose the optimum for your task. I don't know about psiblast specifically, but it probably needs to access RAM quickly, and you'll see it get slower per thread after 5 threads. Maybe the optimum is 6-8, but I guarantee trying 1,000 simultaneously will not be faster than 10.
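The trial and error can be scripted. This is only a sketch: time_run is a helper name I made up, query.fasta and nr are placeholder inputs, and timing with "date +%s" only resolves whole seconds, so it suits runs that take minutes, not milliseconds.

```shell
#!/bin/sh
# time_run THREADS CMD...: run CMD with "-num_threads THREADS" appended
# and print "THREADS <elapsed whole seconds>".
time_run() {
    threads=$1; shift
    start=$(date +%s)
    "$@" -num_threads "$threads" >/dev/null 2>&1
    end=$(date +%s)
    echo "$threads $((end - start))"
}

# Hypothetical inputs; only run if psiblast and the query file are present.
if command -v psiblast >/dev/null 2>&1 && [ -f query.fasta ]; then
    for n in 1 2 4 6 8; do
        time_run "$n" psiblast -query query.fasta -db nr \
            -num_iterations 3 -out "run_$n.out"
    done
fi
```

Each output line is a thread count followed by elapsed seconds; pick the setting where the time stops improving. Repeat each run a couple of times, since the first one may be slower while the database is read from disk.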

Finally, of course, there is the level of parallelism available to the algorithm. Sometimes BLAST has to work sequentially and will ignore your thread setting for some parts of the job, so these optimal settings can vary with the reference genome and query sequences.


Thank you for the very comprehensive answer. I've tried using 4 threads, and using htop to look at CPU usage, each core is at around 90% (if I use the top command this comes up as 360%, which confused me at first).

There are a lot of things to consider and I hope your answer helps others, not just me.

Reply written 4.7 years ago by arronslacey

Yeah, I think it's fun to think about. What does 90% CPU mean? It would be 100% if it wasn't waiting for something like disk or memory. Each additional thread will cause a little more contention and reduce your 90% a little more. If you have more than 4 cores available, don't stop at 360%; try to push it up as high as possible. 6 threads at 70% is more throughput than 4 threads at 90%, but not by much. At some point, if it's not enough, other things have to change. For example, if you have two hard disks and you can read from one while writing to the other, that will reduce contention and boost your CPU utilization.
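The arithmetic behind those numbers, as a small sketch. It treats the figure top shows for a multithreaded process as threads times average per-thread utilization, which is a simplification; total_cpu is just a name I picked.

```shell
#!/bin/sh
# Aggregate CPU use = threads x average per-thread utilization (percent).
total_cpu() {
    echo $(( $1 * $2 ))
}

four=$(total_cpu 4 90)   # 360
six=$(total_cpu 6 70)    # 420
echo "4 threads at 90%: ${four}%"
echo "6 threads at 70%: ${six}%"

# Relative gain from the two extra threads, in percent (integer division).
echo "gain: $(( (six - four) * 100 / four ))%"
```

So two extra threads buy roughly 16% more total CPU work in this example, which is the "but not by much" part.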

Reply written 4.7 years ago by karl.stamm