4.7 years ago by
Don't go anywhere near 126 thousand. The operating system's maximum thread count covers everything running on the machine, including the threads that handle disk I/O, networking, and every other process. Run "ps aux" to see how many things are already active!
Choosing a threading level for a program depends on a dozen factors. Usually you add threads when you have CPU cores sitting idle while the single-threaded version of psiblast is running one core at 100%. Try two threads and see if you can get two cores to 100%. At most, you'd benefit from number of threads = number of CPU cores, unless there are other non-CPU constraints.
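As a rough sketch of the "threads = cores" ceiling, this Python snippet looks up how many cores the current process may use and clamps a requested thread count to that. The helper names `usable_cores` and `thread_ceiling` are made up for illustration, and the clamp only reflects the CPU limit, not disk or RAM constraints.

```python
import os

def usable_cores():
    """Return the number of CPU cores this process may run on."""
    # os.sched_getaffinity is not available on every platform (it is on Linux);
    # where present, it respects CPU pinning such as taskset.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() can return None if the count cannot be determined.
    return os.cpu_count() or 1

def thread_ceiling(requested):
    """Clamp a requested thread count to the range 1..usable_cores()."""
    return max(1, min(requested, usable_cores()))

if __name__ == "__main__":
    print("usable cores:", usable_cores())
    print("126000 threads requested ->", thread_ceiling(126000))
```

Treat the result as an upper bound to start testing from, not as the answer.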
It depends on the system architecture and the software architecture. In my experience, a dual-processor, quad-core machine (8 threads max) can do something like compute pi in eight threads efficiently. If instead of computing pi you were reading big files from disk, and you have only the one disk, you'll see any more than 1-2 threads slowing each other down as they wait for data. Running six threads that access the hard drive will be slower than running three. The right ratio depends on how much of each resource each thread needs.
Something like sequence alignment (BWA, Bowtie, etc.) needs to read a little data and then crunch a lot of CPU, so spinning up all 8 threads is fine. They'll wait their turn for data, drift out of sync with each other, and end up with 100% disk utilization and almost 8x100% CPU.
If your processes are, for example, requesting data from a web server with unknown delays, then you could run a dozen or a hundred threads, and they'll wait and go when they can.
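To make the waiting case concrete, here is a small Python sketch where each "request" is just a `time.sleep`, standing in for a web server with some delay. The function names and the 50 ms delay are assumptions for the demo; no real network is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(delay=0.05):
    """Stand-in for a web request: the thread just waits, using almost no CPU."""
    time.sleep(delay)
    return delay

def run_batch(n_requests, n_threads):
    """Run n_requests fake requests on n_threads threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces all results to be collected before the timer stops.
        list(pool.map(lambda _: fake_request(), range(n_requests)))
    return time.perf_counter() - start

if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"{n:>3} threads: {run_batch(50, n):.2f} s")
```

Because the threads spend their time waiting rather than computing, adding far more threads than cores keeps shortening the wall-clock time here. The same trick would not help a CPU-bound job.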
It also depends on the motherboard architecture. You probably only have 2 or 4 channels to access RAM, so running more than 4 threads that need high-volume access to RAM will also make them slow each other down.
The key word is contention. The usual solution is trial and error: you will have to measure the speed for various thread settings and choose the optimal one for your task. I don't know about psiblast specifically, but it probably needs to access RAM quickly, and you'll see it get slower per thread after about 5 threads. Maybe the optimum is 6-8, but I guarantee trying 1,000 simultaneously will not be faster than 10.
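The trial-and-error step can be scripted. Below is a minimal Python sketch that times a job at several thread settings and reports the fastest. `sweep`, `best_setting`, and `run_psiblast` are hypothetical helper names, and the file names `query.fasta` and `mydb` are placeholders; the `-query`, `-db`, `-out`, and `-num_threads` options are the standard BLAST+ ones, but check them against your installed version.

```python
import subprocess
import time

def sweep(run, thread_counts, repeats=3):
    """Time run(n) for each n; return {n: best wall-clock seconds}."""
    results = {}
    for n in thread_counts:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)
            timings.append(time.perf_counter() - start)
        # Keep the best of several runs to reduce noise from other processes.
        results[n] = min(timings)
    return results

def best_setting(results):
    """Return the thread count with the shortest measured time."""
    return min(results, key=results.get)

def run_psiblast(n_threads):
    """Hypothetical job: placeholder query and database names, adjust to yours."""
    subprocess.run(
        ["psiblast", "-query", "query.fasta", "-db", "mydb",
         "-out", f"out_{n_threads}.txt", "-num_threads", str(n_threads)],
        check=True,
    )
```

Something like `best_setting(sweep(run_psiblast, [1, 2, 4, 6, 8]))` would then give the fastest setting for that particular query and database. Use a query that is representative of your real workload, since the optimum can shift with the input.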
Finally, of course, there is the level of parallelism available to the algorithm. Sometimes BLAST has to work sequentially and will ignore your thread setting for some parts of the job, so these optimal settings can vary with the reference genome and query sequences.
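One standard way to put a number on that limit is Amdahl's law, which I'm bringing in here as an illustration: if only a fraction p of the job can run in parallel, the best possible speedup on n threads is 1 / ((1 - p) + p / n). The sketch below assumes p = 0.8 purely as an example; I don't know the real fraction for BLAST, and it will vary by job.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Ideal speedup under Amdahl's law, ignoring all contention."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)

if __name__ == "__main__":
    p = 0.8  # assumed parallel fraction, for illustration only
    for n in (1, 2, 4, 8, 1000):
        print(f"{n:>4} threads: at most {amdahl_speedup(p, n):.2f}x")
```

With 80% of the work parallel, 8 threads give about 3.3x at best and 1,000 threads still stay just under 5x. That is the ideal case, before any disk or RAM contention is taken into account.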