Are you looking for something like this?
Bump. You cannot run bwa with 1000 threads on one host and expect it to be 1000x faster than a single thread. There will be a theoretical optimum, which is a function of the memory available per thread and the size of the reference genome. Has anyone actually run tests?
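In case anyone wants to measure this themselves, here is a minimal sketch of a timing harness in Python. It assumes `bwa` is on your PATH and uses the `-t` thread option of `bwa mem`. The file names `ref.fa` and `reads.fq` and all the function names are placeholders I made up for illustration, and the thread counts are arbitrary. It only reports wall-clock speedup; it does not look at memory per thread.

```python
import subprocess
import time


def build_bwa_cmd(threads, ref="ref.fa", reads="reads.fq"):
    # ref.fa and reads.fq are placeholder names (assumption); the reference
    # must already be indexed with `bwa index` before this will run.
    return ["bwa", "mem", "-t", str(threads), ref, reads]


def time_command(cmd):
    # Run the command once and return wall-clock seconds.
    # Alignment output is discarded so disk writes do not dominate the timing.
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def scaling_report(timings):
    # timings: dict mapping thread count -> seconds; must contain key 1.
    # Returns a list of (threads, speedup, efficiency) sorted by thread count,
    # where efficiency = speedup / threads (1.0 means perfect scaling).
    base = timings[1]
    report = []
    for n in sorted(timings):
        speedup = base / timings[n]
        report.append((n, speedup, speedup / n))
    return report


if __name__ == "__main__":
    # Thread counts are arbitrary; adjust to the cores on your host.
    timings = {n: time_command(build_bwa_cmd(n)) for n in (1, 2, 4, 8, 16)}
    for n, speedup, eff in scaling_report(timings):
        print(f"{n:>3} threads: speedup {speedup:.2f}x, efficiency {eff:.2f}")
```

If efficiency drops well below 1.0 as the thread count goes up, you have passed the useful range for that host. A single run per thread count is noisy, so repeating each measurement a few times would give a fairer picture.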