Bedtools is currently not multi-threaded and won't be for the foreseeable future. Multi-threading typically yields performance gains when the relevant data structures are stored in memory, so that the gains from additional threads are not eliminated by I/O constraints. In contrast, most bedtools algorithms read intervals from the driver (A) file line by line from disk in order to minimize memory consumption (e.g., loading a BAM file into memory is not advisable). This streaming design, combined with the fact that for many use cases (e.g., sorted BAM input) the output must preserve the order of the original input, places additional restrictions on multi-threading.
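To illustrate the design trade-off (this is just a sketch, not bedtools code): when the A file is consumed serially and only B is held in memory, output order matches input order for free, whereas a multi-threaded version would need extra machinery to restore that order. The function and interval format below are hypothetical simplifications of BED-style processing.

```python
import io

def stream_intersect(a_stream, b_intervals):
    """Read intervals from a_stream one line at a time (constant memory
    in A) and report each A interval overlapping anything in the
    in-memory list b_intervals. Serial consumption of A means output
    order trivially preserves input order."""
    for line in a_stream:
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        for b_chrom, b_start, b_end in b_intervals:
            # Half-open overlap test: [start, end) vs [b_start, b_end)
            if chrom == b_chrom and start < b_end and b_start < end:
                yield (chrom, start, end)
                break

# Toy example: A streamed line by line, B held in memory.
a = io.StringIO("chr1\t100\t200\nchr1\t300\t400\nchr2\t50\t150\n")
b = [("chr1", 150, 350), ("chr2", 10, 60)]
hits = list(stream_intersect(a, b))
# hits come back in A's original order
```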
That said, I have been working on scalable new algorithms for both distributed computing environments (clusters) and shared memory systems such as GPUs.
If anyone has the time and ability to effectively multi-thread the existing tools, I would be more than happy to collaborate!