Using bowtie2 in parallel
9.2 years ago
senzasord • 0

We're trying to use bowtie2 to find exact matches to short DNA sequences in a complete genome. We may search for hundreds of thousands of short sequences at a time. At first, it seemed that the way to do this was to spawn a bunch of threads and run lots of separate queries in parallel. However, we're finding that on a 30-core machine hitting a single index on a local disk, using more than 3 threads results in a significant slowdown.

We are using the '--mm' option, which according to the manual tells bowtie2 to use memory-mapped I/O so that many bowtie2 processes can share the index. Used interactively for a single query, --mm resulted in a noticeable speedup. However, I'm wondering whether the shared, memory-mapped I/O requires some mutex coordination that bogs things down when it is hit by multiple threads. In that case, could we increase throughput by taking a hit on each individual query but utilizing our full 30 cores?

alignment bowtie2 • 6.5k views
9.2 years ago
TriS ★ 4.7k

If I got your question right, have you tried the -p option?

From the bowtie2 manual:

Performance tuning
If your computer has multiple processors/cores, use -p

The -p option causes Bowtie 2 to launch a specified number of parallel search threads. Each thread runs on a different processor/core and all threads find alignments in parallel, increasing alignment throughput by approximately a multiple of the number of threads (though in practice, speedup is somewhat worse than linear).


Yes. In our standard scheme this doesn't help, because we handle each sequence with a separate query (and therefore a separate heavyweight process). Adding threads with '-p' doesn't seem to help there, because a single query doesn't involve enough work to justify the threads. We could try handling multiple queries in a single process, in which case '-p' might help, but that works best for batch rather than on-line processing, and we need to be able to do both quickly.
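For the batch case, the "multiple queries in a single process" idea could look roughly like the sketch below: write all the short sequences into one FASTA file and hand the whole file to a single bowtie2 run with -p. The helper names, the q0, q1, ... read names, and the file names are made up for illustration. The `--end-to-end --score-min C,0,0` combination is an assumption about how to restrict output to exact matches (in end-to-end mode a perfect alignment scores 0), so check it against the manual for your version.

```python
import subprocess
from pathlib import Path


def write_fasta(sequences, path):
    """Write query sequences to one FASTA file, naming them q0, q1, ..."""
    with open(path, "w") as handle:
        for i, seq in enumerate(sequences):
            handle.write(f">q{i}\n{seq}\n")
    return path


def build_bowtie2_command(index_prefix, fasta_path, sam_path, threads):
    """Build a single bowtie2 invocation that aligns every query in the file."""
    return [
        "bowtie2",
        "-p", str(threads),        # parallel search threads inside one process
        "--end-to-end",            # align the whole read, no soft clipping
        "--score-min", "C,0,0",    # assumption: minimum score 0 keeps exact matches only
        "--no-unal",               # leave unaligned queries out of the SAM output
        "-f",                      # queries are FASTA
        "-x", str(index_prefix),
        "-U", str(fasta_path),
        "-S", str(sam_path),
    ]


def run_batch(sequences, index_prefix, workdir, threads=30):
    """Align a whole batch in one process; requires bowtie2 on the PATH."""
    workdir = Path(workdir)
    fasta = write_fasta(sequences, workdir / "queries.fa")
    sam = workdir / "hits.sam"
    subprocess.run(build_bowtie2_command(index_prefix, fasta, sam, threads), check=True)
    return sam
```

With this layout the index is loaded once per batch instead of once per sequence, which is where -p has enough work to spread across cores. It does nothing for the on-line case, where queries arrive one at a time.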


It's not documented anywhere, but the bowtie2 source code suggests that you should be able to compile it easily enough as a library, so perhaps you can directly integrate it into whatever your current pipeline is that way.

BTW, the other possibility would just be to use a FIFO. Whether this will work will depend on the details of your pipeline.
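A minimal sketch of the FIFO idea, assuming a POSIX system: create a named pipe, start the reader with the pipe as its input file, and write FASTA records into it from another thread. `run_through_fifo` and `bowtie2_command` are hypothetical helpers, not part of bowtie2. Whether bowtie2 emits results before the pipe is closed depends on its internal buffering, which this sketch does not establish, so it collects the output only after the writer has finished.

```python
import os
import subprocess
import tempfile
import threading


def run_through_fifo(command_for, records):
    """Feed (name, sequence) pairs as FASTA to a reader process via a named pipe.

    command_for(fifo_path) must return the argv of a process that reads
    the pipe as if it were an ordinary input file.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "queries.fifo")
    os.mkfifo(fifo_path)

    def feed():
        # open() blocks here until the reader has opened the other end.
        with open(fifo_path, "w") as fifo:
            for name, seq in records:
                fifo.write(f">{name}\n{seq}\n")

    try:
        proc = subprocess.Popen(
            command_for(fifo_path), stdout=subprocess.PIPE, text=True
        )
        # Write from a separate thread so reading stdout cannot deadlock
        # against a full pipe.
        writer = threading.Thread(target=feed, daemon=True)
        writer.start()
        output, _ = proc.communicate()
        writer.join(timeout=5)
        return output
    finally:
        os.remove(fifo_path)
        os.rmdir(workdir)


def bowtie2_command(index_prefix, threads):
    """Hypothetical argv builder; with no -S, bowtie2 writes SAM to stdout."""
    def command_for(fifo_path):
        return ["bowtie2", "-p", str(threads), "-f",
                "-x", index_prefix, "-U", fifo_path]
    return command_for
```

Usage would be something like `run_through_fifo(bowtie2_command("genome", 4), [("q0", "ACGT")])`, where "genome" stands in for your index prefix. As written, the reader still exits once the pipe is closed, so this only shows the plumbing; keeping one process alive across many on-line queries would need the writer side to hold the pipe open.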
