Question: What Ever Happened To Alignment Servers?
Jeremy Leipzig (Philadelphia, PA), 7.3 years ago, wrote:

BLAT had (and still has) a program called gfServer which would keep the index in memory. This BLAT server would run as a daemon (even on a separate server) and you would start a BLAT client to run your alignments.

Why can't an NGS aligner be kept running in this mode of operation? Seems it would be a no-brainer for big institutions instead of firing up BWA every time.
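To make the server/client split concrete, here is a minimal Python sketch of the pattern: build an index once, keep it attached to a long-lived server process, and answer small queries against it. This is not gfServer's implementation or wire protocol. The toy k-mer dictionary and the names `build_index`, `lookup`, `QueryHandler`, and `make_server` are all hypothetical and chosen for illustration only.

```python
import socketserver

K = 4  # toy seed length (assumption); real aligners use far more elaborate indexes


def build_index(reference, k=K):
    """Map every k-mer in the reference to the positions where it starts."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index


def lookup(index, query, k=K):
    """Return reference positions where the query's first k-mer occurs."""
    return index.get(query[:k], [])


class QueryHandler(socketserver.StreamRequestHandler):
    """Answer one newline-terminated query per connection."""

    def handle(self):
        query = self.rfile.readline().decode().strip()
        hits = lookup(self.server.index, query)
        self.wfile.write((" ".join(map(str, hits)) + "\n").encode())


def make_server(reference, host="localhost", port=0):
    """Build the index once and attach it to a long-lived TCP server."""
    server = socketserver.TCPServer((host, port), QueryHandler)
    server.index = build_index(reference)  # stays in memory across queries
    return server
```

Calling `make_server(reference).serve_forever()` would pay the indexing cost once, after which any client that opens a connection and sends one line gets its hits back without reloading anything.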

Tags: alignment, blat

Doesn't STAR do that in part? It can at least leave the genome in memory after exiting so you only have to load it once (that really speeds things up). Sounds like a similar concept.

(comment by Devon Ryan, written 7.3 years ago)
matted (Boston, United States), 7.3 years ago, wrote:

I think that the data sizes of next-generation sequencing have flipped the expectations of the earlier BLAT world. Previously, the queries were small but the genomes (databases) were large. Now, a bwa-indexed human genome is a few gigabytes, but a HiSeq lane is dozens of gigabytes. Loading the reference genome therefore isn't the bottleneck, so doing fancy things to keep it around between runs doesn't save much time. For the mammalian-sized things we do, a typical bwa run loads the reference into memory in ~minutes, while reading and processing the reads takes ~hours. It helps that the expensive indexing step is performed only once, so the aligner can effectively load the raw index data structure straight into memory. OS-level optimizations, like reusing memory-mapped pages between processes, can help at the margins, but again it doesn't help much to optimize something that's only ~1% of the runtime.
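The arithmetic behind that ~1% figure can be sketched in a few lines of Python. The function names and the example timings (2 minutes to load, 200 minutes to align) are assumptions picked to match the "~minutes versus ~hours" scale above, not measurements.

```python
def load_fraction(load_minutes, align_minutes):
    """Fraction of one run's wall time spent loading the index."""
    return load_minutes / (load_minutes + align_minutes)


def best_case_speedup(load_minutes, align_minutes):
    """Speedup if a resident index made loading completely free."""
    return (load_minutes + align_minutes) / align_minutes


# Assumed, illustrative timings: 2 min to load, 200 min to align.
fraction = load_fraction(2, 200)
speedup = best_case_speedup(2, 200)
print(f"load share of runtime: {fraction:.1%}, best-case speedup: {speedup:.2f}x")
```

With timings on that scale, even a server that eliminated index loading entirely would shorten each run by about one percent, which is the point being made here.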

Of course, this answer is for the current state of things. Future situations where you'd align to dozens or hundreds of genomes simultaneously, or stream reads directly off the sequencer into a mapping server, may necessitate a turn to the kind of ideas you suggest.
