What Ever Happened To Alignment Servers?
10.5 years ago

BLAT had (and still has) a program called gfServer which would keep the index in memory. This BLAT server would run as a daemon (even on a separate server) and you would start a BLAT client to run your alignments.

Why can't an NGS aligner be kept running in this mode of operation? It seems like it would be a no-brainer for big institutions, instead of firing up BWA every time.
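To make the server/client pattern described above concrete, here is a toy sketch in Python. It is not gfServer's actual protocol or index format: the k-mer dictionary, the line-based wire format, and the names `build_kmer_index`, `start_server` and `query` are all made up for illustration. The only point is that the index is built once in the long-running server, and each client call pays only for a lookup.

```python
import socket
import socketserver
import threading


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


class LookupHandler(socketserver.StreamRequestHandler):
    """Answer one query per connection: a k-mer in, comma-separated positions out."""

    def handle(self):
        kmer = self.rfile.readline().decode().strip()
        hits = self.server.index.get(kmer, [])
        self.wfile.write((",".join(map(str, hits)) + "\n").encode())


def start_server(reference: str, k: int) -> socketserver.ThreadingTCPServer:
    """Build the index once, then serve lookups from memory on a free local port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), LookupHandler)
    server.daemon_threads = True
    server.index = build_kmer_index(reference, k)  # stays resident between queries
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(port: int, kmer: str) -> list:
    """Client side: send one k-mer and parse the positions that come back."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall((kmer + "\n").encode())
        reply = conn.makefile().readline().strip()
    return [int(x) for x in reply.split(",")] if reply else []
```

Usage would look like `server = start_server("ACGTACGT", 4)` followed by `query(server.server_address[1], "ACGT")`, with `server.shutdown()` when done. A real aligner server would also have to handle batching, concurrency limits and result formats, none of which is attempted here.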

alignment blat • 2.2k views

Doesn't STAR do that in part? It can at least leave the genome in memory after exiting so you only have to load it once (that really speeds things up). Sounds like a similar concept.

10.5 years ago
matted 7.8k

I think the data sizes of next-generation sequencing have flipped the expectations of the earlier BLAT world. Back then, the queries were small but the genomes (databases) were large. Now a bwa-indexed human genome is a few gigabytes, while a HiSeq lane is dozens of gigabytes. Loading the reference genome therefore isn't the bottleneck, and doing fancy things to keep it around between runs doesn't save much time. For the mammalian-sized things we do, a typical bwa run loads the reference into memory in ~minutes, and reading and processing the reads takes ~hours. It helps that the expensive indexing step is performed only once, so the aligner can effectively load the raw index data structure straight into memory. OS-level optimizations, like reusing memory-mapped pages between processes, can help at the margins, but again it doesn't help much to optimize something that's only ~1% of the runtime.
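The "~1% of the runtime" argument can be put as a back-of-the-envelope calculation. In the sketch below the specific numbers (2 minutes to load, 3 hours to align) are assumed stand-ins for the "~minutes" and "~hours" above, not measurements, and the function names are hypothetical.

```python
def load_fraction(load_minutes: float, align_minutes: float) -> float:
    """Fraction of total wall time spent loading the index."""
    return load_minutes / (load_minutes + align_minutes)


def best_case_speedup(load_minutes: float, align_minutes: float) -> float:
    """Speedup if a resident server reduced the index load time to zero."""
    return (load_minutes + align_minutes) / align_minutes


if __name__ == "__main__":
    load, align = 2.0, 180.0  # assumed: ~minutes to load, ~hours to align
    print(f"load share of runtime: {load_fraction(load, align):.1%}")
    print(f"best-case speedup:     {best_case_speedup(load, align):.3f}x")
```

With those assumed inputs the load step is about 1% of the run, so even a perfect always-on server would shorten the job by only about a percent. The picture changes if the per-run work shrinks (many small jobs) or the index grows, which is where a resident index starts to pay off.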

Of course, this answer is for the current state of things - future situations where you'd align to dozens or hundreds of genomes simultaneously, or stream reads directly off the sequencer into a mapping server, may necessitate a turn to the kind of ideas you suggest.
