BLAT had (and still has) a program called gfServer which would keep the index in memory. This BLAT server would run as a daemon (even on a separate server) and you would start a BLAT client to run your alignments.
Why can't an NGS aligner be kept running in this mode of operation? It seems like a no-brainer for big institutions, instead of firing up BWA every time.
I think the data sizes of next-generation sequencing have flipped the expectations of the earlier BLAT world: previously, the queries were small but the genomes (databases) were large. Now, a bwa-indexed human genome is a few gigabytes, but a HiSeq lane is dozens of gigabytes. Loading the reference genome therefore isn't the bottleneck, so keeping it resident between runs doesn't save much time. For the mammalian-sized genomes we work with, a typical bwa run loads the reference into memory in a matter of minutes, while reading and processing the reads takes hours. It helps that the expensive indexing step is performed only once, so the aligner can effectively load the prebuilt index data structure straight into memory. OS-level optimizations, like reusing memory-mapped pages between processes, can help at the margins, but again there is little to gain from optimizing something that's only ~1% of the runtime.
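To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 3-minute load and 5-hour alignment figures are illustrative assumptions chosen to match the "minutes vs. hours" description above, not measurements, and the function names are made up for this example.

```python
def fraction_saved(load_minutes, align_minutes):
    """Fraction of total wall-clock time spent loading the index,
    i.e. the most a resident-index server could save per run."""
    return load_minutes / (load_minutes + align_minutes)


def speedup_if_resident(load_minutes, align_minutes):
    """Overall speedup if index loading cost nothing at all."""
    return (load_minutes + align_minutes) / align_minutes


# Assumed, illustrative timings: 3 min to load the index, 5 h to align a lane.
load = 3.0
align = 5 * 60.0

print(f"time spent loading: {fraction_saved(load, align):.1%}")      # 1.0%
print(f"best-case speedup:  {speedup_if_resident(load, align):.3f}x")  # 1.010x
```

Even in the best case, where loading becomes free, the whole run gets only about 1% faster under these assumptions.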
Of course, this answer is for the current state of things - future situations where you'd align to dozens or hundreds of genomes simultaneously, or stream reads directly off the sequencer into a mapping server, may necessitate a turn to the kind of ideas you suggest.
Doesn't STAR do that in part? It can at least leave the genome in shared memory after exiting (via its `--genomeLoad LoadAndKeep` option, if I remember correctly), so you only have to load it once, which really speeds things up. Sounds like a similar concept.