I am in the process of designing an analysis pipeline for DNA-seq on human germline DNA. Currently, we are using Ion Torrent and sequencing on a Proton, for IDT's medical exome of 4500 genes. We are also looking into a NextSeq, WES, and WGS. So my question is: will a virtual machine be sufficient, or should I be looking more into Linux clusters? We are a 400-bed hospital with growing NGS needs. My main concern with a VM is processing overhead, and I wanted to get some recommendations as to what others are doing. Thank you :).
We have been successful with buying and using our own dedicated hardware - for example, one can get a server with 512 GB RAM, 24 cores, and 100 TB of storage for $25K or so. That can handle a surprising amount of work, and it is a lot simpler to run and manage than a cluster. It is basically just one simple Linux server, and we parallelize via GNU Parallel. You should run some tests to see if that works for your needs.
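To give a rough idea of what per-sample parallelism on a single server looks like, here is a minimal sketch in Python. It is not GNU Parallel itself, only the same idea using the standard library: run one command per sample, a fixed number at a time. The function names, the `{sample}` placeholder, the sample IDs, and the `max_jobs` value are all hypothetical, and the command is a stand-in for whatever per-sample step your pipeline actually runs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_sample(sample, command_template):
    """Run one per-sample command and return (sample, exit code)."""
    # Substitute the sample name into every argument of the command.
    cmd = [part.replace("{sample}", sample) for part in command_template]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return sample, result.returncode


def run_all(samples, command_template, max_jobs=4):
    """Run the command for every sample, at most max_jobs at a time."""
    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        results = pool.map(lambda s: run_sample(s, command_template), samples)
        return dict(results)


if __name__ == "__main__":
    # Placeholder command; swap in the real per-sample step of your pipeline.
    template = [sys.executable, "-c", "print('processing {sample}')"]
    exit_codes = run_all(["sample1", "sample2", "sample3"], template, max_jobs=2)
    failed = [name for name, code in exit_codes.items() if code != 0]
    print("failed samples:", failed)
```

On a 24-core machine, `max_jobs` would typically be set so that the number of concurrent jobs times the threads each job uses stays at or below the core count, and so that the combined memory use fits in RAM. Timing a few samples this way is a cheap test of whether a single server covers your throughput.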
Now, the next step that I would recommend would not be to build your own cluster - that in turn has massive costs in managing and maintaining it. That is, unless you already have the staff for it, though I doubt it; I imagine most hospitals run Windows. At that point, having staff support both Linux and Windows-based systems and integrate a diverse infrastructure gets very expensive. In that case I recommend going with a service such as DNAnexus, where you don't need to bear the costs of running an infrastructure.
(Edit: corrected prices, specs)