So my department is considering spending some money on upgrading our computing facilities, which are pretty under-powered for any kind of serious sequencing analysis. I've been asked to come up with a rough idea of what we're going to need for the next 5 to 10 years.
The obvious points are
- lots of CPUs
- lots of RAM
- lots of storage
but I was hoping someone might know of a resource I could read that would help me come up with something that isn't a complete guess.
I was thinking somewhere in the region of 8 or 12 cores per node, at 100 nodes total, 96 - 128 GB RAM per node, and (probably ridiculous, but) 5000 TB storage. We have a lot of samples that will be sequenced (probably exome rather than whole genome, I would imagine), plus various other sequencing activities like RNA-seq and ChIP-seq.
The thing I'm woefully ignorant of is the architecture of these systems. Should we be building a distributed system (all the tech will likely be housed in one place)? What kind of tech do we need to run the right software, so that I'll be able to make full use of it for the mapping, variant calling, etc.? What are the power, cooling and space requirements?
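To put rough totals on the figures above, here is a back-of-envelope sketch in Python. It only multiplies out the numbers I proposed (100 nodes, 8 or 12 cores, 96 or 128 GB RAM, 5000 TB storage). The `ClusterSpec` class and `samples_that_fit` helper are names I made up for illustration, and the per-sample footprint of 20 GB is purely a placeholder assumption, not a measured value, so swap in whatever your own data suggests.

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Hypothetical container for the proposed cluster figures."""
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    storage_tb: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_ram_tb(self) -> float:
        # Decimal units: 1 TB = 1000 GB.
        return self.nodes * self.ram_gb_per_node / 1000

    @property
    def ram_gb_per_core(self) -> float:
        return self.ram_gb_per_node / self.cores_per_node


def samples_that_fit(storage_tb: float, per_sample_gb: float) -> int:
    """How many samples fit in storage, given an ASSUMED per-sample footprint."""
    return int(storage_tb * 1000 // per_sample_gb)


# Low and high ends of the range proposed in the post.
low = ClusterSpec(nodes=100, cores_per_node=8, ram_gb_per_node=96, storage_tb=5000)
high = ClusterSpec(nodes=100, cores_per_node=12, ram_gb_per_node=128, storage_tb=5000)

for label, spec in (("low", low), ("high", high)):
    print(label, spec.total_cores, "cores,",
          spec.total_ram_tb, "TB RAM,",
          round(spec.ram_gb_per_core, 1), "GB RAM per core")

# Placeholder assumption: 20 GB per sample including intermediates.
print(samples_that_fit(low.storage_tb, per_sample_gb=20), "samples at 20 GB each")
```

The point of keeping the per-sample footprint as a parameter is that it is the number I am least sure of; once there is a real estimate per exome, RNA-seq or ChIP-seq sample, the storage figure can be checked against the expected sample count instead of guessed.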
Since it will be mostly me setting up the pipelines and pushing the data through them, I want to come up with some concrete numbers that will ensure that we can get analyses done quickly, and that we will have a system that we can scale as our needs increase (future-proofing).
Hope someone knows of something, Cheers, Davy.