Let's say you were responsible for a bioinformatics HPC infrastructure project and had to come up with a solution from the current market that met the functional, budgetary, and technical requirements.
Assume the infrastructure is going to serve a standard biomedical research institution with various biology labs and projects, including next-gen sequencing for DNA/RNA/etc., proteomic and metabolic profiling, microarrays, microscopy and possibly medical imaging, plus various related databases and applications. Let's also include some IaaS capability for easier provisioning of general and even HPC requests. That hopefully gives a rough scope for the functional aspects.
To accommodate large-memory tasks like de novo assembly, RAM is a priority, with a good number of cores accessing shared system memory. Of course, storage scaling and performance are salient as well.
Using budget as a surrogate for scale, let's define 3 approximate levels:
- small: $50K
- medium: $250K
- large: $1M
Keeping in mind issues related to computational and environmental power, density, scaling, configuration, and maintenance, how would you go about spec'ing your solution and spending your budget at each level? Also assume that additional costs for installation, service, etc. do not need to come out of the budget. Some HPC server options I've seen out there include Dell, TransTec, and Supermicro.
Please share thoughts and experiences from the wise to the wary!
I am obviously biased, but here are some clarifying questions. Have you considered facilities costs (do you need to build a data center)? What about your RAS costs (replacements, servicing), and over how long do you plan to amortize your facility and your equipment? What's your networking infrastructure (IB, 10GigE, GigE) and network topology? All of these are things you should be thinking about if you want to do anything reasonably serious and sustainable.
I would argue that today 3 years is an eternity for a cluster, given that Intel gets through two ticks and a tock in that timeframe.
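To make the amortization question concrete, here is a rough sketch of what each budget level from the question works out to per year of service. It assumes simple straight-line amortization of hardware only, which is my simplification and not something stated in the thread; the 5-year lifetime is a hypothetical comparison point next to the 3 years mentioned above, and the names `BUDGETS`, `annual_cost`, and `amortization_table` are made up for illustration. Facilities, RAS, and power costs are deliberately left out.

```python
# Budget tiers are taken from the question; everything else is an assumption.
BUDGETS = {"small": 50_000, "medium": 250_000, "large": 1_000_000}


def annual_cost(budget: float, years: int) -> float:
    """Straight-line hardware cost per year for a cluster retired after `years`."""
    if years <= 0:
        raise ValueError("years must be positive")
    return budget / years


def amortization_table(budgets, lifetimes):
    """Map each tier name to {lifetime in years: cost per year}."""
    return {
        name: {years: annual_cost(budget, years) for years in lifetimes}
        for name, budget in budgets.items()
    }


if __name__ == "__main__":
    # 3 years is the refresh horizon argued for above; 5 is a hypothetical alternative.
    for name, row in amortization_table(BUDGETS, (3, 5)).items():
        cells = ", ".join(f"{y} yr: ${cost:,.0f}/yr" for y, cost in row.items())
        print(f"{name:>6}: {cells}")
```

A shorter refresh cycle raises the yearly hardware cost for the same budget, which is the trade-off to weigh against newer, denser, and more power-efficient nodes.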
Excellent and relevant question.
I currently have the luxury of adequate space in an existing enterprise data center. For simplicity I'd focus on hardware/software, but I included environmental issues in the question because they factor into the specifications.