I am starting a new position, and I need to decide how to handle computation: in the cloud or on local hardware (a computer, server, or cluster). The group's computational resources are not set up yet, and neither are the institution's. The variables I am currently considering are:
- Computing time (how long jobs run)
- CPUs needed
- GPUs needed
- RAM needed
- Ease of handing over to other team members and future members
- Nature of work: exploratory or repeated, everyday usage
- IT/facilities at the institution or through collaborations
Am I missing any?
Currently, my criteria for doing things locally are:
- Time: if some calculations take more than 4 days, it is worth having our own hardware.
- CPUs: if the work needs fewer than 4 cores, it is not worth buying special hardware, as it can be done on a newish computer.
- GPU: some software needs one, so the hardware should have slots for the cards. But I don't know much about price vs. usage, and AFAIK there is high variability in sizes and capabilities.
- Storage (disk): should be able to hold the data for the computation/job at hand, or be expandable if it is also used for backup/storage. Currently 2TB is easy to get even in a laptop; more than 4TB might be better.
- RAM: the more, the merrier, but some workstations I've seen are capped at 128GB of RAM.
- Handover: a workstation is easy to leave to a colleague; a server has some costs (if there is no expertise to administer it); clusters might need institutional support.
- Nature of work: exploratory work can be done either on local hardware or via a website with such capabilities, so both a local computer and cloud instances work.
- IT/facilities: without support, server configuration can be a problem, as can the security of data in the cloud. Existing IT infrastructure has to be taken into account (as well as the fact that a promised X might take 3 or 4 times longer to arrive than it would take your group to buy and start using it).
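The rules of thumb above can be written down as a small checklist, which makes it easier to apply them consistently to each project. This is only a sketch: the thresholds are the ones from my list (4 days, 4 cores, ~2TB disk, 128GB RAM cap), while the `Workload` fields, the tier labels, and the example job are hypothetical.

```python
from dataclasses import dataclass

# Thresholds from the rules of thumb above (a sketch, not a validated model).
LONG_JOB_DAYS = 4             # jobs longer than this favour owning hardware
MIN_CORES_FOR_SPECIAL_HW = 4  # below this, a newish computer is enough
LAPTOP_DISK_TB = 2            # disk that is easy to get even in a laptop
WORKSTATION_RAM_CAP_GB = 128  # RAM cap seen on some workstations


@dataclass
class Workload:
    """Hypothetical description of one recurring analysis."""
    runtime_days: float    # wall-clock time of a typical job
    cores: int             # CPU cores the job can use
    needs_gpu: bool        # some software requires a GPU
    ram_gb: int            # peak RAM of the job
    data_tb: float         # working data that must sit next to the computation
    restricted_data: bool  # e.g. clinical data that policy may keep on-site


def suggest(w: Workload) -> tuple[str, list[str]]:
    """Return a rough hardware tier plus the reasons that drove it."""
    reasons = []
    if w.restricted_data:
        reasons.append("check security policy before using the cloud")
    if w.runtime_days > LONG_JOB_DAYS:
        reasons.append("long jobs: own hardware may pay off")
    if w.needs_gpu:
        reasons.append("GPU: needs slots for the cards, or rented GPU instances")
    if w.data_tb > LAPTOP_DISK_TB:
        reasons.append("working data exceeds typical laptop disk")
    if w.ram_gb > WORKSTATION_RAM_CAP_GB:
        reasons.append("RAM above a typical workstation cap")
        return "server/cluster or cloud", reasons
    # Small enough for an ordinary recent computer only if every rule agrees.
    small = (w.cores < MIN_CORES_FOR_SPECIAL_HW and not w.needs_gpu
             and w.runtime_days <= LONG_JOB_DAYS and w.data_tb <= LAPTOP_DISK_TB)
    return ("newish computer" if small else "workstation"), reasons


if __name__ == "__main__":
    # Hypothetical job: half a day, 16 cores, 64 GB RAM, clinical data.
    rnaseq = Workload(runtime_days=0.5, cores=16, needs_gpu=False,
                      ram_gb=64, data_tb=1, restricted_data=True)
    print(suggest(rnaseq))
```

The point of returning the reasons alongside the tier is that the reasons (security policy, GPU, RAM cap) are usually what you end up discussing with IT, not the tier itself.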
Other considerations: This is not for data backup or storage.
Could someone provide other heuristics for deciding between the cloud and buying hardware (besides cost comparisons)?
Related threads (usually asking about specific conditions):
- Lab workstation/bioinformatics PC recommendations
- Best Hardware Solution For Medium-Size Bioinformatics Lab [15-20 Computers] (With newer replies like this one)
- Any Hardware Recommendations For A Molecular Biology Lab That's Getting Into Bioinformatics? Answers mention having dedicated servers for each use: database, websites, computing.
- Computer specs for Bioinformatics
- Buy PC for metagenomics
- Hardware Suitable For Generic Nextgen Sequencing Processing?
- Workstations For NGS Analysis?
Did you check local security and/or other policies? If they prohibit the use of cloud resources, then you have no choice.
Even if there are no formal policies now, it would serve you well to check with local IT/Security authority to get a clear answer. You don't want to get in trouble down the road.
You may not know this yet, but please include information about the expected amount of data/workload you will be processing each week/month, etc.
Good point about security! I'm in a clinical setting, so this should be higher up the list! I partly included it under the institution's IT/facilities, but I might need to ask about it directly.
Yes, I'm currently estimating workloads based on the grants and the tasks to be done.
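Turning the grant-based estimate into core-hours per month makes it directly comparable to what a given machine can deliver. A minimal sketch, assuming work can be expressed as samples per month times core-hours per sample; the utilization figure and the example numbers are hypothetical placeholders, not measurements.

```python
import math

HOURS_PER_MONTH = 8760 / 12  # average hours in a month (730)


def monthly_core_hours(samples_per_month: int, core_hours_per_sample: float) -> float:
    """Total CPU core-hours a month of work is expected to consume."""
    return samples_per_month * core_hours_per_sample


def machines_needed(core_hours: float, cores_per_machine: int,
                    utilization: float = 0.5) -> int:
    """Machines of a given size needed to absorb the monthly load.

    `utilization` is the assumed fraction of the month a machine is actually
    busy (idle nights, failed runs, interactive work, ...).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    capacity = cores_per_machine * HOURS_PER_MONTH * utilization
    return math.ceil(core_hours / capacity)


if __name__ == "__main__":
    # Hypothetical: 40 samples/month at 24 core-hours each, 16-core workstation.
    load = monthly_core_hours(40, 24)
    print(load, machines_needed(load, cores_per_machine=16))
```

With those made-up numbers the load is 960 core-hours per month, which a single 16-core machine absorbs even at 50% utilization; the estimate is more useful for spotting peaks (e.g. a large batch arriving at once) than for averages.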
If you do all your work in the cloud, where do you plan to store the data? If the data is stored in the cloud, then I think you would need to pay monthly fees to keep it there, right? If you export the data from the cloud, I think you would also incur fees. AFAIK, Amazon AWS lets you upload data for free, but pulling it back down for archival or further analysis on your workstation might not be free. If this is a larger business decision, you can also reach out to the cloud providers for input; I know Google has a team specifically for helping institutions figure out how to integrate cloud resources. Depending on the scale and requirements, maintaining and managing your own local hardware might not be trivial.
My current plan is to store the data locally. Depending on the job, downloads won't amount to much (uploading FASTQ files might be big, but downloading a table of differentially expressed genes, methylation results, variant calls, etc. is usually much smaller). I didn't know Google had a team for this, but this is not at the institutional level (yet). I will definitely reach out and evaluate prices in the cloud: Google, AWS, Azure, or others.
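The storage/egress trade-off discussed above can be estimated per job before asking providers for quotes. This is a sketch only: all prices are parameters you have to fill in from each provider's current price list (the example rates are hypothetical placeholders, not real prices), and upload is assumed free by default, as mentioned earlier in the thread.

```python
def cloud_data_cost(stored_gb: float, months_stored: float, downloaded_gb: float,
                    storage_price_per_gb_month: float, egress_price_per_gb: float,
                    uploaded_gb: float = 0.0,
                    ingress_price_per_gb: float = 0.0) -> float:
    """Rough data-related cost of one job in the cloud.

    Compute time is deliberately left out; this only covers keeping data in
    the cloud and moving it in and out. Ingress defaults to free (assumption).
    """
    storage = stored_gb * months_stored * storage_price_per_gb_month
    egress = downloaded_gb * egress_price_per_gb
    ingress = uploaded_gb * ingress_price_per_gb
    return storage + egress + ingress


if __name__ == "__main__":
    # Hypothetical rates: 0.02 per GB-month stored, 0.09 per GB downloaded.
    # Keep 500 GB of FASTQ for a month, download only a 2 GB results table.
    print(cloud_data_cost(500, 1, 2, 0.02, 0.09))
    # Same rates, but pulling the 500 GB of FASTQ back down instead.
    print(cloud_data_cost(500, 0, 500, 0.02, 0.09))
```

Under those placeholder rates, downloading only results is dominated by storage time, while pulling raw FASTQ back for local analysis makes egress the main cost, which is why "store locally, download only tables" tends to keep the cloud bill small.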
Yes, in my previous job I managed a server for 2 years without many problems or much support, but it is not trivial.
It is a good thing that you recognize that. If you are moving to a clinical setting, you will almost certainly not be allowed to administer any critical systems on your own. That is good, because you should let those whose day job is systems administration do that while you focus on research/diagnostic work.
I was already in a clinical setting (although the resources were only for research, as in my current position), and the server was approved by the IT manager after we couldn't use the institutional one. I wish I could do that and "forget" about administering a system, but as a "pet bioinformatician" that is impossible without institutional support. Thanks for the advice!
What is preventing you from having a balanced mix of both? Split the budget, get some hardware to run locally, and get a subscription/contract with a cloud compute provider using the rest of the money.
Nothing, and I will probably end up combining both; my problem is how to balance them. I need a PC to analyze, or at least explore, whatever I do in the cloud. I want to get the most value out of a PC good enough for current and foreseeable needs, plus a cloud solution, be it from a company or at the institution.
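One simple way to reason about the split is a break-even estimate: spread the hardware price over its useful life and compare that monthly cost with renting an equivalent instance by the hour. A minimal sketch; every number in the example (price, lifetime, running cost, hourly rate, budget) is hypothetical, and it ignores things that are hard to price, such as administration time.

```python
def monthly_cost_local(purchase_price: float, lifetime_months: int,
                       running_cost_per_month: float) -> float:
    """Purchase spread over the hardware's useful life, plus power/maintenance."""
    return purchase_price / lifetime_months + running_cost_per_month


def monthly_cost_cloud(hourly_rate: float, hours_per_month: float) -> float:
    """On-demand cost for the hours actually used."""
    return hourly_rate * hours_per_month


def break_even_hours(purchase_price: float, lifetime_months: int,
                     running_cost_per_month: float, hourly_rate: float) -> float:
    """Hours of use per month above which owning becomes cheaper than renting."""
    return monthly_cost_local(purchase_price, lifetime_months,
                              running_cost_per_month) / hourly_rate


def cloud_hours_from_budget(budget: float, hardware_spend: float,
                            hourly_rate: float) -> float:
    """Cloud hours left after buying local hardware out of the same budget."""
    if hardware_spend > budget:
        raise ValueError("hardware spend exceeds budget")
    return (budget - hardware_spend) / hourly_rate


if __name__ == "__main__":
    # Hypothetical: 6000 workstation over 48 months, 50/month running costs,
    # versus a comparable instance at 1.0 per hour.
    print(break_even_hours(6000, 48, 50, 1.0))
    # Hypothetical 10000 budget, 6000 on hardware, instances at 2.0 per hour.
    print(cloud_hours_from_budget(10000, 6000, 2.0))
```

If the everyday exploratory work alone keeps a machine busy for more than the break-even hours per month, that part is a good candidate for the local PC, and the remaining budget can go to cloud hours for occasional large or GPU jobs.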
The question of how to balance the budget allocation between the two would primarily depend on how much money you have available and how it is allocated over time. Does your institution have some pre-existing computational resources already? Maybe you could collect a couple of PC towers that are no longer in use and set up a Slurm cluster on them. Then you would have a (well, acceptable but probably inefficient) local compute resource that should be able to handle at least some bioinformatics work (no GPU work unless the machines have dedicated GPUs, obviously), and you could preferentially allocate your budget towards whatever else you see fit.
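Before pooling old towers, it may be worth checking per-job fit and not just the totals: a pooled cluster adds throughput for many independent jobs, but a single multithreaded program that does not use MPI (or similar) can only use the cores and RAM of the one node it runs on. A minimal sketch of that check; the `Node` fields and the example machines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical inventory entry for one machine in the pool."""
    name: str
    cores: int
    ram_gb: int
    has_gpu: bool = False


def pooled_totals(nodes: list[Node]) -> tuple[int, int]:
    """Total cores and RAM across the pool (what throughput can use)."""
    return sum(n.cores for n in nodes), sum(n.ram_gb for n in nodes)


def nodes_that_fit(nodes: list[Node], cores: int, ram_gb: int,
                   needs_gpu: bool = False) -> list[str]:
    """Names of nodes that can run a single-node job with these needs."""
    return [n.name for n in nodes
            if n.cores >= cores and n.ram_gb >= ram_gb
            and (n.has_gpu or not needs_gpu)]


if __name__ == "__main__":
    towers = [Node("pc1", 4, 16), Node("pc2", 4, 16), Node("pc3", 4, 16)]
    print(pooled_totals(towers))            # capacity for many small jobs
    print(nodes_that_fit(towers, 2, 32))    # one job needing 32 GB RAM
```

In this made-up pool the totals look like 12 cores and 48 GB, yet no single tower can run a job that needs 32 GB of RAM, so pooling helps with many small jobs (e.g. per-sample QC or alignment) rather than with one large assembly or a memory-hungry index build.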
There are no pre-existing computational resources yet (neither at the group nor at the institution level). I will ask the IT team and other bioinformaticians about pooling PCs together and how they see this. Thanks!