Forum: How to decide when to have local hardware vs cloud computing?
16 months ago
Lluís R. ★ 1.2k

I am starting a new position and need to decide how to handle computation: in the cloud or on local hardware (computer, server, or cluster). Computational resources are not set up yet, either for the group or at the institutional level. The variables I am currently considering are:

  • Time working
  • CPUs needed
  • GPU needed
  • Memory/ROM (disk storage)
  • RAM needed
  • Ease of passing over to other team members and future members
  • Cost
  • Nature of work: Exploratory/Repeated, everyday usage
  • IT/Facilities at the institution or collaborations

Am I missing any?

Currently, my thinking on when to do something locally is, for each variable:

  • Time: if some calculations take more than 4 days, it is worth having your own hardware.
  • CPUs: if the work needs fewer than 4 cores, it is not worth buying special hardware, as it can be done on a newish computer.
  • GPU: some software needs one, so the hardware should have slots for the cards. However, I don't know much about price/usage, and AFAIK there is high variability in sizes and capabilities.
  • Memory (storage): it should be able to hold the data for the computation/job at hand, or be expandable if used for backup/storage. Currently 2TB is easy to get even in a laptop. More than 4TB might be better.
  • RAM: the more, the merrier, but some workstations I've seen have a cap of 128GB of RAM.
  • A workstation is easy to leave to a colleague; a server has some costs (if there is no expertise to administer it); clusters might need institutional support.
  • Exploratory work can be done well either on local hardware or through a website with such capabilities, so both a local computer and cloud instances fit.
  • Without support, server configuration can be a problem, as can the security of data in the cloud. Existing IT infrastructure has to be taken into account (as well as the fact that a promised X might take 3 or 4 times longer to arrive than it would take your group to buy it and start using it).
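The numeric rules of thumb in the list above can be written down as a small checklist function. This is only a sketch: the `Workload` class, the `local_hardware_notes` function, and the constant names are hypothetical, and the thresholds (4 days, 4 cores, 128GB) are simply the ones stated in the list, not benchmarks.

```python
from dataclasses import dataclass

# Thresholds copied from the list above; they are rules of thumb, not measurements.
LONG_JOB_DAYS = 4
SMALL_JOB_CORES = 4
WORKSTATION_RAM_CAP_GB = 128


@dataclass
class Workload:
    """Rough estimate of a typical job (hypothetical helper type)."""
    runtime_days: float   # wall-clock time of a typical long job
    cores: int            # cores the job can actually use
    ram_gb: int           # peak RAM the job needs
    needs_gpu: bool = False


def local_hardware_notes(w: Workload) -> list[str]:
    """Return the rules of thumb from the list that apply to this workload."""
    notes = []
    if w.runtime_days > LONG_JOB_DAYS:
        notes.append("long jobs: own hardware may be worth it")
    if w.cores < SMALL_JOB_CORES:
        notes.append("few cores: a newish ordinary computer is enough")
    if w.needs_gpu:
        notes.append("GPU: the machine needs slots for the cards")
    if w.ram_gb > WORKSTATION_RAM_CAP_GB:
        notes.append(f"RAM: above the {WORKSTATION_RAM_CAP_GB}GB cap of some workstations")
    return notes


if __name__ == "__main__":
    for note in local_hardware_notes(Workload(runtime_days=10, cores=10, ram_gb=64)):
        print(note)
```

The function returns notes rather than a single verdict because the non-numeric variables (handover, institutional IT, nature of the work) still have to be weighed by hand.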

Other considerations: This is not for data backup or storage.

Could someone provide some other heuristics to decide between cloud and buying hardware (besides cost comparisons)?

Related threads (usually asking about specific conditions):

hardware computational-resources cloud • 2.1k views

Did you look into local security and/or other policies? If they prohibit the use of cloud resources, then you have no choice.

Even if there are no formal policies now, it would serve you well to check with the local IT/security authority to get a clear answer. You don't want to get in trouble down the road.

You may not know this yet, but please include information about the expected amount of data and workloads you will be handling each week/month.


Good point about security! I'm in a clinical setting, so this should be high up on the list! I somewhat included it under the IT/facilities of the institution, but I might need to ask about it directly.

Yes, I'm currently estimating workloads based on the grants and tasks to be done.


If you do all your work in the cloud, where do you plan to store the data? If the data is stored in the cloud, I think you would need to pay monthly fees to keep it there. If you export the data from the cloud, I think you would also incur fees. AFAIK, Amazon AWS lets you upload data for free, but pulling it back down for archival or further analysis on your workstation might not be free. If this is a larger business decision, you can also reach out to the cloud hosts for input; I know Google has a team specifically for helping institutions figure out how to integrate cloud resources. Depending on the scale and requirements, maintaining and managing your own local hardware might not be trivial.
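The storage and download fees mentioned above come down to simple arithmetic. A minimal sketch, assuming a flat per-GB monthly storage price and a flat per-GB download (egress) price; real provider pricing is usually tiered, the function name `cloud_data_cost` is hypothetical, and both prices in the example are made-up placeholders, not quotes from any provider.

```python
def cloud_data_cost(stored_gb: float, months: int, downloaded_gb: float,
                    storage_price_per_gb_month: float,
                    egress_price_per_gb: float) -> dict[str, float]:
    """Estimate the cost of keeping data in the cloud and pulling some of it back down.

    Uploads are treated as free, as suggested above; check this with your provider.
    """
    storage = stored_gb * months * storage_price_per_gb_month
    egress = downloaded_gb * egress_price_per_gb
    return {"storage": storage, "egress": egress, "total": storage + egress}


if __name__ == "__main__":
    # Placeholder prices for illustration only; look up the current price list.
    cost = cloud_data_cost(stored_gb=2000, months=12, downloaded_gb=50,
                           storage_price_per_gb_month=0.02,
                           egress_price_per_gb=0.10)
    for item, amount in cost.items():
        print(f"{item}: {amount:.2f}")
```

Uploading large inputs and downloading only small result tables keeps the egress term small, while the storage term grows with every month the data stays in the cloud.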


My current plan is to store the data locally. Depending on the job, the downloads won't be large (uploading fastq files might be big, but downloaded results such as tables of differential expression/methylation or variant calls are usually smaller). I didn't know Google had a team for this, although this is not at the institutional level (yet). I will definitely reach out and evaluate cloud prices from Google, AWS, Azure, or others.

Yes, in my previous job I managed a server for 2 years without many problems or much support, but it is not trivial.


"Yes, In my previous job I managed a server for 2 years without much problems or support, but it is not trivial."

It is a good thing that you recognize that. If you are moving to a clinical setting, you will almost certainly not be allowed to administer any critical systems on your own. That is good, because you should let those whose day job is systems administration do that while you focus on research/diagnostic work.


I was already in a clinical setting (but the resources were only for research, as in my current position), and the server was approved by the IT manager (after we couldn't use the institutional one). I wish I could do that and "forget" about administering a system, but as a "pet bioinformatician" it is impossible without institutional support. Thanks for the advice!


What is preventing you from having a balanced mix of both? Split the budget, get some hardware to run locally, and get a subscription/contract with a cloud compute provider using the rest of the money.


Nothing, and I will probably end up combining both; my problem is how to balance them. I need a PC to analyze, or at least explore, whatever I do in the cloud. I want to maximize the value of a PC that is good enough for the current and foreseeable future, plus a cloud solution, be it from a company or at the institution.


The question of how to balance the budget allocation between the two primarily depends on how much money you have available and how it is allocated over time. Does your institution already have some computational resources? Maybe you could collect a couple of PC towers that are no longer in use and set up a Slurm cluster on them? Then you would have an acceptable (though probably inefficient) local compute resource that should be able to handle at least some bioinformatics work (no GPU stuff unless the machines have dedicated GPUs, obviously), and you could allocate the rest of your budget towards whatever else you see fit.


There is no pre-existing computational resource yet (at neither the group nor the institution level). I will ask the IT team and other bioinformaticians about pooling PCs together and how they see this. Thanks!

16 months ago

I don't think that cloud will ever replace having some local compute resources. Even if you want cloud, you will still need some local compute, even if that's just a powerful workstation. Whether you choose a local cluster or cloud for jobs that are too big for a workstation is another question. How often do you envisage having either single jobs that will take, say, more than 10 cores, or a large number of low-core jobs? (You can probably run 10 4-core jobs one after the other on a workstation, but 48 jobs of 4 hours on 4 cores each, like a 48-sample RNAseq experiment, is going to be pushing it.) If it's often, then a local cluster is probably going to be cost-efficient. If not very often, then cloud is probably more reasonable. I would say that unless you have a dedicated sysadmin, a local cluster is going to be difficult or impossible to run, but you might get away with a local (under-desk) server if you have intermediate needs (your 48x4x4 job might run in reasonable time on a 40-core under-desk server if it had enough memory).
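To put rough numbers on the 48x4hrx4 core example, here is a back-of-the-envelope sketch. It assumes that every job takes the same time, that jobs run in full "waves" of as many as fit on the machine, and that memory and disk I/O are not the bottleneck; the function name `batch_wall_hours` is hypothetical.

```python
import math


def batch_wall_hours(n_jobs: int, hours_per_job: float,
                     cores_per_job: int, machine_cores: int) -> float:
    """Wall-clock hours to finish a batch of equal-sized jobs on one machine."""
    concurrent = machine_cores // cores_per_job  # jobs that fit side by side
    if concurrent < 1:
        raise ValueError("machine has fewer cores than a single job needs")
    waves = math.ceil(n_jobs / concurrent)       # rounds of jobs run back to back
    return waves * float(hours_per_job)


if __name__ == "__main__":
    # 48 samples, 4 hours each, 4 cores each (the RNAseq example above)
    print(batch_wall_hours(48, 4, 4, machine_cores=4))   # 192.0 hours, i.e. 8 days
    print(batch_wall_hours(48, 4, 4, machine_cores=40))  # 20.0 hours
```

Under these assumptions, the batch needs 768 core-hours in total, which is about 8 days on a 4-core machine but under a day on a 40-core server, in line with the "pushing it" versus "reasonable time" distinction above.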


Thanks Ian, I don't know the size of the data I'll be analyzing yet. In my previous position I ran jobs for 10 days with 10 cores on my local workstation, but for RNAseq experiments like the one you describe we used a server. Unfortunately, we currently have no dedicated sysadmin and no local cluster, which might change in the future as the institution is aware of this shortcoming. So I'll definitely need something local, at least to test the code before sending it to the cloud.
