The cluster at my university often makes me wait for days before my jobs move from the queue to execution. I was therefore wondering whether you have experience with AWS or other clouds for scientific purposes, and whether it's a financially reasonable alternative.

I don't need more than 10 TB of storage and only do medium-sized RNA-seq data processing, which doesn't require too much computational power.

Or would you stay with university-owned environments?

written 4 weeks ago by caggtaagtat240

That depends on whether your lab is willing to pay an additional cost when there are free resources provided by the university, and whether the urgency of the analyses justifies that cost. Dynamic monthly costs can also complicate billing. In addition to price, there's potentially a really steep learning curve to refactor existing code to work in the cloud environment. Then there are security concerns. Do you have the resources to manage the cloud infrastructure yourself, or is your university's IT team supportive of the idea of going cloud? On paper, everything might look extremely similar: launch some machines with a scheduler, schedule jobs, and voila, the job is done. In practice, things can be quite different and demanding.

My experience with shared HPC is that jobs requesting 1 CPU and 2 GB of memory with a short wall time should be scheduled more quickly. If your HPC is maintained correctly, most issues stem from users requesting far more resources than their jobs need. You should raise your concerns with the appropriate parties and hopefully something can be done to improve things.
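As a rough illustration of such a small request, here is a minimal Python sketch that writes out a PBS Pro job script asking for 1 CPU, 2 GB and a short wall time. The function name, job name, default wall time and example command are all made up for illustration; your site may require additional directives such as a queue or project name.

```python
def pbs_script(command, ncpus=1, mem_gb=2, walltime="00:30:00", name="small_job"):
    """Return the text of a PBS Pro job script with a modest resource request.

    All defaults are illustrative assumptions, not site recommendations.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",
        # One chunk with the requested CPUs and memory.
        f"#PBS -l select=1:ncpus={ncpus}:mem={mem_gb}gb",
        # A short wall time makes it easier for the scheduler to fit the job in.
        f"#PBS -l walltime={walltime}",
        # Run from the directory the job was submitted from.
        'cd "$PBS_O_WORKDIR"',
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical command; replace with your own pipeline step.
print(pbs_script("echo 'running a small RNA-seq step'"))
```

The printed text can be saved to a file and submitted with qsub.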

If you have the opportunity to explore cloud computing, I'd highly recommend doing it. It isn't going anywhere, and that skill set can be helpful for future opportunities.

written 4 weeks ago by Eric Lim860

Maybe I could ask the IT team whether it would be possible in general. It's probably more practical to stay with the university's HPC. The HPC team wrote an email a few weeks ago saying that the cluster is full because some users asked for more resources than they needed.

I guess this could also still be a result of the earlier damage to the HPC's cooling system. Nevertheless, I'm curious about working with something like AWS and might also try it out if the waiting periods get shorter again.

written 29 days ago by caggtaagtat240

Did you check with the HPC facility about the delays?

The choice of queue, or the number of requested cores and amount of memory, might be causing the HPC to schedule the job with such a delay.

written 4 weeks ago by Gjain5.2k

The waiting times are generally long, due to high demand, I guess.

These delays happen frequently with jobs that need 1 CPU and 2 GB of RAM.

written 4 weeks ago by caggtaagtat240

Maybe check the priority queue?

written 4 weeks ago by Medhat7.6k

If your cluster uses "fair share" principles, you should not need to wait for days, so I will assume it does not. What scheduler does your cluster use?

written 4 weeks ago by genomax54k

The cluster uses PBS Pro.

I'm no computer scientist, but when I worked on another university's HPC, I didn't have to submit scripts with qsub; I could also just log in to a free node and execute my scripts directly in the terminal, if that makes sense.

written 29 days ago by caggtaagtat240

You should ask the IT admins why your jobs are pending that long. Perhaps something is set up incorrectly and your account has been given low priority. In general, on shared compute infrastructure all users should have the same basic priority. So a user starting 5 jobs should have them start reasonably soon compared to someone who submits 1000 at one time.
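To make the 5-versus-1000 point concrete, here is a toy Python sketch that hands out job starts to users in turn. This is only a simplified round-robin model for illustration, not how PBS Pro or any real fair-share scheduler computes priorities, and the user names and job counts are invented.

```python
from collections import deque


def round_robin_order(queues):
    """Interleave pending jobs so each user gets one start per round.

    queues: dict mapping user name -> list of job ids in submission order.
    Returns the start order as a list of (user, job) pairs.
    """
    pending = deque((user, deque(jobs)) for user, jobs in queues.items() if jobs)
    order = []
    while pending:
        user, jobs = pending.popleft()
        order.append((user, jobs.popleft()))
        if jobs:
            # The user still has jobs waiting, so they rejoin the back of the line.
            pending.append((user, jobs))
    return order


# Hypothetical users: one submits 1000 jobs, the other only 5.
queues = {"bulk_user": list(range(1000)), "small_user": list(range(5))}
order = round_robin_order(queues)

# Position (0-based) at which the small user's last job starts.
last_small = max(i for i, (user, _) in enumerate(order) if user == "small_user")
print(last_small)  # 9: all five small jobs start within the first ten slots
```

With strict first-come, first-served ordering, the same five jobs would instead sit behind all 1000 of the other user's jobs.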

written 29 days ago by genomax54k

OK, thank you. I will wait and see whether the situation improves by itself any time soon, and then talk with my university's HPC team. Since you mention it, it may be that I have lower priority: the IT admins told me that people from the university's medical department get needlessly throttled in their download/upload speed, apparently because some other department decided this. There is already a collective complaint on its way, but formal matters at the university tend to take forever.

written 29 days ago by caggtaagtat240

I've interacted with a few HPC teams and they usually are sympathetic to users. Most HPC teams are being constrained by university policy/funding shortages as well, and building a relationship with the team will always work out in your favor.

written 29 days ago by Ram17k

Yes, they are great and have helped me a lot! They also arranged the collective complaint to change the HPC restrictions for medical institutions.

written 29 days ago by caggtaagtat240

AWS usually charges around $100/TB/month, I think, so storage would end up costing you a lot of money.
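For a rough sense of scale, here is a small Python sketch of that arithmetic for the 10 TB mentioned in the question. The $100/TB/month rate is just the ballpark figure from this post, not a verified AWS price; actual rates depend on the storage service, tier and region, and the function name is made up.

```python
def monthly_storage_cost(terabytes, usd_per_tb_month):
    """Flat-rate storage cost per month, ignoring transfer and request fees."""
    return terabytes * usd_per_tb_month


rate = 100  # assumed USD per TB per month (the estimate from this thread)
size_tb = 10  # upper bound on storage from the original question

monthly = monthly_storage_cost(size_tb, rate)
yearly = 12 * monthly
print(f"${monthly:,.0f} per month, ${yearly:,.0f} per year")
# $1,000 per month, $12,000 per year
```

Swapping in the rate for the storage class you would actually use gives a more realistic figure.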

written 4 weeks ago by Ram17k

OK, it's probably not very wise to switch then, just for occasionally faster job execution.

written 29 days ago by caggtaagtat240

Yes, unless you've thought out all the details. Cloud AFAIK has a ton of hidden costs and needs an expert to manage infrastructure allocations/requests.

written 29 days ago by Ram17k

OK, definitely staying with the university HPC then :)

written 29 days ago by caggtaagtat240

Did you look into interactive mode? This is what you did in the past when you logged in to the HPC and then onto one node. You can use interactive mode to log in to a node and run commands there.
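On PBS Pro, an interactive session is requested with qsub -I. Below is a minimal Python sketch that just assembles such a command line; the function name and the resource values (1 CPU, 2 GB, one hour) are illustrative assumptions, and your cluster may need extra options such as a queue name.

```python
import shlex


def interactive_qsub_args(ncpus=1, mem_gb=2, walltime="01:00:00"):
    """Build the argument list for an interactive PBS Pro session request."""
    return [
        "qsub",
        "-I",  # interactive job: opens a shell on the allocated node
        "-l", f"select=1:ncpus={ncpus}:mem={mem_gb}gb",
        "-l", f"walltime={walltime}",
    ]


# Print the command so it can be pasted into a terminal on the login node.
print(shlex.join(interactive_qsub_args()))
```

The request still goes through the scheduler, so a small, short request is more likely to start promptly.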

written 29 days ago by Gjain5.2k

Yes, I sometimes work in interactive mode, but it usually takes some time before I can log in, so I just submit jobs instead.

written 28 days ago by caggtaagtat240

In that case, talking to the HPC staff about your scheduling problem might help.

written 28 days ago by Gjain5.2k

Yeah, maybe this is connected to the throttling of access from medical facilities.

written 28 days ago by caggtaagtat240

That might very well be the case. The HPC staff can clarify this.

written 27 days ago by Gjain5.2k
