Forum: What server do you use?
3 months ago, caggtaagtat330 wrote:

Hi there,

The cluster at my university often makes me wait for days until my jobs move from the queue to execution. I was therefore wondering if you have experience with AWS or other clouds for scientific computing, and whether it's a financially reasonable alternative.

I don't need more than 10 TB of storage and only do medium-sized RNA-seq data processing, which doesn't require too much computational power.

Or would you stay with university-owned environments?

Tags: aws • forum • hpc
modified 7 weeks ago by kate.kross900 • written 3 months ago by caggtaagtat330

That depends on whether your lab is willing to pay an additional cost when there are free resources provided by the university, and whether the urgency of the analyses justifies that cost. Dynamic monthly costs can also complicate billing. Besides price, there's a potentially steep learning curve to refactor existing code for a cloud environment. Then there are security concerns: do you have the resources to manage the cloud infrastructure yourself, and is your university's IT team supportive of the idea of going cloud? On paper, everything might look extremely similar: launch some machines with a scheduler, schedule jobs, and voilà, the job is done. In practice, things can be quite different and demanding.

My experience with shared HPC is that jobs with 1 CPU/2 GB and a short wall time should be scheduled quickly. If your HPC is maintained correctly, most issues stem from users requesting far more resources than their jobs need. You should raise your concerns with the appropriate parties and, hopefully, something can be done to improve things.
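To illustrate, a minimal PBS job script with modest resource requests tends to clear the queue much faster than one asking for a whole node. This is only a sketch; the job name, queue, walltime, and analysis command are illustrative placeholders, not values from this thread:

```shell
#!/bin/bash
# Minimal PBS job script: request only what the job actually needs.
# Queue name, walltime, and the command below are placeholders.
#PBS -N rnaseq_small
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=01:00:00
#PBS -q workq

# PBS starts jobs in $HOME; change to the submission directory first.
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
# Your analysis command would go here, e.g.:
# fastqc sample_1.fastq.gz
```

Submitted with `qsub script.sh`, a request this small is exactly the kind of job a fair-share scheduler should start quickly.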

If you have the opportunity to explore cloud computing, I'd highly recommend you do it. It isn't going anywhere, and that skill set can be helpful for future opportunities.

modified 3 months ago • written 3 months ago by Eric Lim1.1k

Maybe I could ask the IT team whether it would be possible at all. It's probably more practical to stay with the university's HPC. The HPC team sent a mail a few weeks ago saying that the cluster is full because some users requested more resources than they needed.

I guess this could also still stem from the damage to the HPC's cooling system back then. Nevertheless, I'm curious to work with something like AWS and might still try it out if the waiting periods get shorter again.

written 3 months ago by caggtaagtat330

Did you check with the HPC facility about the delays?

The choice of queue, or the amount of requested cores and memory, might be causing the HPC to schedule the job with such a delay.

written 3 months ago by Gjain5.2k

It's generally long waiting times due to high demand, I guess.

These delays happen frequently even with jobs that need just 1 CPU and 2 GB of RAM.

written 3 months ago by caggtaagtat330

Maybe check the priority queue?

written 3 months ago by Medhat7.9k

If your cluster uses "fair share" principles, you should not need to wait for days, so I will assume it does not. What scheduler does your cluster use?

written 3 months ago by genomax57k

The cluster uses PBS Pro.

I'm no informatician, but when I worked at another university's HPC, I didn't have to execute scripts with qsub; I could also just log in to a free node and execute my scripts directly in the terminal, if that makes sense.

written 3 months ago by caggtaagtat330

You should inquire with the IT admins to see why your jobs pend that long. Perhaps something is set up incorrectly and your account has been given low priority. In general, on shared compute infrastructure all users should have the same basic priority, so a user starting 5 jobs should have them start reasonably soon compared to someone who submits 1000 at one time.

written 3 months ago by genomax57k

Ok, thank you. I will wait and see if the situation improves by itself any time soon and then talk to the HPC team of my university. Since you mention it, it could be that I have lower priority: I was told by the IT admins that people from the medical department of the university get needlessly throttled in download/upload speed, because some other department apparently decided this. There is already a collective complaint on its way, but formal matters at the university tend to take forever.

written 3 months ago by caggtaagtat330

I've interacted with a few HPC teams, and they are usually sympathetic to users. Most HPC teams are constrained by university policy and funding shortages as well, and building a relationship with the team will always work out in your favor.

written 3 months ago by RamRS18k

Yes, they are great and have helped me a lot! They also arranged the collective complaint to change the restrictions on medical institutions' access to the HPC.

written 3 months ago by caggtaagtat330

I really hate it when people do that. Just because a node is free right now doesn't mean it will be free five minutes from now. SGE wouldn't see your work, though, so our jobs would end up competing for resources. It's OK for very small stuff. Everything else: absolute no!

modified 7 weeks ago • written 7 weeks ago by 5heikki7.8k

Agreed. Where to draw the line takes a bit of trial and error for some tasks, though. For example, I've run tar jobs both on a login node with screen and as a scheduled job. One has to estimate the amount of time and resources required and take a call based on that.

written 7 weeks ago by RamRS18k

Is it possible to change a job after it has been submitted to a PBS queue? If yes, you could set up e.g. a crontab to submit an echo "hello" job every few hours. Then, whenever you need to run something, you could modify the submitted job that is next up. A nice (pun sort of intended) admin wouldn't bother you about it :)

modified 7 weeks ago • written 7 weeks ago by 5heikki7.8k

Usually, HPC systems allow you to change most operational parameters except the actual job script and, in some cases, the wall time. Even if they do allow you to change the wall time after submission, in all probability you cannot change it once the job starts.
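User-level changes like these can be sketched with PBS's `qalter`; the job ID and values below are made up for illustration:

```shell
# Sketch: adjusting a *queued* PBS job from the user side.
# The job ID (1234567) and resource values are illustrative.

# Lower the walltime request of a pending job:
qalter -l walltime=02:00:00 1234567

# Rename it or change its resource selection while it is still queued:
qalter -N quick_tar -l select=1:ncpus=1:mem=2gb 1234567
```

Once the job is running, the scheduler typically rejects such changes from regular users; extending a running job is usually admin territory.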

written 7 weeks ago by RamRS18k

Admins can add/change the wall time. I have had to do that a few times with SLURM.

written 7 weeks ago by genomax57k

True, admins can do most stuff - I'm referring to user level permissions :-)

written 7 weeks ago by RamRS18k

AWS usually charges around $100/TB/month, I think, so storage alone would end up costing you a lot of money.
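At that rate, the arithmetic is quick to sketch in shell. Note the $100/TB/month figure is the rough estimate from above, not an official AWS price:

```shell
# Rough monthly storage cost at an assumed $100 per TB per month.
TB=10             # storage the original poster mentioned
PRICE_PER_TB=100  # assumed rate in USD, per the estimate above
echo "$((TB * PRICE_PER_TB)) USD/month"
```

So at that estimate, 10 TB would run about $1000 per month for storage alone, before any compute or data-transfer charges.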

written 3 months ago by RamRS18k

Ok, it's probably not very wise to switch then, just for occasionally faster job execution.

written 3 months ago by caggtaagtat330

Yes, unless you've thought out all the details. The cloud, AFAIK, has a ton of hidden costs and needs an expert to manage infrastructure allocations and requests.

written 3 months ago by RamRS18k

Ok, definitely staying with the university HPC then :)

written 3 months ago by caggtaagtat330

Did you look into interactive mode? This is what you did in the past when you logged in to the HPC and onto a free node. You can use interactive mode to log in to a node and run commands there.
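On PBS Pro, an interactive session is requested with `qsub -I`; a sketch, with illustrative resource numbers:

```shell
# Request an interactive session with modest resources (PBS Pro).
# Resource values are illustrative; adjust to your cluster's limits.
# Once the scheduler allocates a node, your shell prompt moves there
# and you can run commands directly, as you would on a login node.
qsub -I -l select=1:ncpus=1:mem=2gb -l walltime=02:00:00
```

Because it goes through the same queue as batch jobs, a busy cluster can still make you wait before the interactive shell starts.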

written 3 months ago by Gjain5.2k

Yes, I sometimes work in interactive mode, but it usually takes some time to be able to log in, so I just submit jobs.

written 3 months ago by caggtaagtat330

In that case, talking to the HPC staff about your scheduling problem might help.

written 3 months ago by Gjain5.2k

Yeah, maybe this is connected to the throttling of access from medical facilities.

written 3 months ago by caggtaagtat330

It might very well be. The HPC staff can clarify this.

written 3 months ago by Gjain5.2k
Powered by Biostar version 2.3.0