Forum: What server do you use?
3 months ago, caggtaagtat330 wrote:

Hi there,

The cluster at my university often makes me wait for days until my jobs move from the queue to execution. I was therefore wondering if you have experience with AWS or other clouds for scientific computing, and whether it's a financially reasonable alternative.

I don't need more than 10 TB of storage and only do medium-sized RNA-seq data processing, which doesn't require too much computational power.

Or would you stay with university-owned environments?

Tags: aws • forum • hpc
modified 7 weeks ago by kate.kross900 • written 3 months ago by caggtaagtat330

That depends on whether your lab is willing to pay an additional cost when there are free resources provided by the university, and whether the urgency of the analyses justifies that cost. Dynamic monthly costs can also complicate billing. Besides price, there's a potentially steep learning curve to refactor existing code for a cloud environment. Then there are security concerns: do you have the resources to manage the cloud infrastructure yourself, and is your university's IT team supportive of the idea of going cloud? On paper, everything might look extremely similar: launch some machines with a scheduler, schedule jobs, and voilà, the job is done. In practice, things can be quite different and demanding.

My experience with shared HPC is that jobs with 1 CPU/2 GB and a short wall time should be scheduled quickly. If your HPC is maintained correctly, most issues stem from users requesting far more resources than their jobs need. You should raise your concerns with the appropriate parties and, hopefully, something can be done to improve things.
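To illustrate, a minimal PBS job script with modest resource requests tends to clear the queue much faster than one asking for a whole node. This is only a sketch; the job name, queue, walltime, and analysis command are illustrative placeholders, not values from this thread:

```shell
#!/bin/bash
# Minimal PBS job script: request only what the job actually needs.
# Queue name, walltime, and the command below are placeholders.
#PBS -N rnaseq_small
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=01:00:00
#PBS -q workq

# PBS starts jobs in $HOME; change to the submission directory first.
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
# Your analysis command would go here, e.g.:
# fastqc sample_1.fastq.gz
```

Submitted with `qsub script.sh`, a request this small is exactly the kind of job a fair-share scheduler should start quickly.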

If you have the opportunity to explore cloud computing, I'd highly recommend you do it. It isn't going anywhere, and that skill set can be helpful for future opportunities.

modified 3 months ago • written 3 months ago by Eric Lim1.1k

Maybe I could ask the IT team whether it would be possible at all. It's probably more practical to stay with the university's HPC. The HPC team sent a mail a few weeks ago saying that the cluster is full because some users requested more resources than they needed.

I guess this could also still stem from the damage to the HPC's cooling system back then. Nevertheless, I'm curious to work with something like AWS and might still try it out if the waiting periods get shorter again.

written 3 months ago by caggtaagtat330

Did you check with the HPC facility about the delays?

The choice of queue, or the amount of requested cores and memory, might be causing the HPC to schedule the job with such a delay.

written 3 months ago by Gjain5.2k

It's generally long waiting times due to high demand, I guess.

These delays happen frequently even with jobs that need just 1 CPU and 2 GB of RAM.

written 3 months ago by caggtaagtat330

Maybe check the priority queue?

written 3 months ago by Medhat7.9k

If your cluster uses "fair share" principles, you should not need to wait for days, so I will assume it does not. What scheduler does your cluster use?

written 3 months ago by genomax57k

The cluster uses PBS Pro.

I'm no informatician, but when I worked at another university's HPC, I didn't have to execute scripts with qsub; I could also just log in to a free node and execute my scripts directly in the terminal, if that makes sense.

written 3 months ago by caggtaagtat330

You should inquire with the IT admins to see why your jobs pend that long. Perhaps something is set up incorrectly and your account has been given low priority. In general, on shared compute infrastructure all users should have the same basic priority, so a user starting 5 jobs should have them start reasonably soon compared to someone who submits 1000 at one time.

written 3 months ago by genomax57k

Ok, thank you. I will wait and see if the situation improves by itself any time soon and then talk to the HPC team of my university. Since you mention it, it could be that I have lower priority: I was told by the IT admins that people from the medical department of the university get needlessly throttled in download/upload speed, because some other department apparently decided this. There is already a collective complaint on its way, but formal matters at the university tend to take forever.

written 3 months ago by caggtaagtat330

I've interacted with a few HPC teams, and they are usually sympathetic to users. Most HPC teams are constrained by university policy and funding shortages as well, and building a relationship with the team will always work out in your favor.

written 3 months ago by RamRS18k

Yes, they are great and have helped me a lot! They also arranged the collective complaint to change the restrictions on medical institutions' access to the HPC.

written 3 months ago by caggtaagtat330

I really hate it when people do that. Just because a node is free right now doesn't mean it will be free five minutes from now. SGE wouldn't see your work, though, so our jobs would end up competing for resources. It's OK for very small stuff. Everything else: absolute no!

modified 7 weeks ago • written 7 weeks ago by 5heikki7.8k

Agreed. Where to draw the line takes a bit of trial and error for some tasks, though. For example, I've run tar jobs both on a login node with screen and as a scheduled job. One has to estimate the amount of time and resources required and take a call based on that.

written 7 weeks ago by RamRS18k

Is it possible to change a job after it has been submitted to a PBS queue? If yes, you could set up e.g. a crontab to submit an echo "hello" job every few hours. Then, whenever you need to run something, you could modify the submitted job that is next up. A nice (pun sort of intended) admin wouldn't bother you about it :)

modified 7 weeks ago • written 7 weeks ago by 5heikki7.8k

Usually, HPC systems allow you to change most operational parameters except the actual job script and, in some cases, the wall time. Even if they do allow you to change the wall time after submission, in all probability you cannot change it once the job starts.
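User-level changes like these can be sketched with PBS's `qalter`; the job ID and values below are made up for illustration:

```shell
# Sketch: adjusting a *queued* PBS job from the user side.
# The job ID (1234567) and resource values are illustrative.

# Lower the walltime request of a pending job:
qalter -l walltime=02:00:00 1234567

# Rename it or change its resource selection while it is still queued:
qalter -N quick_tar -l select=1:ncpus=1:mem=2gb 1234567
```

Once the job is running, the scheduler typically rejects such changes from regular users; extending a running job is usually admin territory.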

written 7 weeks ago by RamRS18k

Admins can add/change the wall time. I have had to do that a few times with SLURM.

written 7 weeks ago by genomax57k

True, admins can do most stuff - I'm referring to user level permissions :-)

written 7 weeks ago by RamRS18k

AWS usually charges around $100/TB/month, I think, so storage alone would end up costing you a lot of money.
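At that rate, the arithmetic is quick to sketch in shell. Note the $100/TB/month figure is the rough estimate from above, not an official AWS price:

```shell
# Rough monthly storage cost at an assumed $100 per TB per month.
TB=10             # storage the original poster mentioned
PRICE_PER_TB=100  # assumed rate in USD, per the estimate above
echo "$((TB * PRICE_PER_TB)) USD/month"
```

So at that estimate, 10 TB would run about $1000 per month for storage alone, before any compute or data-transfer charges.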

written 3 months ago by RamRS18k

Ok, it's probably not very wise to switch then, just for occasionally faster job execution.

written 3 months ago by caggtaagtat330

Yes, unless you've thought out all the details. The cloud, AFAIK, has a ton of hidden costs and needs an expert to manage infrastructure allocations and requests.

written 3 months ago by RamRS18k

Ok, definitely staying with the university HPC then :)

written 3 months ago by caggtaagtat330

Did you look into interactive mode? This is what you did in the past when you logged in to the HPC and onto a free node. You can use interactive mode to log in to a node and run commands there.
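On PBS Pro, an interactive session is requested with `qsub -I`; a sketch, with illustrative resource numbers:

```shell
# Request an interactive session with modest resources (PBS Pro).
# Resource values are illustrative; adjust to your cluster's limits.
# Once the scheduler allocates a node, your shell prompt moves there
# and you can run commands directly, as you would on a login node.
qsub -I -l select=1:ncpus=1:mem=2gb -l walltime=02:00:00
```

Because it goes through the same queue as batch jobs, a busy cluster can still make you wait before the interactive shell starts.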

written 3 months ago by Gjain5.2k

Yes, I sometimes work in interactive mode, but it usually takes some time to be able to log in, so I just submit jobs.

written 3 months ago by caggtaagtat330

In that case, talking to the HPC staff about your scheduling problem might help.

written 3 months ago by Gjain5.2k

Yeah, maybe this is connected to the throttling of access from medical facilities.

written 3 months ago by caggtaagtat330

It might very well be. The HPC staff can clarify this.

written 3 months ago by Gjain5.2k
Powered by Biostar version 2.3.0