How can I estimate how many resources are needed before running a script?
6 months ago
davidmaimoun ▴ 50

Hello,

How can I tell how many resources (CPU cores, memory and storage) are reasonable for running computations as efficiently as possible on Linux (Ubuntu)?

I would also like to know how many CPUs a script uses while it runs. For example, I want to run COBS on 1000 sequences and see how many CPUs it uses before running it on a larger number of sequences.

I have 40 CPUs

Any suggestion?

Thank you!

6 months ago
Joe 20k

The short answer is that, unless you know a tool very well, there's very little way to know in advance.

From a memory perspective, you can probably assume that even a fairly poorly optimised script is unlikely to use much more than 3-5x the size of the input dataset in memory, even if it holds the whole thing in memory at once.
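A minimal Python sketch of that rule of thumb, assuming the 3-5x factor from the paragraph above and a Linux system (it reads `/proc/meminfo`). The function names are hypothetical. Sizes are measured on disk, so compressed inputs such as gzipped FASTA will need considerably more memory than this estimate suggests.

```python
import os


def estimate_memory_range(paths, low_factor=3, high_factor=5):
    """Rough (low, high) memory estimate in bytes: 3-5x the total input size.

    Uses on-disk file sizes, so compressed inputs are underestimated.
    """
    total = sum(os.path.getsize(p) for p in paths)
    return total * low_factor, total * high_factor


def available_memory_bytes():
    """Return MemAvailable from /proc/meminfo (Linux only), converted from kB to bytes."""
    with open("/proc/meminfo") as fh:
        for line in fh:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemAvailable not found in /proc/meminfo")


if __name__ == "__main__":
    import sys

    # Usage: python estimate_mem.py input1.fasta input2.fasta ...
    low, high = estimate_memory_range(sys.argv[1:])
    free = available_memory_bytes()
    gib = 1024 ** 3
    print(f"estimated: {low / gib:.2f}-{high / gib:.2f} GiB, available: {free / gib:.2f} GiB")
```

If the high end of the estimate is well below what is available, memory is unlikely to be the bottleneck; if it is close, a trial run on a subset is worth doing first.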

Storage is similar, but really this will just be whatever your input and output files are. It's less common for tools to write intermediate data to disk unless there is some sort of database or similar that they just can't hold in RAM. Disk storage is so plentiful now, though, that I'd be surprised if this ever became much of a concern.
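If you do want to confirm there is room before a long run, a minimal check with Python's standard `shutil.disk_usage` could look like this. The function name, the 20% `headroom` margin and the idea of passing in your own estimate of input plus output size are assumptions for illustration, not something any particular tool requires.

```python
import shutil


def has_room_for(directory, needed_bytes, headroom=1.2):
    """True if the filesystem holding `directory` has needed_bytes free, plus a margin.

    headroom=1.2 asks for 20% extra; that margin is an arbitrary assumption.
    """
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes * headroom


if __name__ == "__main__":
    # Hypothetical estimate: about 50 GiB of output expected in the current directory.
    print(has_room_for(".", 50 * 1024 ** 3))
```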

CPUs are a little easier, since this is typically something you set, rather than the tool. Many tasks are not well suited to multithreading (or there aren't tools built to do it readily), so it's less common that you'll come across a task or workflow that genuinely benefits from much more than 15-20 cores, if that.
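One standard way to see why extra cores stop paying off is Amdahl's law (a general model, not something stated in this answer): if a fraction `p` of the work can run in parallel, the best-case speedup on `n` cores is `1 / ((1 - p) + p / n)`. The sketch below assumes a hypothetical `p = 0.95`; with that value, 20 cores give roughly a 10x speedup and 40 cores only about 13.6x.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Best-case speedup on `cores` cores when `parallel_fraction` of the work parallelises."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # p = 0.95 is an illustrative assumption; real tools vary widely.
    for n in (1, 5, 10, 20, 40):
        print(f"{n:>2} cores: {amdahl_speedup(n, 0.95):.1f}x")
```

Real tools usually do worse than this bound, because of I/O, memory bandwidth and coordination overhead between threads.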

As for knowing what will be most efficient, this is even harder to answer, because it depends heavily on how the tool was coded. You will just have to run some toy datasets with different parameters and see what works.
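A minimal Python sketch of that kind of trial run: time the same command over a grid of toy inputs and thread counts. `my_tool`, its `--threads` option and the file names are placeholders; substitute whatever command and thread option your tool (COBS or otherwise) actually documents. As written, the demo uses a do-nothing Python process as a stand-in so it runs anywhere.

```python
import itertools
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def benchmark(make_argv, inputs, thread_counts):
    """Time make_argv(input, threads) for every combination; returns {(input, threads): seconds}."""
    results = {}
    for inp, threads in itertools.product(inputs, thread_counts):
        results[(inp, threads)] = time_command(make_argv(inp, threads))
    return results


if __name__ == "__main__":
    # For a real tool this would be something like (hypothetical flags):
    #     return ["my_tool", "--threads", str(threads), inp]
    def make_argv(inp, threads):
        # Stand-in so the sketch runs as-is: a Python process that does nothing.
        return [sys.executable, "-c", "pass"]

    grid = benchmark(make_argv, ["toy_100.fa", "toy_1000.fa"], [1, 4, 8])
    for (inp, threads), secs in grid.items():
        print(f"{inp} threads={threads}: {secs:.2f} s")
```

Comparing, say, 100 against 1000 sequences gives a rough idea of how run time grows before committing to the full dataset, though extrapolating assumes the tool scales smoothly with input size, which is not guaranteed.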


Thank you!

And while my script is running, how can I check how many CPUs it is using?


As far as I'm aware, you can't know exactly.

You can use a tool like htop, which shows how heavily each processor core is being used, but that is total usage across the system, not just from your script.
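For a single command you can get an average instead: total CPU time divided by wall-clock time is roughly the number of cores the job kept busy. GNU time (`/usr/bin/time -v`, not the shell builtin; on Ubuntu it may need the `time` package) reports this as "Percent of CPU this job got", along with "Maximum resident set size". The Python sketch below computes similar figures with `os.wait4`; it is Linux-oriented (`ru_maxrss` is in kilobytes there), needs Python 3.9+, and the function name is hypothetical.

```python
import os
import sys
import time


def profile_command(argv):
    """Run argv to completion and summarise its CPU and memory use.

    The rusage returned by wait4 covers all threads of the child (and, on Linux,
    descendants it waited for), so cpu_s / wall_s approximates the average
    number of cores kept busy over the whole run.
    """
    start = time.perf_counter()
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    wall_s = time.perf_counter() - start
    cpu_s = usage.ru_utime + usage.ru_stime  # user + system CPU seconds
    return {
        "exit_code": os.waitstatus_to_exitcode(status),
        "wall_s": wall_s,
        "cpu_s": cpu_s,
        "avg_cores": cpu_s / wall_s if wall_s > 0 else 0.0,
        "peak_rss_mib": usage.ru_maxrss / 1024,  # Linux reports ru_maxrss in KiB
    }


if __name__ == "__main__":
    # Usage: python profile_cmd.py <command> [args...]
    # With no arguments, profiles a small Python computation so the sketch runs as-is.
    cmd = sys.argv[1:] or [sys.executable, "-c", "sum(range(10**6))"]
    for key, value in profile_command(cmd).items():
        print(f"{key}: {value}")
```

An `avg_cores` close to 1 on a run where you requested many threads suggests the tool is not making much use of them.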

You can view how many processes are running for that task, and that will roughly correlate with the number of cores in use, but many multithreading approaches don't actually 'pin' a process to a core, and processes can move between cores depending on how busy each core's queue is. You can see this information in htop, but you can also use ps and other similar tools.
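On Linux you can also read those counts straight from `/proc`. The sketch below (Linux-only; function names are hypothetical) counts a process's threads from `/proc/<pid>/task`, finds its direct child processes by reading the parent PID in each `/proc/<pid>/stat`, and reports how many CPUs the process is allowed to run on via `os.sched_getaffinity`. The thread count is only an upper bound on how many cores those threads can occupy at once, not a measurement of actual use.

```python
import os


def thread_count(pid):
    """Number of threads in process pid (0 if it has already exited)."""
    try:
        return len(os.listdir(f"/proc/{pid}/task"))
    except FileNotFoundError:
        return 0


def child_pids(pid):
    """PIDs whose parent is pid (direct children only), found by scanning /proc."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                stat = fh.read()
        except OSError:
            continue  # process exited while we were scanning
        # The command name is in parentheses and may contain spaces, so split
        # after the last ')': the remaining fields start with state, ppid, ...
        fields = stat[stat.rfind(")") + 2:].split()
        if int(fields[1]) == pid:
            children.append(int(entry))
    return children


def summarise(pid):
    """Process, thread and allowed-CPU counts for pid and its direct children."""
    pids = [pid] + child_pids(pid)
    return {
        "processes": len(pids),
        "threads": sum(thread_count(p) for p in pids),
        "allowed_cpus": len(os.sched_getaffinity(pid)),
    }


if __name__ == "__main__":
    import sys

    # Usage: python proc_summary.py <pid>   (PID taken from ps or htop)
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(summarise(target))
```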


Thank you so much for the help

6 months ago
davidmaimoun ▴ 50

Thank you very much Joe, it will be very helpful!