Question: Why Do Some Systems Biological Analyses Require So Much Shared Memory?
nate wrote (7.5 years ago):

Hi all, I'm new to systems biology and I'm having trouble understanding why so many types of simulations/analyses require all of the data to be held in shared memory. Would message passing on a cluster be fast enough to achieve this? I'm being told that I need special hardware (like an SGI machine) to undertake simulations of biological networks, and I don't fully grasp the computational problem. Can anyone explain the requirements in simple terms for a newbie?


Which applications are you talking about, exactly?

written 7.5 years ago by Flow

I don't think that matters much here; the question could apply to many different types of simulations, including molecular dynamics or climate models.

written 6.3 years ago by Dr. Mabuse

Simulations are usually computationally intensive because you want to simulate as many factors as possible. For example, simulating the docking of a drug to a protein is very intensive, because you need to take a lot of factors into account (the position of each atom, its charge, etc.). Can you specify which simulations you are referring to?

written 7.5 years ago by Giovanni M Dall'Olio

If the memory were not shared, the job would be very slow: hundreds of processes run for a single job, and fetching the data over the cluster network or from storage every time is not a good idea; it eats a lot of system resources. Shared memory is of course much faster.

written 6.3 years ago by always_learning
Dr. Mabuse (Bergen, Norway) wrote (6.3 years ago):

Let's consider an abstract simulation. It consists of a state space S (a gigantic matrix, |S| >> some big number) and a parallel algorithm A operating on S using m CPUs, e.g. with each CPU taking on one cell of S. Further, assume that A is discrete/stepwise (one step per time interval, of which there are t >> some big number) and synchronized: each step updates S, and each of the m instances running A needs an up-to-date version of S before it can start its computation. Also assume that A updates nearly 100% of S in each step.
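
To make this concrete, here is a minimal serial sketch of that model (my own illustration, not from the post; the update rule and sizes are invented): a state space S is updated over t synchronized steps, and computing any cell's next value needs the whole current S.

    import numpy as np

    S = np.random.default_rng(0).random((1_000, 1_000))   # the state space S
    t = 100                                                # number of steps

    for step in range(t):
        # Each cell's new value depends on a global property of S (here its
        # mean), so in a parallel version every one of the m workers would
        # need an up-to-date copy of the whole state before this step runs.
        S = 0.5 * S + 0.5 * S.mean()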

If the nodes do not have shared memory, there is a large transfer penalty on the order of |S| per node per step over the network; even if every node only sends its own updated portion of S, it still needs to receive the rest of the new state space in each step. Since the simulation consists of a large number of steps, the total transfer volume is proportional to |S| * m * t.
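
To get a feel for that formula, here is a rough back-of-envelope calculation; all numbers are invented for illustration. Even on a fast interconnect, moving tens of gigabytes per step quickly dominates the run time.

    # Network traffic of the synchronized message-passing scheme above:
    # every step, each of the m nodes must receive a fresh copy of S.
    state_cells = 10_000 * 10_000        # |S|: a 10k x 10k state matrix
    bytes_per_cell = 8                   # one double-precision value per cell
    m = 64                               # worker nodes
    t = 100_000                          # simulation steps

    state_bytes = state_cells * bytes_per_cell
    total_bytes = state_bytes * m * t    # proportional to |S| * m * t

    print(f"one copy of S      : {state_bytes / 1e9:.1f} GB")
    print(f"traffic per step   : {state_bytes * m / 1e9:.1f} GB")
    print(f"traffic, whole run : {total_bytes / 1e15:.1f} PB")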

If one wants to improve on this, each node could keep track of which other nodes actually need its updates and send only to those; this reduces the factor m, but it is more complicated to implement and imposes restrictions on A.

In a shared memory environment, this penalty is virtually 0.
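
For completeness, a minimal sketch of the shared-memory case (again my own illustration, not Dr. Mabuse's setup; the update rule, sizes and worker count are invented): m worker processes update disjoint slices of a single state array held in shared memory and synchronize after every step, so no copy of S ever crosses a network. The two barriers per step implement the synchronization requirement above: all reads of the old state finish before any writes of the new one begin.

    import numpy as np
    from multiprocessing import Process, Barrier, shared_memory

    M_WORKERS = 4            # "m" CPUs (illustrative)
    STEPS = 10               # "t" time steps (illustrative)
    N = 1_000_000            # number of cells in S

    def worker(shm_name, lo, hi, barrier):
        # Attach to the single shared copy of S; nothing is copied.
        shm = shared_memory.SharedMemory(name=shm_name)
        S = np.ndarray((N,), dtype=np.float64, buffer=shm.buf)
        for _ in range(STEPS):
            # Read phase: the new values need the *global* state (its mean),
            # i.e. an up-to-date view of all of S, not just this slice.
            new_slice = 0.5 * S[lo:hi] + 0.5 * S.mean()
            barrier.wait()       # everyone has finished reading
            # Write phase: publish this worker's part of the next state.
            S[lo:hi] = new_slice
            barrier.wait()       # next step starts only once S is complete
        shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=N * 8)
        S = np.ndarray((N,), dtype=np.float64, buffer=shm.buf)
        S[:] = np.random.default_rng(0).random(N)

        barrier = Barrier(M_WORKERS)
        bounds = np.linspace(0, N, M_WORKERS + 1, dtype=int)
        procs = [Process(target=worker,
                         args=(shm.name, bounds[i], bounds[i + 1], barrier))
                 for i in range(M_WORKERS)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        print("final mean:", S.mean())
        shm.close()
        shm.unlink()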

This gives some idea of when it might be OK to use a cluster:

  • There is no state space, or S is very small (e.g. BLASTing 1e10 FASTA sequences; embarrassingly parallel)
  • S is guaranteed to be sparse (mostly 0)
  • The jobs need to be synchronized only rarely (e.g. every n seconds, as in real-time strategy games) or not at all, or S can be interpolated when an update is missed
  • In each step, only very few changes are made to S
  • Interactions are local (each cell only needs knowledge of its direct neighbors, e.g. 'Game of Life'-like); see the sketch below
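
To illustrate the last point, here is a rough sketch (sizes invented) of why local interactions make clusters viable: on a 'Game of Life'-style grid split into horizontal strips, each node only needs the single boundary row from each of its two neighbours per step (a halo/ghost row), not the whole state space.

    GRID = 20_000            # the full state S is GRID x GRID cells
    M_NODES = 16             # nodes, each holding GRID // M_NODES rows locally
    BYTES_PER_CELL = 1       # one byte per cell (alive/dead)

    # Per node and per step: receiving the whole state vs. only the two
    # boundary rows shared with the neighbouring strips.
    full_state = GRID * GRID * BYTES_PER_CELL
    halo = 2 * GRID * BYTES_PER_CELL

    print(f"full-state transfer per node/step: {full_state / 1e6:9.3f} MB")
    print(f"halo transfer per node/step      : {halo / 1e6:9.3f} MB")
    print(f"reduction                        : {full_state // halo}x")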