Question: Best Hardware Solution For Medium-Size Bioinformatics Lab [15-20 Computers]
8.0 years ago by
IIMCB, Poland
Leszek4.0k wrote:

I'm involved in establishing a new bioinformatics lab. It will serve primarily for teaching purposes, but I think it is possible to design a small grid as well. The idea is to have 15-20 desktop computers, 4-8 cores each, and build a grid using half of the cores. The budget is ~30k Euros. Probably some network file system (NFS) will be needed as well. Can you share your experience in this matter? Or do you have suggestions for ready-made solutions? Or is it better to buy cheaper desktops and invest the saved money in 2-3 very powerful workstations?
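
As a rough sanity check of these figures, here is a minimal Python sketch. It uses only the numbers stated above (15-20 desktops, 4-8 cores each, half of the cores for the grid, ~30k Euros); the function names are made up for illustration, and the per-desktop figure is only an upper bound, since the same budget would also have to cover any NFS server and network gear.

```python
# Figures from the question: 15-20 desktops, 4-8 cores each,
# half of the cores given to the grid, budget of about 30k Euros.
BUDGET_EUR = 30_000
DESKTOPS = (15, 20)
CORES_PER_DESKTOP = (4, 8)
GRID_SHARE = 0.5


def grid_cores(desktops: int, cores_each: int, share: float = GRID_SHARE) -> int:
    """Cores available to the grid if `share` of every desktop is donated."""
    return int(desktops * cores_each * share)


def budget_per_desktop(budget: float, desktops: int) -> float:
    """Upper bound per desktop if the whole budget were spent on desktops."""
    return budget / desktops


smallest_grid = grid_cores(min(DESKTOPS), min(CORES_PER_DESKTOP))
largest_grid = grid_cores(max(DESKTOPS), max(CORES_PER_DESKTOP))
print(f"grid cores: {smallest_grid} to {largest_grid}")

for n in DESKTOPS:
    print(f"{n} desktops: at most {budget_per_desktop(BUDGET_EUR, n):.0f} EUR each")
```

So the grid would have somewhere between 30 and 80 cores, and each desktop could cost at most 1500-2000 Euros before anything else is bought.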

What I have seen so far in bioinformatics labs can be divided into 2 solutions:

  • all data is stored on a network file system (NFS) and the client computers load everything from NFS. An independent cluster uses the same NFS. This has the advantage of easy back-ups and a unified system. But the users have limited privileges and have to ask for every piece of software. Unfortunately, if there is trouble with NFS, no one can work at all :/
  • the client computers store data locally, but there is an NFS, mainly for cluster and back-up purposes

    I'm awaiting your comments.

  • hardware • 7.6k views
    ADD COMMENTlink modified 7.5 years ago by Istvan Albert ♦♦ 81k • written 8.0 years ago by Leszek4.0k

    I think you should buy a dedicated server with RAID and then buy for each lab member a cheap desktop.

    ADD REPLYlink written 8.0 years ago by lh331k

    As you say "15-20 desktop computers", I imagine you have a lab of 15 members and you are considering giving each user one desktop that shares part of its resources with the cluster. I do not have much experience, but I think you should consider a dedicated server (with RAID) and buy a cheap desktop for each lab member.

    ADD REPLYlink written 8.0 years ago by lh331k

    I like lh3's recommendation of a dedicated server with RAID and a cheap desktop; this will lead to some per-desktop management but I'd expect it to be the solution with the fewest surprises in terms of maintenance effort, etc.

    ADD REPLYlink written 8.0 years ago by Gareth Palidwor1.6k

    What is the goal of the desktops? What kind of applications will you be training them on?

    ADD REPLYlink written 7.5 years ago by Mndoci1.2k
    8.0 years ago by
    Barcelona, Spain
    Darked894.2k wrote:

    Populating 15+ separate desktops with a number of bioinfo packages does not seem right to me. While it is doable, you would need something like cfengine or Chef to automate everything.

    You have many options when it comes to exporting particular directories from the NFS server: from everything (remote booting of clients), through the /biosoft and /home dirs, to just /biosoft. In the last case some work can still be done during an NFS failure, assuming you have the basic tools in your /home dir.

    Given a choice I would go with dumb and cheap clients (but with enough RAM, swap and /tmp) and a few servers with a lot of RAM / cores. Some stuff (e.g. genome/transcriptome assembly) hardly works, if at all, on low-RAM machines.

    If you are concerned about NFS failures then you may go with RAID and mirrored NFS servers.

    ADD COMMENTlink written 8.0 years ago by Darked894.2k
    8.0 years ago by
    Gareth Palidwor1.6k
    Gareth Palidwor1.6k wrote:

    Mixing the desktops and cluster doesn't sound like a good idea. I'd expect it to create a lot of admin issues especially in a high use teaching lab.

    I'd recommend minimal systems for desktops, PXE boot from the network, and create a cluster separately. This should be a low admin solution as a reboot of the desktop will load whatever the new OS changes are. If you're using, for example, gridengine, make the desktops clients and people can qlogin or qsh to the cluster for interactive sessions, also have their home dirs and data dirs automounted locally when they login. As you say, if there are NFS issues everything goes down, but that's an issue anyway in a networked environment. A mid-way alternative is something like the Rocks distro which (if I remember correctly, haven't played with it in a while) will update a local OS install on boot.

    In terms of installs, if it's a teaching lab you really don't want local installs of software. What we do on our cluster is have a /data/binaries directory where the shared binaries are installed, that works fine.

    If you want to go very cheap on the cluster hardware, a friend who has assembled many machines has a simple and successful heuristic for choosing good cheap hardware: go to Tom's Hardware (or another review site) and, for each component you need, choose the cheapest of the top 5 or 10. This gives a good price/performance ratio. Given the budget, your cluster will not benefit much from rack mounting, as there will be only a few compute nodes and you pay a premium for rack hardware vs beige boxes.
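
The "cheapest of the top 5 or 10" heuristic is simple enough to write down. Below is a minimal Python sketch of it; the `Part` type, the `cheapest_of_top` function, and the candidate list (names, ranks, prices) are all hypothetical and only serve to illustrate the selection rule, not real hardware.

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    rank: int          # position in a review site's ranking, 1 = best
    price_eur: float


def cheapest_of_top(parts: list[Part], top_n: int = 5) -> Part:
    """Keep the top_n best-ranked parts, then return the cheapest of those."""
    if not parts:
        raise ValueError("no candidate parts given")
    shortlist = sorted(parts, key=lambda p: p.rank)[:top_n]
    return min(shortlist, key=lambda p: p.price_eur)


# Hypothetical CPU candidates; names, ranks and prices are made up.
cpus = [
    Part("cpu-a", 1, 520.0),
    Part("cpu-b", 2, 410.0),
    Part("cpu-c", 3, 260.0),
    Part("cpu-d", 4, 300.0),
    Part("cpu-e", 5, 275.0),
    Part("cpu-f", 6, 120.0),
]

choice = cheapest_of_top(cpus, top_n=5)
print(choice.name)  # cpu-c: the cheapest among the five best-ranked parts
```

Note that the much cheaper `cpu-f` is never considered with `top_n=5` because it is ranked sixth; widening the shortlist to 10 trades some performance for price.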

    Note that cheap hardware can be noisy, particularly under high load as the fans really spin up; this may be a consideration if the cluster nodes are being housed in the same room as the desktops.

    You'll probably want a local switch for the cluster and desktops given the number of nodes, 32 ports at least (desktops and cluster interconnects). You may want to talk to your institute IT about the interconnects between your network and theirs.

    NAS boxes are cheap for storage, and a system like OpenFiler on commodity hardware can get you 24-48 TB of RAIDable storage at a reasonable cost. If you just buy a commodity box and configure it yourself, setup and maintenance may be expensive time-wise, depending on your level of knowledge. I've had good experiences with the more expensive pre-configured appliances (ReadyNAS and others), and though the performance isn't great, they're very reliable and almost zero maintenance.
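
When sizing such a box, keep in mind that raw and usable capacity differ by RAID level. The following is a rough Python sketch using the standard capacity rules for RAID 0/5/6/10; the function name and the 12 x 4 TB example configuration are assumptions for illustration, and filesystem overhead and hot spares are ignored.

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Approximate usable capacity in TB for common RAID levels."""
    if level == "raid0":
        minimum, data_disks = 2, disks
    elif level == "raid5":
        minimum, data_disks = 3, disks - 1    # one disk's worth of parity
    elif level == "raid6":
        minimum, data_disks = 4, disks - 2    # two disks' worth of parity
    elif level == "raid10":
        if disks % 2:
            raise ValueError("raid10 needs an even number of disks")
        minimum, data_disks = 4, disks // 2   # every block is mirrored
    else:
        raise ValueError(f"unknown RAID level: {level}")
    if disks < minimum:
        raise ValueError(f"{level} needs at least {minimum} disks")
    return data_disks * disk_tb


# Hypothetical box: 12 disks of 4 TB each (48 TB raw).
for level in ("raid5", "raid6", "raid10"):
    print(level, usable_tb(level, disks=12, disk_tb=4.0), "TB usable")
```

For that hypothetical box the usable space ranges from 44 TB (RAID 5) down to 24 TB (RAID 10), so the choice of level matters as much as the disk count.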

    I would strongly recommend having some sort of backup mechanism; RAID is not backup. Either tape or another NAS holding backups for key stuff.
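
To make the "another NAS holding backups" idea concrete, here is a minimal Python sketch that copies a directory into a timestamped snapshot folder on a second storage device. The `snapshot` function and the paths in the usage comment are hypothetical; in practice a tool like rsync or a dedicated backup system would be the more usual choice, and this only illustrates that the copy has to live somewhere other than the RAID array itself.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped directory under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    # copytree raises FileExistsError if the target directory already exists,
    # so an earlier snapshot is never silently overwritten.
    shutil.copytree(source, target)
    return target


# Example call (hypothetical paths: key data on the main NAS, a second NAS
# mounted at /mnt/backup-nas):
#     snapshot(Path("/data/projects"), Path("/mnt/backup-nas"))
```

Run from cron or a similar scheduler, each call leaves an independent full copy; it does no deduplication or pruning, so old snapshots would need to be cleaned up separately.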

    ADD COMMENTlink modified 8.0 years ago • written 8.0 years ago by Gareth Palidwor1.6k

    +1 for "RAID is not a backup". Physical separation and copies are a must for important data.

    ADD REPLYlink written 8.0 years ago by Sean Davis25k

    +1000000 for "RAID is not backup". The only reliable form of backup is physical replication and storage across physical facilities (something like tape works but ideally not in the same facility)

    ADD REPLYlink written 7.5 years ago by Mndoci1.2k
    8.0 years ago by
    Istvan Albert ♦♦ 81k
    University Park, USA
    Istvan Albert ♦♦ 81k wrote:

    I would recommend Jeremy's BASS approach described here:

    in a nutshell: one powerful central server, cheap clients connecting to it.

    ADD COMMENTlink written 8.0 years ago by Istvan Albert ♦♦ 81k

    The one challenge with this: everything breaks all the time, and one big server is a single point of failure (it's why I hate vertical scaling; the more expensive the box, the less happy you are when it goes down). Maybe not a problem in this scenario, but something to remember.

    ADD REPLYlink written 7.5 years ago by Mndoci1.2k

    I can't think of anything that breaks more than the head node on our cluster

    ADD REPLYlink written 7.4 years ago by Jeremy Leipzig18k

    Yes, and you should expect it to. The point is that if it's a cheaper box, you just let it fail, because it's easier and more manageable to have a failover scenario. If it's an expensive box, you are less likely to have a failover scenario. And what happens when the expensive box becomes too small (let's say you have 4 TB of data that doesn't quite fit into memory)? Buying a second one is prohibitive. Admittedly, a cluster requires better software, but that's a cheaper and more robust long-term option.

    ADD REPLYlink modified 7.4 years ago • written 7.4 years ago by Mndoci1.2k
    8.0 years ago by
    Fabian Bull1.3k
    Fabian Bull1.3k wrote:

    I have no experience with setting up IT infrastructure, but I can describe the infrastructure I am working with at an institute.

    We have a separate high-performance server and a cluster, all mounted on a huge NFS network storage. There are backups on an hourly basis, so everybody can share files and back up their own stuff.

    It's a great working experience.

    ADD COMMENTlink written 8.0 years ago by Fabian Bull1.3k