Question: Hardware Suitable For Generic Nextgen Sequencing Processing?
Geoffjentry wrote, 9.2 years ago:

Hello. My lab finds itself in a difficult situation: we need to acquire new hardware for the next year or two, and for budgetary reasons it must be done in a short time frame. The difficulty is that we are currently exploring multiple next-gen sequencing technologies, since we may need to replace the platform we're using now. The data from our current platform is embarrassingly parallel in its processing and would suit a blade/cluster setup, but I have no idea what the optimal hardware for processing other next-gen data (e.g. Illumina, Pacific Biosciences) would be. Given the potential change of platforms, I'm leery of simply sizing new hardware to the needs of the current platform, in case some of those assumptions no longer apply.
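For what it's worth, "embarrassingly parallel" processing like this needs nothing fancier than fanning independent chunk jobs across cores. A minimal sketch (the dummy chunk files and the `grep -c` placeholder are assumptions standing in for real FASTQ chunks and a real aligner such as bwa or bowtie):

```shell
# Sketch: embarrassingly parallel chunk processing with xargs -P.
# Dummy data and a line-count placeholder stand in for the real pipeline.
mkdir -p chunks out
for i in 1 2 3 4; do
    printf 'read_%s\n' a b c d > "chunks/part$i.fq"   # 4 dummy "reads" per chunk
done
# Run up to 4 chunk jobs at once; each writes to its own output file,
# so the jobs never contend for a shared output.
ls chunks/*.fq | xargs -P 4 -I{} sh -c 'grep -c "" "{}" > "out/$(basename "{}").cnt"'
cat out/*.cnt
```

On a cluster the same pattern just becomes one scheduler job per chunk; the point is that the jobs share no state, so the hardware question reduces to cores, RAM per job, and I/O.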

For people who do next-gen sequencing, what sort of hardware solutions are you using? Clusters? Large servers? I'd love suggestions on specific brands/models as well.

I should also mention that for reasons out of my control (powers that be, and all that), CUDA and the like aren't an option for us.

next-gen hardware sequencing • 5.2k views

Some recommendations on this question: Any Hardware Recommendations For A Molecular Biology Lab That's Getting Into Bioinformatics?

— Simon Cockell, 9.2 years ago

Can you provide a little more info? I ask because the answers to this question will largely depend on what you're doing with that sequence data. De novo genome assembly programs typically require huge amounts of RAM (really, the more the better). Modern algorithms for mapping reads, though, need CPU but not a lot of RAM. The final question is: how busy will these CPUs be? If they'll be idle 75% of the time or more, you might look into EC2 or other cloud-computing options.

— Chris Miller, 9.2 years ago

@chris: I wish I could. On the "what", it's really a mixed bag ranging from digital gene expression to assembly. At the moment it's mostly mapping reads, hence the clusterability. We currently do the processing on an HP Linux machine with 16 CPU cores and 96 GB of RAM, and the bulk of the processes take 4-8 GB of RAM and as much CPU as they can get. The biggest problem with the current platform's software is that it uses SQLite databases and will quickly saturate the machine's I/O when we have many processes running.

— Geoffjentry, 9.2 years ago

A quick note: you will most likely also need substantial system administration expertise; this is particularly true when investing in cluster-computing solutions.

— Istvan Albert, 9.2 years ago

@chris part 2: I've looked at EC2. It's not really an option for the same reason CUDA isn't - the people signing the checks don't like that idea.

@Istvan: Sysadmin isn't a big deal. We already do most of the management of our servers ourselves, and have "real" sysadmins behind that if anything happens.

— Geoffjentry, 9.2 years ago

@Simon: I did see that thread, thanks though!

— Geoffjentry, 9.2 years ago
Chris Miller (Washington University in St. Louis, MO) wrote, 9.2 years ago:

Okay, well then I'll go ahead and throw some info out there in the hopes that it's useful to you.

What I can tell you is that the cluster we share time on has 8-core machines with 16 GB of RAM each, and they're sufficient for most of our needs. We don't do much assembly, but we do do a ton of other genomic processing, ranging from mapping short reads all the way up to SNP calling and pathway inference. I also still do a fair amount of array processing.

Using most cluster management tools (PBS, LSF, whatever), it should be possible to let a user reserve more than one CPU per node, effectively giving a single process up to 16 GB if they reserve the whole node. Yes, that means some lost cycles, but I don't seem to need it that often; 2 GB is still sufficient for most things I run. It would also be good to set up a handful of machines with a whole lot of RAM (maybe 64 GB?), which gives users doing things like assembly or loading huge networks into RAM some options.
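As a concrete illustration of the whole-node trick (the script name and resource numbers here are made up for the example; check your site's scheduler docs for exact syntax):

```shell
#!/bin/sh
# PBS/Torque job script: ask for all 8 cores of one node, so this single
# job sees the node's full 16 GB even though it may only run one process.
#PBS -l nodes=1:ppn=8
#PBS -l mem=16gb

# LSF equivalent, submitted from the command line: 8 slots, all forced
# onto a single host.
#   bsub -n 8 -R "span[hosts=1]" ./my_big_memory_job.sh

./my_big_memory_job.sh   # hypothetical memory-hungry step
```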

I more often run into limits on I/O. Giving each machine a reasonably sized scratch disk and encouraging your users to make smart use of it is a good idea; network filesystems bog down really quickly when a few dozen nodes are all reading and writing data. If you're going to be doing lots of really I/O-intensive work (and dealing with short reads, you probably will be), it's probably worth looking into faster hard drives: certainly 7200 RPM, if not 10k. Last time I looked, 15k drives were available but not worth it in terms of price/performance; that may have changed.
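The usual pattern for making smart use of scratch is: copy the input to the node-local disk once, do all the work there, and copy the result back once, so the network filesystem only sees two sequential transfers. A minimal sketch (the dummy FASTQ, the `grep` placeholder for the real analysis, and the `$TMPDIR` default are assumptions):

```shell
# Stage data to node-local scratch, compute there, copy results back once.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGG\n+\nIIII\n' > input.fastq  # dummy input

SCRATCH="${TMPDIR:-/tmp}/job_$$"        # node-local scratch directory
mkdir -p "$SCRATCH"
cp input.fastq "$SCRATCH/"              # one read over the network filesystem
(
  cd "$SCRATCH"
  grep -c '^@' input.fastq > result.txt # placeholder for the real analysis
)
cp "$SCRATCH/result.txt" .              # one write back over the network
rm -rf "$SCRATCH"                       # clean up the node for the next job
cat result.txt
```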

I won't go into super-detail on the specs; you'll have to price that out and see where the sweet spot is. I also won't tell you how many nodes to get, because again, that depends on your funding. I will say that if you're talking about a small cluster for a small lab, it may make sense to just get 3 or 4 machines with 32 cores and a bunch of RAM, and not worry about setting up a shared filesystem, queue, etc.; that really can be a headache to maintain. If you'll be supporting a larger userbase, though, you may find a better price point at fewer CPUs per node, and have potentially fewer problems with disk I/O (because you'll have fewer CPUs per hard drive).

People who know more about cluster maintenance and hardware than I do, feel free to chime in with additions or corrections.

— Chris Miller, 9.2 years ago

Given some other up-in-the-air issues, I think we're going back to our old plan of one "big-ass machine" and keeping it somewhat flexible. At least now I have another opinion supporting that belief.

— Geoffjentry, 9.2 years ago
Powered by Biostar version 2.3.0