Question: Hardware Recommendations: 1 Terabyte of RAM?
Question by mossmatters (Chicago Botanic Garden, Glencoe, IL), 2.5 years ago:

We have a good bit of funding available to purchase a high-performance computer, and we are wondering whether others have experience setting up and running machines with up to 1 TB of RAM. We are interested in using the computer for de novo genome assembly of plants, and we expect the genomes to be highly repetitive. The MaSuRCA assembler, for example, suggests this 1 TB requirement for large plant genomes: http://www.genome.umd.edu/docs/MaSuRCA_QuickStartGuide.pdf

Our previous computer purchase was a small computing cluster from PSSC Labs, but we have struggled to assemble genomes (and even transcriptomes) on it with programs like SOAPdenovo, Ray, and Trinity, due to their high I/O requirements. We also have a smaller, single-node machine from PSSC that runs well but has "only" 256 GB of RAM.
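(For reference, a quick way to confirm that a run is I/O-bound rather than CPU-bound, assuming the sysstat package provides iostat, is to watch disk utilization while the job runs:

    # sample extended per-device stats every 5 seconds during an assembly
    iostat -x 5
    # high %util and long await on the scratch disk, while CPUs sit idle,
    # point to an I/O bottleneck rather than a compute one

The device names and the exact bottleneck will of course differ per machine.)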

Has anyone had experience building/purchasing HPCs with these requirements? Do you have any advice?

Tags: genome assembly, hardware
Answer by Antonio R. Franco (Universidad de Córdoba, Spain), 2.5 years ago:

I have access to a huge supercomputer with 1 TB of RAM, and I have assembled a 2.2 Gb genome and its corresponding transcriptome on it. I have never needed that much RAM, but I was working alone; no other users were using the computer at the same time.

I believe that much RAM is mainly needed when the computer is shared by several users at the same time.
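(If you want to know how much RAM an assembly actually used, GNU time reports the peak resident set size. A minimal sketch, assuming /usr/bin/time is the GNU version; run_assembly.sh is a placeholder for your real command:

    # -v prints, among other things, "Maximum resident set size (kbytes)";
    # GNU time writes its report to stderr, so the program's stderr lands there too
    /usr/bin/time -v ./run_assembly.sh 2> time_report.txt
    grep "Maximum resident" time_report.txt

)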

Answer by Brian Bushnell (Walnut Creek, USA), 2.5 years ago:

For 1 TB of RAM, you'll probably want a 4-socket motherboard, because there is a limit to how much RAM you can add per socket: each socket can only address so many modules, and modules have a maximum capacity. That maximum varies a lot and gradually increases over time, but the price-optimal module size sits well below it. Our 1 TB nodes are all 4-socket.
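(To see how many sockets and DIMM slots a given machine actually exposes, standard Linux tools will tell you; a sketch, assuming dmidecode is installed and you have root:

    lscpu | grep -i "socket"          # reports Socket(s) and cores per socket
    sudo dmidecode -t memory | grep -i "size"   # one Size line per DIMM slot

)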

4-socket-capable CPUs are really expensive, and the cost increases with core count and frequency. For applications like mapping you get an essentially linear increase in throughput from more cores and higher frequency, but not for assembly of huge genomes on a NUMA machine, which is more likely to be limited by memory random-access latency, so don't buy the top CPU models. Assemblers are also harder to scale across CPUs in general, even with uniform memory (and frequency matters more than core count for applications that do not parallelize well). I have not used MaSuRCA, but SPAdes, for example, does not really use more than about 4 CPUs on average no matter how many you have; much of the time is spent in single-threaded code sections. This varies a lot between assemblers. Hyperthreading is good, though, so be sure to enable it; it helps hide memory latency in programs that are efficiently multithreaded.
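(On a NUMA box it can also help to pin a job's threads and allocations to one node so its memory stays local; a sketch using numactl, with assembler.sh as a placeholder command:

    numactl --hardware                    # list NUMA nodes and per-node free memory
    numactl --cpunodebind=0 --membind=0 ./assembler.sh   # keep CPU and RAM on node 0

)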

Answer by h.mon (Brazil), 2.5 years ago:

I have some experience using an SGI UV 2000, which scales up to 64 TB of memory and 256 CPU sockets (I believe up to 2,048 cores). The computer I use is far from its top configuration; right now it has 768 GB of memory. It is a modular system: you buy and add one or more "U" units, and a proprietary SGI layer hides this from the OS, which sees only one gigantic computer.
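(On such a single-system-image machine, the usual tools simply report the whole thing as one host, which is a quick way to sanity-check the configuration:

    nproc                        # total cores across all the blades
    free -g                      # total memory in GB, seen as one pool
    numactl --hardware | head    # the many NUMA nodes behind the single image

)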

It runs SUSE 11 Enterprise, which is fairly stable but an old OS with old utilities. The SUSE 11 repositories have few (if any) bioinformatics tools, the development tools are dated, and getting updates requires a subscription. Just to be able to compile the bioinformatics packages later on, you have to build and install many of the usual tools yourself (GCC, CMake, Maven, etc.), where on other systems you would just "apt-get install" or "yum install" them. It is not extremely difficult, but you may trip often, and easy problems can take a while to solve (my background is biology, with no formal education in computer science or systems administration). To use such a system efficiently, it is also good to install a queue scheduler and job manager to run the analyses, especially the expensive ones.
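(To give a flavor of what "compile and install yourself" means here, this is the usual shape of a from-source GCC build into a home directory; a sketch only, with the version number as a placeholder:

    tar xf gcc-4.9.2.tar.gz && cd gcc-4.9.2
    ./contrib/download_prerequisites     # fetch GMP, MPFR, MPC into the tree
    mkdir build && cd build
    ../configure --prefix=$HOME/local --disable-multilib
    make -j8 && make install             # then put $HOME/local/bin on your PATH

)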

I didn't take part in buying it, so I don't know how it compares price-wise to other solutions.


My god, that's a cool machine!

- Lynxoid, 2.5 years ago
I'm curious: could one run Docker on an SGI UV?

- Christian, 2.5 years ago

I have zilch experience with Docker, so I can't answer for sure.

I found this for SUSE 12, but nothing for SUSE 11. There is, however, a good deal of documentation about LXC on SUSE 11. I also got no good hits for SGI and Docker (admittedly, a very cursory search). I will try (not soon, though) to install something from the list of dockerized bioinformatics apps and report back.

- h.mon, 2.5 years ago
I doubt it will work, because with an SGI UV you probably have some fancy (older) kernel not supported by Docker. But if it did work, it could solve all your installation problems, for example by running an Ubuntu container.

- Christian, 2.5 years ago

Yep, it seems you are right. From the Docker docs:

A 3.10 Linux kernel is the minimum requirement for Docker.

Anyway, my installation problems are my current hobby, so either way works for me.

- h.mon, 2.4 years ago
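(For anyone checking before they try: comparing the running kernel against that minimum is a one-liner; a sketch, assuming GNU sort with version-sort support:

    kernel=$(uname -r | cut -d- -f1)
    # version-sort the minimum and the running kernel; if 3.10 sorts first,
    # the running kernel is at least 3.10
    if [ "$(printf '%s\n' 3.10 "$kernel" | sort -V | head -n1)" = "3.10" ]; then
        echo "kernel $kernel meets Docker's 3.10 minimum"
    else
        echo "kernel $kernel is too old for Docker"
    fi

)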