Hardware Recommendations: 1 Terabyte of RAM?
8.8 years ago
mossmatters

We have a good bit of funding available to purchase a high performance computer, and we are wondering if others have experience setting up and running machines with up to 1 TB of RAM. We are interested in using the computer for de novo genome assembly of plants, and expect the genomes to be highly repetitive. The MaSuRCA assembler, for example, suggests this 1 TB requirement for large plant genomes: http://www.genome.umd.edu/docs/MaSuRCA_QuickStartGuide.pdf

Our previous computer purchase was a small computing cluster from PSSC Labs, but we have struggled to assemble genomes (and even transcriptomes) using programs like SOAPdenovo, Ray, and Trinity, due to the high I/O requirements. We also have a smaller, single-node machine from PSSC that runs well, but has "only" 256 GB of RAM.

Has anyone had experience building/purchasing HPCs with these requirements? Do you have any advice?

Tags: hardware, genome-assembly
8.8 years ago

I have access to a huge supercomputer with 1 TB of RAM, and I have assembled a 2.2 Gb genome and its corresponding transcriptome without ever needing that much RAM. However, I was working alone on it; no other users were using the computer at the same time.

I believe that much RAM will be required if your computer will be used by several users at the same time.

8.8 years ago

For 1 TB of RAM, you'll probably want a 4-socket motherboard, as there's a limit to the amount of RAM you can add per socket: each socket can only address so many modules, and modules have a maximum capacity. That maximum capacity varies a lot and gradually increases over time, but the price-per-GB sweet spot sits below the maximum. Our 1 TB nodes are all 4-socket.
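The socket-count reasoning above is simple arithmetic: usable RAM per socket is roughly DIMM slots per socket times module size. Below is a minimal sketch; the slot count (12 per socket), the module sizes (32 and 64 GB) and the available socket configurations (1, 2, 4, 8) are illustrative assumptions, not the specs of any particular board.

```python
def sockets_needed(target_gb, slots_per_socket, module_gb, socket_options=(1, 2, 4, 8)):
    """Return the smallest socket count whose total DIMM capacity reaches target_gb.

    All inputs are hypothetical board parameters; check the actual
    motherboard/CPU documentation for real slot counts and module limits.
    Returns None if no listed configuration is large enough.
    """
    capacity_per_socket = slots_per_socket * module_gb
    for sockets in sorted(socket_options):
        if sockets * capacity_per_socket >= target_gb:
            return sockets
    return None


if __name__ == "__main__":
    # Assumed: 12 DIMM slots per socket; compare cheaper 32 GB modules with 64 GB ones.
    for module in (32, 64):
        print(f"{module} GB modules -> {sockets_needed(1024, 12, module)} sockets for 1 TB")
```

Under these assumed numbers, sticking with the cheaper 32 GB modules pushes a 1 TB build from 2 to 4 sockets, which is the trade-off described above.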

4-socket-capable CPUs are really expensive, and the cost increases with core count and frequency. For applications like mapping, you basically get a linear increase in throughput with more cores and higher frequency, but not for assembly of huge genomes on a NUMA machine, which is more likely to be limited by memory random-access latency; so don't buy the top CPU models.

Also, assemblers are generally harder to scale across more CPUs, even with uniform memory (and frequency matters more than core count for applications that do not parallelize well). I have not used MaSuRCA, but SPAdes, for example, does not really use more than about 4 CPUs on average, no matter how many you have; much of the time is spent in single-threaded code sections. This varies a lot between assemblers.

Hyperthreading is good, though, so be sure to enable it; it helps hide memory latency in programs that are efficiently multithreaded.
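The point about single-threaded sections can be illustrated with Amdahl's law, a standard model rather than anything measured on SPAdes here: if a fraction p of the single-core runtime parallelizes perfectly across n cores and the rest stays serial, the speedup is 1 / ((1 - p) + p / n). Under that model the average number of busy cores over the whole run also equals the speedup, which is how an assembler can average only a few CPUs on a large machine. The parallel fraction of 0.75 below is a hypothetical value chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core under Amdahl's law.

    Assumes parallel_fraction of the single-core runtime scales perfectly
    across `cores` and the remainder runs on one core. Under the same
    assumptions, this value also equals the average number of busy cores.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.75  # hypothetical: 25% of the runtime is single-threaded
    for n in (4, 16, 64, 256):
        print(f"{n:>3} cores -> speedup {amdahl_speedup(p, n):.2f}")
```

With p = 0.75 the speedup can never exceed 1 / (1 - p) = 4, however many cores are added, so extra sockets mostly buy memory capacity rather than assembly speed.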

8.8 years ago
h.mon

I have some experience using an SGI UV-2000, which scales up to 64 TB of memory and 256 CPU sockets (I believe up to 2048 cores). The machine I use is far from its top configuration; right now it has 768 GB of memory. It is a modular system: you buy and add one or more "U" to the system, and a proprietary SGI layer hides this from the OS, which sees only one gigantic computer.

It uses SUSE 11 Enterprise, a fairly stable but old OS with old utilities. The SUSE 11 repository has few, if any, bioinformatics tools, the development tools are fairly old, and updates require a subscription. One has to compile and install a lot of the regular tools (GCC, CMake, Maven, etc.), which on other systems you just get with "apt-get install" or "yum install", only to be able to compile the bioinformatics packages later on. It is not extremely difficult, but you may trip often, and it may take some time to solve easy problems (my background is biology, with no formal education in computer science or systems administration). And to use such a system efficiently, it is good to install a queue scheduler and job manager to run the analyses, especially the expensive ones.

I didn't take part in buying it, so I don't know how it compares price-wise to other solutions.


My god that's a cool machine!

I'm curious: could one run Docker on an SGI UV?

I have zilch experience with Docker, so I can't answer for sure.

I found this for SUSE 12, but nothing for SUSE 11. There is, however, a good deal of documentation about LXC on SUSE 11. I also found no good hits for SGI and Docker (admittedly, after a very cursory search). I will try (not soon, though) to install something from a list of dockerized bioinformatics apps and report back.

I doubt it will work, because with an SGI UV you probably have some fancy (older) kernel not supported by Docker. But if it did, it could solve all your installation problems, for example by running an Ubuntu container.

Yep, seems you are right. From Docker docs:

A 3.10 Linux kernel is the minimum requirement for Docker.

Anyway, my installation problems are my current hobby, so either way works for me.
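Whether a given Linux machine clears the 3.10 kernel minimum quoted above can be checked from its kernel release string. Below is a minimal sketch, assuming the release string starts with "major.minor" as Linux release strings normally do; it is only meaningful on Linux, and passing it does not guarantee Docker will run, since Docker also depends on other kernel features and distribution support. The sample strings in the test-style examples are illustrative.

```python
import platform
import re


def kernel_at_least(release, minimum=(3, 10)):
    """Return True if a Linux release string such as '3.0.101-63-default'
    is at least `minimum` (major, minor). Raises ValueError if the string
    does not start with 'major.minor'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


if __name__ == "__main__":
    current = platform.release()  # on Linux, the same string `uname -r` prints
    try:
        ok = kernel_at_least(current)
        print(f"kernel {current}: {'meets' if ok else 'is below'} the 3.10 minimum")
    except ValueError as err:
        print(err)
```

Comparing integer tuples rather than strings matters here: as strings, "3.9" would sort after "3.10", while (3, 10) >= (3, 9) compares correctly.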
