Our lab is looking to purchase a new large-memory system that will be used primarily for running ALLPATHS-LG and other assemblers (Trinity, etc.) and for related genomic analysis. We regularly work with very large (3+ Gbp), complex polyploid genomes, and in the past individual ALLPATHS-LG and Trinity assemblies have required in excess of 512 GB of memory. We're currently aiming for a 1 TB machine roughly in the $30-35K range, and have most recently been looking at the Dell PowerEdge R820 line. We'd like to be able to perform complex assemblies as quickly as possible, and to have a machine that will serve us well for at least the next five years. We already have the necessary infrastructure in place for backups, etc., and are interested primarily in the machine itself.
I'd appreciate any recommendations or experiences from researchers in the community who have run complex assemblies on different hardware setups. In particular, I'm interested in how much performance we would gain by adding more cores (going from 8 to around 20; how well does ALLPATHS-LG scale?) and by using flash storage (given the large number of temporary files written during many assemblies). Are there any particular hardware features or setups that we should be looking into? Also, what new hardware needs might emerge over the next five years?