Hello,
I am a new PI working in computational genomics, and I am looking at setting up compute infrastructure for my lab. The lab will be performing two main computational tasks: 1. simulations (hundreds to thousands of jobs, each with a run time of a few seconds up to a few minutes) and 2. genome alignments, SNP calling, etc. (only a few jobs, but with higher RAM requirements). As such, I am looking into two different options: one system with a large amount of RAM but few CPUs and one with many CPUs but less RAM, or alternatively a solution where RAM can be temporarily shared (ideally with a RAID5 or RAID6). I would greatly appreciate it if someone could share their experience with different compute architectures (as well as which companies they can recommend).
Thanks!
Make sure you have sufficient I/O capacity to make full use of the CPUs and RAM. Even the best cluster is pointless if an I/O bottleneck kills performance and prevents you from using multithreading effectively.
I have no experience with this, but if you have a hard time estimating your needs you could also look at more flexible cloud-based solutions, where you pay only for what you use, when you need it. Perhaps others have a different opinion about this.
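For a first rough estimate of the CPU side, the simulation workload from the question can be sized with simple arithmetic. The sketch below is only a back-of-envelope calculation: the function name and the example figures (1000 jobs, 60 s each, 32 cores) are assumptions for illustration, and it assumes equal-length single-core jobs with no scheduler overhead or I/O contention.

```python
import math


def batch_wall_time_s(n_jobs: int, job_seconds: float, cores: int) -> float:
    """Estimate wall-clock seconds to run n_jobs single-core jobs on `cores` cores.

    Assumes every job takes the same time and that nothing else
    (scheduler overhead, disk I/O) slows the batch down.
    """
    if n_jobs < 0 or cores < 1:
        raise ValueError("n_jobs must be >= 0 and cores must be >= 1")
    # Jobs run in "waves" of at most `cores` jobs at a time.
    waves = math.ceil(n_jobs / cores)
    return waves * job_seconds


if __name__ == "__main__":
    # Hypothetical example: 1000 one-minute simulations on a 32-core machine.
    seconds = batch_wall_time_s(1000, 60, 32)
    print(f"about {seconds / 60:.0f} minutes")
```

If the result is already short on a modest core count, the many-CPU option may matter less than RAM and I/O for the alignment jobs.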
From your post it seems that you may be conflating RAM with storage space.
RAM cannot be shared via a RAID. The latter stands for "redundant array of independent disks", so these are hard-drive storage systems, not computer memory.
As genomax states, get as much RAM as possible: hundreds of GB if you can.
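Since RAID5 and RAID6 came up in the question, here is a small sketch of what they mean for storage (not RAM): RAID5 gives up one disk's worth of capacity to parity and RAID6 gives up two. The function name and the example of eight 4 TB disks are assumptions for illustration, and the calculation assumes equal-size disks and ignores filesystem overhead.

```python
def raid_usable_tb(level: int, n_disks: int, disk_tb: float) -> float:
    """Usable capacity in TB of a RAID5 or RAID6 array of equal-size disks."""
    # Number of disks' worth of capacity consumed by parity.
    parity_disks = {5: 1, 6: 2}
    if level not in parity_disks:
        raise ValueError("only RAID levels 5 and 6 are handled in this sketch")
    # RAID5 needs at least 3 disks, RAID6 at least 4.
    min_disks = parity_disks[level] + 2
    if n_disks < min_disks:
        raise ValueError(f"RAID{level} needs at least {min_disks} disks")
    return (n_disks - parity_disks[level]) * disk_tb


if __name__ == "__main__":
    # Hypothetical example: eight 4 TB disks.
    for level in (5, 6):
        print(f"RAID{level}: {raid_usable_tb(level, 8, 4.0)} TB usable")
```

RAID6 costs one more disk of capacity than RAID5 but survives two simultaneous disk failures instead of one.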