Question: Need recommendation for a virtual machine or Linux cluster for NGS analysis
cmccabe (170), Chicago, wrote 3.1 years ago:

I am designing an analysis pipeline for DNA-seq on human germline DNA. Currently, we use Ion Torrent and sequence on a Proton, for IDT's medical exome of 4500 genes. We are also looking into a NextSeq, WES, and WGS. So my question is: will a virtual machine be sufficient, or should I be looking more into Linux clusters? We are a 400-bed hospital with growing NGS needs. My main concern with a VM is processing overhead, and I wanted to get some recommendations as to what others are doing. Thank you :).

forum ngs bioinformatics • 1.4k views
modified 3.1 years ago by Istvan Albert (78k) • written 3.1 years ago by cmccabe (170)

Some of this has been already discussed here: http://seqanswers.com/forums/showthread.php?t=64332

written 3.1 years ago by genomax (59k)

I don't think that a three-year-old thread is relevant, especially when it comes to bioinformatics and IT.

written 3.1 years ago by Istvan Albert (78k)

The thread I linked above was started by cmccabe on 17th November 2015. Not sure where you are seeing a three year old thread.

modified 3.1 years ago • written 3.1 years ago by genomax (59k)

Sorry, my bad, I misread the date.

written 3.1 years ago by Istvan Albert (78k)

I do NGS on S. pombe, which has a 12.6 Mbp genome. I started out on a workstation because I thought it would be more convenient, but later I learned cluster computing, and that is way better than having my own VM, workstation, etc.

Considering that you are working with the human genome and, I guess, planning big projects for your hospital, I would definitely go with a cluster.

written 3.1 years ago by Parham (1.4k)
Istvan Albert (78k), University Park, USA, wrote 3.1 years ago:

We have been successful with buying and using our own dedicated hardware. For example, one can get a 512GB RAM, 24-core, 100TB server for $25K or so. That can handle a surprising amount of work, and it is a lot simpler to run and manage than a cluster. It is basically just one simple Linux server, and we parallelize via GNU Parallel. You should run some tests to see if that works for your needs.
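For readers who have not used this pattern, here is a minimal Python sketch of the single-server fan-out idea: one external command per sample, with a cap on how many run at once. It is only a rough analogue of what GNU Parallel does, not the setup described above. The names `run_per_sample` and `demo_cmd` are hypothetical, and the command is a placeholder standing in for a real aligner or variant caller.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_sample(samples, build_cmd, max_workers=4):
    """Run one external command per sample, at most max_workers at a time.

    Returns (sample, return_code, stdout) tuples in the same order as `samples`.
    """
    def run_one(sample):
        result = subprocess.run(build_cmd(sample), capture_output=True, text=True)
        return sample, result.returncode, result.stdout.strip()

    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, samples))


def demo_cmd(sample):
    # Placeholder command (assumption): a real pipeline would return the
    # argument list for its own per-sample tool here.
    return [sys.executable, "-c", f"print('processed {sample}')"]


if __name__ == "__main__":
    for sample, code, out in run_per_sample(["sampleA", "sampleB", "sampleC"], demo_cmd, max_workers=2):
        print(sample, code, out)
```

Setting `max_workers` below the core count leaves headroom for tools that are themselves multithreaded.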

Now, the next step that I would recommend would not be building your own cluster, which in turn has massive costs in managing and maintaining it. That is, unless you already have the staff for it, though I doubt it; I imagine hospitals run Windows. At that point, having staff support both Linux and Windows based systems and integrate a diverse infrastructure gets very expensive. In that case I recommend going with a service such as DNAnexus, where you don't need to bear the costs of running an infrastructure.

(Edit: corrected prices, specs)

modified 3.1 years ago • written 3.1 years ago by Istvan Albert (78k)

That sounds like a smoking deal. Can you comment on where you got that config for $15K?

With a major manufacturer (including applicable discounts), a dual-socket Xeon E5-2698 v3, 512GB system crosses the $15K mark before adding any storage.

Disclaimer: There are probably tens of ways to configure a server with these specs. I just picked one type to see what the pricing would be like. Those Intel CPUs are over $3K each.

modified 3.1 years ago • written 3.1 years ago by genomax (59k)

Let me correct the prices/specs. It looks like I mixed together two different quotes/specs for servers that we purchased. I looked it up since, as you point out, $15K is really too low. Our final cost was $24K (still a very good deal IMHO). Final specs were dual Intel Xeon E5-2680 v3 CPUs (12C, 2.5GHz), so it is 24 cores, with 144TB of disk configured to 100TB with RAID. It also came with two 128GB SSD drives.

We got it from http://www.penguincomputing.com/

modified 3.1 years ago • written 3.1 years ago by Istvan Albert (78k)

So here are the most recent specs for an HPC system (microHPC^2) from Advanced Clustering Technologies:

256GB
4TB SATA HDD
2x 120GB SSD, mirrored, for Ubuntu and software
2x 400GB Data Center PCIe SSD for rapid file transfer, download, and processing
2x 14-core Xeon E5, 2.6GHz, 35MB cache

The cost looks to be ~$12.5K, and GNU Parallel will be utilized. Thoughts? Thank you :).
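One way to sanity-check a quote like this is to estimate how many jobs it could run side by side. The Python sketch below does that arithmetic under stated assumptions: it treats the 256GB line as RAM, and the per-job figures (4 threads and 32GB per job) are illustrative guesses, not measurements from any real pipeline. The function name `max_concurrent_jobs` is hypothetical.

```python
def max_concurrent_jobs(total_cores, total_ram_gb, threads_per_job, ram_gb_per_job):
    """Return how many jobs fit at once, limited by cores or RAM, whichever runs out first."""
    if threads_per_job <= 0 or ram_gb_per_job <= 0:
        raise ValueError("per-job requirements must be positive")
    by_cores = total_cores // threads_per_job
    by_ram = int(total_ram_gb // ram_gb_per_job)
    return min(by_cores, by_ram)


# Quoted box: 2 x 14 cores; 256GB is assumed to be RAM.
# Per-job needs below are assumptions for illustration only.
jobs = max_concurrent_jobs(total_cores=28, total_ram_gb=256,
                           threads_per_job=4, ram_gb_per_job=32)
print(jobs)  # 7: cores allow 7 jobs, RAM would allow 8
```

Swapping in per-job numbers measured on a test run would show whether cores or memory is the limiting resource for a given workload.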

written 3.0 years ago by cmccabe (170)