Need recommendation for a virtual machine or Linux cluster for NGS analysis
1
0
8.4 years ago
bioguy24 ▴ 230

I am designing an analysis pipeline for DNA-seq on human germline DNA. Currently we sequence on an Ion Torrent Proton, using IDT's medical exome panel of 4500 genes. We are also looking into a NextSeq, WES, and WGS. So my question is: will a virtual machine be sufficient, or should I be looking into Linux clusters? We are a 400-bed hospital with growing NGS needs. My main concern with a VM is processing overhead, and I wanted to get some recommendations on what others are doing. Thank you :).

ngs • 3.4k views
ADD COMMENT
2

Some of this has been already discussed here: http://seqanswers.com/forums/showthread.php?t=64332

ADD REPLY
0

I don't think a three-year-old thread is relevant, especially when it comes to bioinformatics and IT.

ADD REPLY
0

The thread I linked above was started by cmccabe on 17th November 2015. Not sure where you are seeing a three year old thread.

ADD REPLY
0

Sorry, my bad. I misread the date.

ADD REPLY
0

I do NGS on S. pombe, which has a 12.6 Mbp genome. I started on a workstation because I thought it would be more convenient, but later I learned cluster computing, and that is way better than having my own VM, workstation, etc.

Considering that you are working with the human genome and, I guess, planning big projects for your hospital, I would definitely go with a cluster.

ADD REPLY
1
8.4 years ago

We have been successful buying and using our own dedicated hardware. For example, one can get a server with 512GB RAM, 24 cores, and 100TB of storage for $25K or so. That can handle a surprising amount of work, and it is a lot simpler to run and manage than a cluster. It is basically just one simple Linux server, and we parallelize via GNU Parallel. You should run some tests to see if that works for your needs.
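To illustrate the "one server, many samples at once" idea, here is a minimal Python sketch. It is not GNU Parallel itself and not the setup described above, just the same pattern: one command per sample, with a cap on how many run at the same time. The function name `run_per_sample`, the `{}` placeholder convention, and the sample names are made up for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_sample(command_template, samples, max_jobs):
    """Run one external command per sample, at most max_jobs at a time.

    command_template is a list of arguments; every "{}" inside an
    argument is replaced by the sample name (a hypothetical convention
    chosen for this sketch). Returns {sample: (exit_code, stdout)}.
    """

    def run_one(sample):
        argv = [arg.replace("{}", sample) for arg in command_template]
        done = subprocess.run(argv, capture_output=True, text=True)
        return sample, done.returncode, done.stdout

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return {s: (rc, out) for s, rc, out in pool.map(run_one, samples)}


if __name__ == "__main__":
    # Stand-in command that only prints the sample name; a real pipeline
    # would call an aligner or variant caller here instead.
    demo = [sys.executable, "-c", "import sys; print(sys.argv[1])", "{}"]
    results = run_per_sample(demo, ["sampleA", "sampleB", "sampleC"], max_jobs=2)
    for name, (code, out) in results.items():
        print(name, code, out.strip())
```

Checking the exit codes per sample is the part worth keeping in any real version, since a failed sample is easy to miss when many jobs run side by side.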

The next step I would recommend is not building your own cluster, since that carries massive costs in management and maintenance, unless you already have the staff for it. I doubt that, though, since I imagine hospitals run Windows. At that point, having staff support both Linux and Windows systems and integrate a diverse infrastructure gets very expensive. In that case I recommend going with a service such as DNAnexus, where you don't need to bear the costs of running an infrastructure.

(Edit: corrected prices, specs)

ADD COMMENT
0

That sounds like a smoking deal. Can you comment on where you got that config for $15K?

With a major manufacturer (including applicable discounts), a dual-socket Xeon E5-2698 v3, 512GB system crosses the $15K mark before adding any storage.

Disclaimer: There are probably tens of ways to configure a server with these specs. I just picked one type to see what the pricing would be like. Those Intel CPUs are over $3K each.

ADD REPLY
0

Let me correct the prices/specs. It looks like I mixed together two different quotes/specs for servers that we purchased. I looked it up since, as you point out, $15K is really too low. Our final cost was $24K (still a very good deal IMHO). Final specs were dual Intel Xeon E5-2680 v3 CPUs (12C, 2.5GHz), so 24 cores in total, and 144TB of disk configured to 100TB with RAID. It also came with two 128GB SSD drives.

We got it from http://www.penguincomputing.com/

ADD REPLY
0

So here are the most recent specs for an HPC system (microHPC^2) from Advanced Clustering Technologies:

  • 256GB
  • 4TB SATA HDD
  • 2x 120GB SSD, mirrored, for Ubuntu and software
  • 2x 400GB Data Center PCIe SSD for rapid file transfer, download, and processing
  • 2x 14-core Xeon E5, 2.6GHz, 35MB cache

The cost looks to be ~$12.5K, and GNU Parallel will be used. Thoughts? Thank you :).
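One rough way to compare configs like the ones in this thread is to ask how many jobs can run side by side before either cores or RAM run out. The sketch below does that arithmetic. The core and RAM figures come from the posts above (reading the 256GB as RAM), but the per-job needs of 4 threads and 32GB are purely hypothetical placeholders; measure your own pipeline before relying on numbers like these.

```python
def max_concurrent_jobs(cores, ram_gb, threads_per_job, ram_per_job_gb):
    """Return how many jobs fit at once, limited by CPU or by RAM."""
    by_cpu = cores // threads_per_job
    by_ram = int(ram_gb // ram_per_job_gb)
    return min(by_cpu, by_ram)


# Hypothetical per-job footprint, NOT taken from the thread.
THREADS_PER_JOB = 4
RAM_PER_JOB_GB = 32

# 2x 14 cores with 256GB, as in the quote listed above.
print(max_concurrent_jobs(28, 256, THREADS_PER_JOB, RAM_PER_JOB_GB))
# 24 cores with 512GB, as in the server described earlier in the thread.
print(max_concurrent_jobs(24, 512, THREADS_PER_JOB, RAM_PER_JOB_GB))
```

With these assumed job sizes both machines are CPU-bound rather than RAM-bound, so the extra memory of the larger server would only matter for heavier jobs. Storage is not covered here at all, and 4TB versus 100TB is a big difference for WES/WGS data.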

ADD REPLY
