Question: Clinical MiSeq and Compute Systems
5.2 years ago, DG (7.1k) wrote:

Hi Everyone,

Over a year ago I asked the question "Constructing Compute Resources in Support of NGS in a Hospital Setting" about the NGS compute requirements people were considering at the time. A lot has changed in the year and a half since I visited the topic; in particular, NGS in the clinic is now far more common. So I thought it worth revisiting the topic with a new question.


Thoughts and opinions, particularly from people with some knowledge of NGS in health care, would be most welcome. By way of background: I just got a new job, and starting next week I will officially be the Clinical Bioinformatician for part of the healthcare system here in Canada. The DNA diagnostics lab has purchased two MiSeqs (for redundancy), and I'll be in charge of compute, data handling, analysis, etc. Most of what we hear about data storage and compute for NGS is geared either toward large sequencing centers or toward individual research labs. The number of samples you run obviously has a huge impact on how much data is generated, but I am guessing some people out there have more direct experience with a setup like this one.


Cloud solutions are a no-go within our Canadian healthcare system, so everything must be on site. The budget also isn't large, so it is a trade-off between storage size, redundancy, and compute speed/capacity. Ultimately I think archiving will need to be handled with magnetic tape or archival optical disc (or the new M-Disc I have seen recently), because I don't think we will be in a position to support archival storage on hard disk if we are obligated to keep records and tests as long as a pathology report (20-25 years).


So what innovative solutions would you consider, especially for keeping costs as low as possible? If you are currently supporting a MiSeq in the clinic, what are you doing?


Hopefully we can have some good discussion.


One thing you should look at for storage is NGS-specific alternatives to gzip/bzip2. I don't have the numbers to hand, but I tested some reference-free methods a while ago and they offered better compression ratios than gzip/bzip2. Reportedly, algorithms that use a reference genome achieve even better ratios.

written 5.2 years ago by h.mon (30k)
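To put numbers on this for your own data, a minimal sketch like the one below measures the compression ratio of the general-purpose codecs in Python's standard library (gzip, bzip2, xz/lzma) on a sample of an uncompressed FASTQ file. It only gives a baseline: NGS-specific compressors, and reference-based formats such as CRAM for aligned reads, are separate tools that would be benchmarked on the same sample. The 100 MB sample size and the script name in the usage comment are arbitrary choices.

```python
import bz2
import gzip
import lzma
import sys

# General-purpose codecs from the Python standard library, at maximum settings.
# NGS-specific compressors are external tools and are not included here.
CODECS = {
    "gzip": lambda data: gzip.compress(data, compresslevel=9),
    "bzip2": lambda data: bz2.compress(data, compresslevel=9),
    "xz/lzma": lambda data: lzma.compress(data, preset=9),
}

SAMPLE_BYTES = 100_000_000  # compress only the first ~100 MB to bound memory use


def compression_ratios(data: bytes) -> dict[str, float]:
    """Return original size / compressed size for each codec."""
    return {name: len(data) / len(compress(data)) for name, compress in CODECS.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python compare_codecs.py sample.fastq  (uncompressed FASTQ)
    with open(sys.argv[1], "rb") as fh:
        sample = fh.read(SAMPLE_BYTES)
    ratios = compression_ratios(sample)
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:8s} {ratio:6.2f}x")
```

Measuring on a sample rather than a whole run keeps the comparison quick; xz at preset 9 in particular is slow and memory-hungry on large inputs.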
5.2 years ago, coldrecd (80; Nashville, United States) wrote:

Dan, I've been in a situation similar to the one you describe for a couple of years, and this is how we handle it:

1) We have two MiSeqs, and we have never regretted the redundancy. We started with 2 runs per week (2x150 bp, v2 chemistry) but are now up to about 5, so the instruments often run in parallel. We use the on-board MiSeq Reporter (MSR) for alignment and variant calling, supplemented by custom post-processing.

2) The sequencers are supported by virtualized servers in our EMC (I think?) stack. I run 24-processor CentOS and Windows VMs and have about 5 TB of SAN storage. This is more than enough for post-processing and for re-running MSR when needed.

3) I minimize the run folders, keeping only the BCL files, BAM/VCF files, and run metadata. We run tape archives every 4-6 months, sending ~1 TB offsite each time.
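A minimal sketch of this kind of run-folder minimization, assuming the file-name patterns in `KEEP_PATTERNS` (a hypothetical list; check it against the actual layout of your MiSeq run folders and MSR output). It copies matching files to a new location instead of deleting anything, so the original folder is removed only after the copy has been verified. FASTQ files are deliberately not on the list: with the BCLs and run metadata kept, they can be regenerated.

```python
import fnmatch
import shutil
from pathlib import Path

# Hypothetical keep-list covering the categories named above: base calls,
# alignments/variants, and run metadata. Patterns are matched case-insensitively
# against the file name only; adjust them to your instrument software's layout.
KEEP_PATTERNS = [
    "*.bcl", "*.bcl.gz",                    # base calls
    "*.bam", "*.bai",                       # alignments and their indexes
    "*.vcf", "*.vcf.gz",                    # variant calls
    "runinfo.xml", "runparameters.xml",     # run metadata
    "samplesheet.csv",
]


def should_keep(path: Path) -> bool:
    """True if the file name matches any pattern in KEEP_PATTERNS."""
    name = path.name.lower()
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in KEEP_PATTERNS)


def minimize_run(run_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy the files worth keeping into dest_dir, preserving relative paths.

    Nothing is deleted: remove the original run folder only after the copy
    has been verified (e.g. by comparing checksums).
    """
    kept = []
    for src in sorted(run_dir.rglob("*")):
        if src.is_file() and should_keep(src):
            rel = src.relative_to(run_dir)
            target = dest_dir / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also preserves timestamps
            kept.append(rel)
    return kept
```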

It is extremely helpful for me to use our institutional infrastructure in this way. I don't need to worry about short-term redundancy (SAN is great) or long term storage (tape is cheap, and it is destroyed on schedule). VMs are cheap, and I can validate a new configuration and migrate without pain. I'll never look back.


Good luck with your new job! Make friends with whomever runs your virtualization environment!



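Scheduled destruction of archive tapes only needs each tape's archive date and the retention period. A minimal sketch, assuming a retention period in whole years (for example the 20-25 years quoted for pathology reports above) and hypothetical tape IDs; an archive written on Feb 29 is treated as expiring on Feb 28 in non-leap years.

```python
from datetime import date


def destruction_date(archived: date, retention_years: int) -> date:
    """First date on which a tape archived on `archived` may be destroyed.

    An archive made on Feb 29 maps to Feb 28 when the target year is not a leap year.
    """
    target_year = archived.year + retention_years
    try:
        return archived.replace(year=target_year)
    except ValueError:  # Feb 29 in a non-leap target year
        return archived.replace(year=target_year, day=28)


def due_for_destruction(tapes: dict[str, date], retention_years: int,
                        today: date) -> list[str]:
    """IDs of tapes whose retention period has elapsed as of `today`, sorted."""
    return sorted(tape_id for tape_id, archived in tapes.items()
                  if destruction_date(archived, retention_years) <= today)


if __name__ == "__main__":
    # Hypothetical tape inventory: tape ID -> date the archive was written.
    inventory = {"TAPE-001": date(2015, 6, 1), "TAPE-002": date(2016, 2, 29)}
    print(due_for_destruction(inventory, retention_years=20, today=date(2036, 3, 1)))
```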

Thanks Chris,


That is helpful. Right now it looks like hospital IT doesn't want to touch the NGS support infrastructure (good and bad), which means the setup and maintenance will mostly be up to me. The IT staff don't feel confident handling HPC/big-data-style tasks, though of course I will still have to work with them on various aspects.


Your setup is even smaller than what I was envisioning, so that is good to know. Ideally I would like about 100 TB of near-line storage, plus tape for archiving run data to cold storage.

written 5.2 years ago by DG (7.1k)

We've done 330 MiSeq runs since we started, plus many PGM and Sanger runs. We've archived nowhere near 100 TB, and I consider myself to be cautious.

Even if your group runs the servers, consider virtualization. On a properly configured system your performance hit is negligible and the flexibility you gain is tremendous.



written 5.2 years ago by coldrecd (80)
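Taking the figures from this thread at face value (~1 TB of minimized data archived every 4-6 months at ~5 runs per week, kept for the 20-25 years a pathology report may need to be retained), a back-of-envelope sketch of the eventual tape holdings, assuming the volume stays constant and tapes are destroyed as soon as their retention period expires:

```python
"""Back-of-envelope archive sizing from the numbers quoted in this thread.

Assumptions (not measurements): archive volume stays constant over time, and
tapes are destroyed as soon as they pass the retention period.
"""

WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def steady_state_tb(tb_per_archive: float, months_per_archive: float,
                    retention_years: float) -> float:
    """Tape volume held once the oldest archives start being destroyed."""
    tb_per_year = tb_per_archive * 12 / months_per_archive
    return tb_per_year * retention_years


def gb_per_run(tb_per_archive: float, months_per_archive: float,
               runs_per_week: float) -> float:
    """Average minimized run size implied by the archive rate."""
    runs = runs_per_week * WEEKS_PER_MONTH * months_per_archive
    return tb_per_archive * 1000 / runs


if __name__ == "__main__":
    # ~1 TB every 4-6 months at ~5 runs/week, kept for 20-25 years
    low = steady_state_tb(1.0, 6, 20)
    high = steady_state_tb(1.0, 4, 25)
    print(f"steady-state tape holdings: {low:.0f}-{high:.0f} TB")
    print(f"implied size per run: {gb_per_run(1.0, 6, 5):.1f}-"
          f"{gb_per_run(1.0, 4, 5):.1f} GB")
```

Under those assumptions, holdings level off at roughly 40-75 TB of tape, with each minimized run averaging roughly 8-12 GB; a growing test menu or more runs per week would push both numbers up.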

Mostly it's about future-proofing and taking advantage of available funds that might not be there down the road. It is much harder to expand later, and I also want to plan for research use. I'm definitely considering virtualization for the compute nodes. I'm also very interested in containerization using Docker and some of the other technologies that have come out of cloud computing. I want both the storage and compute to be as flexible as possible.

written 5.2 years ago by DG (7.1k)
5.2 years ago, Daniel Swan (13k; Aberdeen, UK) wrote:

So in the UK, the healthcare system primarily runs on Windows, behind the NHS firewall. This means that in a clinical setting you will be doing analysis a) on Windows, b) not in the cloud (although this may change), and c) without external network connectivity. In a previous role, one solution we used to get our pipelines into healthcare labs was to run them in a Linux VM on a Windows desktop. And yes, we could happily process a lane of MiSeq data for targeted panels on a VM in reasonable time.

Obviously we didn't have to worry about the storage of the data afterwards, just processing, but I'm entirely sure this is a solved issue in clinical settings ;)


Our hospitals support Linux as well as Windows, and most of the requirements apply only to machines on the network. An isolated compute cluster could theoretically run anything here.

written 5.2 years ago by DG (7.1k)

Then count yourself lucky ;)


written 5.2 years ago by Daniel Swan (13k)

Of course, I also have to maintain it since IT doesn't want to, so that's a challenge. Hence the desire to keep it as simple, yet as flexible, as possible.

written 5.2 years ago by DG (7.1k)