Hi, we are establishing a bioinformatics core at our institution. The idea is to start by performing whole exome sequencing on 1500 to 2000 samples a year, with plans to offer other services in the future (gene regulation, miRNA regulation, genome variation, etc.). Budget is not an issue. My question is where to start with respect to equipment (hardware) and staff. Thank you - Sara
I think the hardware side of things is easy enough (find out the vendor your IT folks have been working with and spec out a small cluster with them, make sure that you include a backup method of some sort). Regarding personnel, you would at the very least need one staff-scientist level person to oversee things like "are the sequence runs OK?", "let's build a pipeline", "are the results of the pipeline reasonable given the biological questions being posed?" (I'm assuming you'll be doing the full analysis rather than kicking BAM files down to the wet lab folks).
The most important part of all of this is something you didn't mention and that's what the expectations are of you/the core. This needs to be spelled out very early on and very clearly. What typically happens is that a core is set up with goal X in mind and 6 months later you're working on X plus A->Q. This is, to me at least, the most important thing to clear up with all of the stakeholders before you start putting things out for bid or placing job ads (btw, you can do that here).
My two cents, since I lived through this experience at my university.
If you invest in a huge computing facility, you can be sure that you will spend a lot of money (and I mean a lot), and you can expect the computers to become obsolete after a few years. Not to mention the effort needed to maintain that service.
On the other hand:
Huge changes are coming to the NGS world in the short to medium term. We will be using a new generation of sequencers and/or tools that require fewer resources. Examples are long-read sequencers (PacBio, Nanopore and the like) and programs like Kallisto, which runs a pseudoalignment in minutes using only 1 or 2 GB of RAM.
You also need to consider hiring a system administrator.
In our case, we weighed all of these things and decided not to spend such a huge amount of money. We use computing facilities like Amazon EC2 (you pay for what you use) or the supercomputers available around us. Amazon maintains its own machines, and that is work you avoid.
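The trade-off described above (owned hardware that becomes obsolete and needs maintaining, versus paying only for what you use) can be sketched as simple arithmetic. This is a rough illustration, not a costing tool: the function names are made up for this example, and every figure in the usage lines is a placeholder, not a real quote or price.

```python
def on_prem_annual_cost(purchase_price, lifetime_years, annual_maintenance):
    """Yearly cost of owned hardware: purchase price spread over its
    useful life, plus yearly upkeep (sysadmin time, power, support)."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return purchase_price / lifetime_years + annual_maintenance


def cloud_annual_cost(core_hours_per_year, price_per_core_hour, other_annual_fees=0.0):
    """Yearly pay-per-use cost: compute actually consumed, plus any
    other fees (for example storage or data transfer)."""
    return core_hours_per_year * price_per_core_hour + other_annual_fees


# Placeholder numbers only (ASSUMPTIONS, not real prices or quotes).
owned = on_prem_annual_cost(purchase_price=400_000, lifetime_years=4,
                            annual_maintenance=60_000)
cloud = cloud_annual_cost(core_hours_per_year=500_000, price_per_core_hour=0.05,
                          other_annual_fees=20_000)

print(f"owned hardware: {owned:,.0f} per year")
print(f"pay-per-use:    {cloud:,.0f} per year")
```

Plugging in your own quotes and expected usage shows quickly which side of the break-even you are on; the answer changes a lot with how busy the machines would actually be.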
I think a small cluster would start at 10 x 16-core compute nodes, each with 256GB of RAM, plus a head node. Importantly, don't skimp on the storage, particularly if most of the work you are planning is exomes, which take up a lot of space. Make sure you get something that doesn't get slower as it gets busier, so something like Isilon. Connect it all together with at least Gigabit Ethernet. Cloud is definitely a possibility, but watch out for the data transfer costs, which more than doubled the quote the last time I costed a grant on the cloud.
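To put rough numbers on that spec, here is a minimal sizing sketch. The node counts and RAM come from the suggestion above (head node excluded); the per-sample storage figure and the number of copies are placeholders I made up for illustration, so measure your own pipeline's footprint (raw reads, alignments, variant calls) before ordering disks.

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Compute nodes only; the head node is not counted."""
    compute_nodes: int
    cores_per_node: int
    ram_gb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.compute_nodes * self.cores_per_node

    @property
    def total_ram_gb(self) -> int:
        return self.compute_nodes * self.ram_gb_per_node


def storage_tb_per_year(samples_per_year: int, gb_per_sample: float, copies: int = 1) -> float:
    """Storage added each year, in decimal TB (1 TB = 1000 GB).
    `copies` covers backups or replicas kept alongside the primary data."""
    return samples_per_year * gb_per_sample * copies / 1000


# Spec from the text: 10 nodes x 16 cores, 256 GB RAM each.
spec = ClusterSpec(compute_nodes=10, cores_per_node=16, ram_gb_per_node=256)
print(spec.total_cores, "cores,", spec.total_ram_gb, "GB RAM")

# ASSUMPTION: 20 GB per exome is a placeholder, not a measured value.
yearly_tb = storage_tb_per_year(samples_per_year=2000, gb_per_sample=20, copies=2)
print(f"{yearly_tb:.0f} TB of new storage per year")
```

Storage grows every year while the compute stays fixed, which is one reason not to skimp on it.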
To run all that you'll need a sys admin. In addition, employ at least one high-grade, properly experienced bioinformatician (the minimum grade should be the career grade for a research division leader). Other hires depend on what you want from the core. If you just want "you provide sequence, I provide lists of SNPs", then you'll probably be okay with masters-level people. However, my experience is that most folks want help interpreting the data as much as analysing it. In that case I'd argue for hiring a bunch of postdoc-level people and basically hawking them out as rent-a-postdocs, each spending 30 or 50% of their time on a collaborator's project over a period of six months to a year. Either pay them well and offer job security, or offer them a slice of their time to work on projects of their own choosing. You will need to do something to stop the ones that are any good from leaving for jobs with more freedom at the first opportunity.
Either you are creating a sequencing core lab (that may do the needed bioinformatics on the side), or you should consider creating two separate cores. In the latter case, one does just the sequencing and the other the bioinformatics. That way both would be free to pursue other opportunities, since 2000 exomes a year will not keep either core completely busy.
Edit: Re-reading your original post, it sounds like you are only setting up a bioinformatics core (i.e. sequencing may be done elsewhere), so the above may not apply. I will leave the answer here in case both aspects apply.