Establishing a Core Bioinformatics Facility
6.4 years ago
Sara ▴ 20

Hi, we are establishing a bioinformatics core at our institution. The idea is to start by analysing 1,500 to 2,000 whole exomes a year, with plans to add other services in the future (gene regulation, miRNA regulation, genome variation, etc.). The budget is not an issue. My question is where to start with respect to equipment (hardware) and staff. Thank you - Sara

genome next-gen sequencing core • 3.2k views

budget is not an issue

That does not happen in the real world :) Perhaps you were told that so you would sign on.


Also, unlimited budget is somewhat at odds with the question "where to start"...


Thank you, guys, I appreciate your help. - Sara

6.4 years ago

I think the hardware side of things is easy enough (find out the vendor your IT folks have been working with and spec out a small cluster with them, make sure that you include a backup method of some sort). Regarding personnel, you would at the very least need one staff-scientist level person to oversee things like "are the sequence runs OK?", "let's build a pipeline", "are the results of the pipeline reasonable given the biological questions being posed?" (I'm assuming you'll be doing the full analysis rather than kicking BAM files down to the wet lab folks).

The most important part of all of this is something you didn't mention and that's what the expectations are of you/the core. This needs to be spelled out very early on and very clearly. What typically happens is that a core is set up with goal X in mind and 6 months later you're working on X plus A->Q. This is, to me at least, the most important thing to clear up with all of the stakeholders before you start putting things out for bid or placing job ads (btw, you can do that here).


"kicking BAM files down to the wet lab folks" hahaha xD

6.4 years ago

My two cents, since I lived through this experience at my university.

If you invest in a huge computing facility, you can be sure that you will spend a lot of money (and I mean a lot), and you can expect the computers to become obsolete after a few years. Not to mention the effort needed to maintain that service.

On the other hand:

Big changes are coming to the NGS world in the short or medium term. We will be using a new generation of sequencers and/or tools that require fewer resources. Examples are long-read sequencers (PacBio, Nanopore and the like) and programs like kallisto, which runs a pseudoalignment in minutes using only 1 or 2 GB of RAM.

You also need to consider hiring someone to maintain the systems.

In our case, we weighed all these things up and decided not to spend such a huge amount of money. We are using computing services like Amazon EC2 (you pay for what you use) or supercomputers near us. Amazon maintains its own machines, and that is work you avoid.
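The buy-versus-rent trade-off described above can be put into rough numbers. The sketch below is only an illustration: every price, the four-year hardware lifetime, and the function names are assumptions made up for the example, not figures from this thread or from any provider's price list.

```python
def on_prem_cost_per_year(purchase_price, lifetime_years, admin_salary_per_year):
    """Yearly cost of owning hardware: straight-line depreciation plus upkeep."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return purchase_price / lifetime_years + admin_salary_per_year


def cloud_cost_per_year(core_hours_per_year, price_per_core_hour):
    """Yearly pay-as-you-go cost: you pay only for the core-hours you use."""
    return core_hours_per_year * price_per_core_hour


def break_even_core_hours(purchase_price, lifetime_years,
                          admin_salary_per_year, price_per_core_hour):
    """Core-hours per year above which owning becomes cheaper than renting."""
    if price_per_core_hour <= 0:
        raise ValueError("price_per_core_hour must be positive")
    yearly = on_prem_cost_per_year(purchase_price, lifetime_years,
                                   admin_salary_per_year)
    return yearly / price_per_core_hour


if __name__ == "__main__":
    # All numbers below are made-up placeholders; substitute real quotes.
    hours = break_even_core_hours(
        purchase_price=400_000,
        lifetime_years=4,
        admin_salary_per_year=80_000,
        price_per_core_hour=0.05,
    )
    print(f"Owning wins above roughly {hours:,.0f} core-hours per year")
```

The model deliberately ignores power, cooling, storage and data transfer charges, so treat it as a starting point for a conversation with your IT department, not as a costing.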


Hi, thanks for sharing this. I'm curious about Amazon EC2.

I have no experience with it, but from what I have heard, once you are logged in (via ssh, I guess?) it looks like any other server or cluster running some flavour of Linux, right? If so, does it use a scheduler to process your jobs, like LSF or Slurm?

Also, when you transfer largish files (FASTQ, BAM, etc.), is the speed of transfer an issue?


Yes, it does behave like a regular server (Amazon EC2). Look at Google Compute Engine and Microsoft Azure as well; you may be able to get better prices there. The current limitation is the maximum amount of RAM one can have on a single server. Last time I looked, it was 256 GB.

If you are at an institution that has a good network connection to its internet provider (and if the cloud provider also has a good peering connection), then you can basically get wire speed for data transfers (you will pay for that, though).
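To get a feel for what "wire speed" means in practice, here is a minimal sketch of the arithmetic. The function name, the `efficiency` factor and the 10 GB per exome used in the example are assumptions for illustration, not measurements.

```python
def transfer_time_hours(size_gb, link_gbit_per_s, efficiency=0.8):
    """Hours needed to move size_gb gigabytes over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved (protocol overhead, shared links, and so on).
    """
    if link_gbit_per_s <= 0:
        raise ValueError("link_gbit_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_gbit = size_gb * 8  # 1 byte = 8 bits
    seconds = size_gbit / (link_gbit_per_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical workload: 2,000 exomes at an assumed 10 GB of FASTQ each.
    total_gb = 2000 * 10
    for link in (1, 10):
        hours = transfer_time_hours(total_gb, link)
        print(f"{link} Gbit/s link: about {hours:.1f} hours for {total_gb} GB")
```

Spread over a year this volume is modest, but the same calculation is worth repeating for the largest single batch you expect to move at once.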

6.4 years ago

I think a small cluster would start at 10 x 16-core compute nodes, plus a head node, each with 256 GB of RAM. Importantly, don't skimp on the storage, particularly if most of the work you are planning to do is exomes, which take up a lot of space. Make sure you get something that doesn't get slower as it gets busier, so something like Isilon. Connect it all together with at least Gigabit Ethernet. Cloud is definitely a possibility, but watch out for the data transfer costs, which more than doubled the quote the last time I costed a grant on the cloud.
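As a rough sanity check on that spec, the sketch below adds up the compute nodes and estimates raw storage. The node counts come from the paragraph above; the per-exome footprint, the retention period and the number of copies are placeholder assumptions that you would need to replace with your own figures.

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Compute nodes only; the head node is not counted here."""
    compute_nodes: int = 10     # from the suggestion above
    cores_per_node: int = 16    # from the suggestion above
    ram_gb_per_node: int = 256  # from the suggestion above

    @property
    def total_cores(self):
        return self.compute_nodes * self.cores_per_node

    @property
    def total_ram_gb(self):
        return self.compute_nodes * self.ram_gb_per_node


def storage_needed_tb(exomes_per_year, gb_per_exome, years_retained, copies=2):
    """Raw capacity in TB (1 TB = 1000 GB).

    copies=2 assumes one primary copy plus one backup copy.
    """
    total_gb = exomes_per_year * gb_per_exome * years_retained * copies
    return total_gb / 1000


if __name__ == "__main__":
    spec = ClusterSpec()
    print(f"{spec.total_cores} cores, {spec.total_ram_gb} GB RAM in total")
    # 20 GB per exome (FASTQ + BAM + VCF) and 3 years retention are assumptions.
    tb = storage_needed_tb(exomes_per_year=2000, gb_per_exome=20, years_retained=3)
    print(f"About {tb:.0f} TB of raw storage including backup")
```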

To run all that you'll need a sysadmin. In addition, employ at least one high-grade, properly experienced bioinformatician (the minimum grade should be at least the career grade for a research division leader). Other hires depend on what you want from the core. If you just want "you provide sequence, I provide lists of SNPs", then you'll probably be okay with master's-level people. However, my experience is that most folks want help interpreting the data as much as analysing it. In this case I'd argue for hiring a bunch of postdoc-level people and basically hawking them out as rent-a-postdocs - spending 30 or 50% of their time on a project for a collaborator over a period of six months to a year for each project. Either pay them well and offer job security, or offer them a slice of their time to work on projects of their own choosing - you will need to do something to stop the ones who are any good leaving for jobs with more freedom at the first opportunity.
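The rent-a-postdoc model can be turned into a quick capacity estimate. This is a sketch under stated assumptions: the function names are made up, and the team size and the share of time reserved for the postdocs' own projects are examples, not recommendations from this thread.

```python
def concurrent_projects(postdocs, share_per_project, own_research_share=0.0):
    """Number of collaborator projects a team can carry at the same time.

    share_per_project: fraction of one person's time a project takes (e.g. 0.3 or 0.5).
    own_research_share: fraction of time reserved for self-chosen work.
    """
    if postdocs < 0:
        raise ValueError("postdocs must not be negative")
    if not 0 < share_per_project <= 1:
        raise ValueError("share_per_project must be in (0, 1]")
    if not 0 <= own_research_share < 1:
        raise ValueError("own_research_share must be in [0, 1)")
    available = 1.0 - own_research_share
    # Whole projects per person; the small epsilon guards against float error.
    per_person = int(available / share_per_project + 1e-9)
    return postdocs * per_person


def projects_per_year(concurrent, months_per_project):
    """Projects completed per year if every slot is refilled immediately."""
    if months_per_project <= 0:
        raise ValueError("months_per_project must be positive")
    return concurrent * 12 / months_per_project


if __name__ == "__main__":
    # Example team of 4 postdocs, each keeping 20% of their time for themselves.
    slots = concurrent_projects(postdocs=4, share_per_project=0.5,
                                own_research_share=0.2)
    print(f"{slots} projects at once, "
          f"{projects_per_year(slots, 6):.0f} to {projects_per_year(slots, 12):.0f} per year")
```

Note how reserving protected time reduces the number of 50% slots per person from two to one, which is the real cost of the retention strategy suggested above.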

6.4 years ago
GenoMax 127k

Either you are creating a sequencing core lab (that may do the needed bioinformatics on the side), or you should consider creating two separate cores. In the latter case, one just does sequencing and the other bioinformatics. That way both would be free to follow other opportunities, since 2,000 exomes a year will not keep either core completely busy.

Edit: Re-reading your original post, it sounds like you are only setting up a bioinformatics core (i.e. sequencing may be done elsewhere), so the above may not apply. I will leave the answer here in case both aspects apply.

6.4 years ago
chen ★ 2.4k

Since budget is not an issue, obviously you need a set of Illumina HiSeq X Ten instruments, and then you should build a data center.


The HiSeq X Ten System is the most powerful sequencing platform ever created. The system consists of a set of 10 HiSeq X ultra-high-throughput instruments that deliver over 18,000 human genomes per year at the price of $1000 per genome. The HiSeq X Ten makes human whole-genome sequencing more affordable and accessible than ever before.

Sounds like overkill to me, if you are (only) going to run 2000 exomes per year. Also, I assume the "hardware" is more related to servers (and not to sequencers).


The operative phrase is "budget is not an issue" :)


I thought it was 'We are establishing a core bioinformatics facility' ;) I haven't yet seen the OP mention that they need HiSeqs of any flavour :)


The more I repeat that in my head the better it sounds. Start ordering PromethIONs and a HiSeq X Ten system then! Give me just a bit of time to finish my PhD and hire me! But perhaps Sara will have some more information for us soon.