Forum: Large Multi-omics Project -- Dev Approach
11 months ago by
CK10 wrote:

I'm working in a university research group that's drafting plans for a large multi-omics resource acting as both a data repository and an online tool for integrative analyses of data.

As a bioinformatician, I feel the task involves aspects I'm not strictly an expert in, especially for a project of this scale: a 7- to 8-figure budget, 200 TB of raw data, decisions on backend databases, frontend tech stacks, etc. We've started reaching out to cloud service providers (e.g. AWS, Google) and software engineers within the university.

It's still early days, but I'm wondering if anyone has experience in making a success of a project of this size. The plan is to move from conception through to deployment in a 2-3 year timeframe.

My initial sentiment is to bring in commercial contractors for an initial consultation (say 6 months), then to bring in backend and frontend devs (again, probably on a contractual basis) to work with bioinformaticians (postdocs mostly).

Of course there are questions about how many individuals need to be hired, which skills need to be prioritized and at which stage (e.g. database management / backend / frontend, etc.), and which technologies to use. I'd be keen to push for a NoSQL / Django / Flask backend with microservice APIs (it's relatively easy to train up our bioinformaticians to create APIs / modules to extend functionality), but that's just me.

Would be great to hear any tips / stories from others who have made a success of a large project like this!


sequencing snp rna-seq forum • 296 views
modified 11 months ago by ATpoint36k • written 11 months ago by CK10
11 months ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche23k wrote:

Just for the credentials: I've been involved in projects producing on the order of dozens of TB, like the Mitocheck project some 10+ years ago, and I am regularly involved in projects dealing with multi-TB data sets (mostly microscopy images). From my point of view, once you reach the tens of TB, there's not much difference between 20 TB and 200 TB. What matters is the granularity of the data (i.e. is it 10 million small files or 10 000 large ones?) and the relationships that you need to keep track of.

What you need is a serious data management plan. Without knowing the details of the project, it's hard to be specific, but some things are generic enough to be mentioned. One of the first things to do is to identify the stakeholders and who is responsible for which part of data management (e.g. who will create and use the data? who will deal with which IT aspect?). Then you need to identify the data flow and access patterns: which data is going to be accessed (e.g. is it the raw data or some derived form of it? what kind of metadata is needed?), how, and for what purpose? Are there access control requirements? Will the data change and, if so, how will this be tracked/propagated? What will happen to the data after the end of the project (funders increasingly care about this)? Also important is to define conventions/standards and document them (e.g. which file formats are going to be used, structure and naming conventions for the project directories and APIs).

There's more, and all this may influence decisions about the technologies you may want to use (cloud or no cloud, SQL or NoSQL...). I would think bringing in commercial contractors for consultation is a waste of money unless they have demonstrable experience with the type of project you want to run. You can come up with a solution by sufficient brainstorming and asking the right people, just like you've started doing with this post.
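
To make the granularity question above concrete (10 million small files versus 10 000 large ones), here is a minimal Python sketch that walks a directory tree and summarizes file counts and sizes. It is an illustration only: the function name `summarize_granularity` and the 1 MB "small file" threshold are assumptions chosen for the example, not something prescribed for this project.

```python
import os
from collections import Counter


def summarize_granularity(root, small_file_bytes=1_000_000):
    """Walk `root` and summarize how the data is split across files.

    `small_file_bytes` is an arbitrary, illustrative threshold below
    which a file is counted as "small".
    """
    n_files = 0
    total_bytes = 0
    n_small = 0
    by_ext = Counter()

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip broken symlinks or files that cannot be read.
                continue
            n_files += 1
            total_bytes += size
            if size < small_file_bytes:
                n_small += 1
            # Group by lower-cased extension; files without one go to "<none>".
            ext = os.path.splitext(name)[1].lower() or "<none>"
            by_ext[ext] += 1

    return {
        "n_files": n_files,
        "total_bytes": total_bytes,
        "mean_bytes": total_bytes / n_files if n_files else 0.0,
        "n_small": n_small,
        "by_extension": dict(by_ext),
    }
```

Running something like this on a representative subset of the data gives a first idea of whether the storage and indexing problem is dominated by the number of objects or by their size, which in turn can inform the choice between file systems, object stores and databases.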

written 11 months ago by Jean-Karim Heriche23k

Thank you kindly Jean-Karim for the excellent post. We'll definitely be exploring these angles/ideas in the coming weeks. Will be back!

written 11 months ago by CK10

This is very good advice. Really understanding all the stakeholders, and capturing the core functional requirements (the "what") before diving too far into solution design (the "how"), can be so important. The bigger the project, the more important it gets. If you haven't already done it, reading a good book on standard project management techniques is worth it.

written 11 months ago by Ahill1.8k