I'm working in a university research group that's drafting plans for a large multi-omics resource: both a data repository and an online tool for integrative analyses of the data.
As a bioinformatician, I feel the task involves aspects I'm not strictly an expert in, especially at this scale: a 7- to 8-figure budget, 200 TB of raw data, decisions on backend databases, frontend tech stacks, etc. We've started reaching out to cloud service providers (e.g. AWS, Google) and software engineers within the university.
It's still early days, but I'm wondering if anyone has experience making a success of a project this size? The plan is to move from conception through to deployment in a 2-3 year timeframe.
My initial inclination is to bring in commercial contractors for an initial consultation (say, 6 months), then to bring in backend and frontend devs (again, probably on a contract basis) to work alongside the bioinformaticians (mostly postdocs).
Of course there are questions about how many people need to be hired, which skills to prioritize at which stage (e.g. database management / backend / frontend, etc.), and which technologies to use. I'd be keen to push for a NoSQL backend with Django or Flask exposing microservice APIs (it's relatively easy to train up our bioinformaticians to write APIs/modules that extend functionality), but that's just me.
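To make the microservice idea concrete, here's a minimal sketch of the kind of endpoint I have in mind, in Flask (the endpoint path, gene names, and in-memory dict are all made up for illustration; a real service would query the backend database):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory stand-in for the real backend store;
# in practice this would be a query against the NoSQL database.
EXPRESSION = {
    "TP53": {"sample_A": 12.4, "sample_B": 8.1},
}

@app.route("/api/v1/expression/<gene>")
def get_expression(gene):
    # Each microservice owns one narrow slice of the API surface,
    # so a bioinformatician can add a new endpoint/module without
    # touching the rest of the system.
    data = EXPRESSION.get(gene)
    if data is None:
        return jsonify({"error": f"unknown gene {gene}"}), 404
    return jsonify({"gene": gene, "expression": data})
```

The appeal for me is that a postdoc who knows Python but isn't a web developer can contribute a self-contained module like this without needing to understand the whole stack.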
Would be great to hear any tips or stories from others who have made a success of a large project like this!