Question

Open-source web server for multi-omics data storage and descriptive visualization

15 months ago

Zhilong Jia ★ 2.2k

Is there an open-source web server (or framework) available for multi-omics and sample metadata management and descriptive visualisation?

Such a web server would store the raw multi-omics data (genomics, transcriptomics, proteomics, metabolomics, microbiome, epigenome) together with the key omics files, or the paths to them, such as VCF files and expression matrices. A descriptive visualisation of the matrix data would be a bonus. Thank you.

webserver multi-omics open-source • 651 views

updated 14 months ago by Matthias Zepper 4.5k • written 15 months ago by Zhilong Jia ★ 2.2k

score 1 · Answer 1 · 2023-01-04

Gen3 is probably the closest to what you were thinking of. Self-hosting is possible, but you will probably need full-time engineer(s) in your organization to set up and maintain the system.

Unfortunately, once your data volume reaches the point where the benefits of such a system outweigh the effort of setting one up, customizing and maintaining it warrants full-time engineers. Here, a LinkedIn engineer nicely elaborates on the challenges of building and maintaining such a system. Many large companies have worked on internal tooling to make datasets discoverable across the whole organization by gathering all metadata into a central data catalogue, and basically all of them ended up building their own custom systems to meet their needs. Thus, there is no shortage of open-source systems you could customize to manage your metadata, but none will work out of the box; all of them require substantial work on your side:

Amundsen
DataHub
iRODS
Marquez
Many more; just search for "data catalogue" or "data platform"
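
To make concrete what the core of such a metadata catalogue stores, here is a minimal Python sketch using the standard-library `sqlite3` module. It is not how any of the tools listed above work internally; the table layout, column names, function names and file paths are all hypothetical and only illustrate the idea of linking sample metadata to paths of key omics files.

```python
import sqlite3


def create_catalogue(conn):
    """Create two hypothetical tables: samples and the omics files that belong to them."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS samples (
            sample_id TEXT PRIMARY KEY,
            organism  TEXT,
            tissue    TEXT
        );
        CREATE TABLE IF NOT EXISTS omics_files (
            file_id     INTEGER PRIMARY KEY,
            sample_id   TEXT NOT NULL REFERENCES samples(sample_id),
            omics_type  TEXT NOT NULL,   -- e.g. genomics, transcriptomics
            file_format TEXT NOT NULL,   -- e.g. vcf, tsv
            path        TEXT NOT NULL    -- the catalogue stores the path, not the data
        );
        """
    )


def add_sample(conn, sample_id, organism, tissue):
    """Register the metadata of one sample."""
    conn.execute(
        "INSERT INTO samples (sample_id, organism, tissue) VALUES (?, ?, ?)",
        (sample_id, organism, tissue),
    )


def register_file(conn, sample_id, omics_type, file_format, path):
    """Attach the path of one omics file to an existing sample."""
    conn.execute(
        "INSERT INTO omics_files (sample_id, omics_type, file_format, path) "
        "VALUES (?, ?, ?, ?)",
        (sample_id, omics_type, file_format, path),
    )


def files_for_sample(conn, sample_id, omics_type=None):
    """Return (omics_type, file_format, path) tuples, optionally filtered by omics type."""
    query = "SELECT omics_type, file_format, path FROM omics_files WHERE sample_id = ?"
    params = [sample_id]
    if omics_type is not None:
        query += " AND omics_type = ?"
        params.append(omics_type)
    query += " ORDER BY file_id"
    return conn.execute(query, params).fetchall()
```

A call such as `files_for_sample(conn, "S1", "genomics")` would then list the registered VCF paths for sample `S1`. Everything a real platform adds on top (access control, lineage, search, a web UI) is exactly the part that needs the engineering effort described above.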

There are also some other efforts to build data platforms with a biology/genomics focus, but as far as I know the Elixir Data Catalogue, the European Genomic Data Infrastructure (GDI) and the German Human Genome-Phenome Archive are all work in progress.

For raw data storage, you could also take a look at Hail, but a simple object store with a Mantis index may already be sufficient for your needs. If you need to serve bioinformatics file formats over a network, various implementations of the htsget protocol (e.g. in Rust) are available. For data versioning, Restic, or its reimplementation Rustic, could be a relatively straightforward solution that also works without much overhead at the level of a single workgroup.
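
For the htsget option, a rough idea of what a client does: it requests a "ticket" for a genomic region and receives a JSON document listing the URLs of the data blocks to download and concatenate. The Python sketch below only builds the ticket URL and parses a ticket, without any network access. The server address and dataset ID are hypothetical, and the helper names are my own; only the `/reads/<id>` path, the `format`, `referenceName`, `start` and `end` query parameters and the `htsget`/`urls` ticket layout are taken from the htsget specification.

```python
import json
from urllib.parse import urlencode


def build_htsget_url(base_url, dataset_id, reference_name=None, start=None, end=None,
                     fmt="BAM", endpoint="reads"):
    """Build the ticket URL for a region (start is 0-based inclusive, end is exclusive)."""
    params = {"format": fmt}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return f"{base_url.rstrip('/')}/{endpoint}/{dataset_id}?{urlencode(params)}"


def block_urls(ticket_json):
    """Extract (url, headers) pairs from an htsget ticket, in download order."""
    ticket = json.loads(ticket_json)["htsget"]
    return [(block["url"], block.get("headers", {})) for block in ticket["urls"]]


if __name__ == "__main__":
    # Hypothetical server and dataset ID, for illustration only.
    print(build_htsget_url("https://htsget.example.org", "sample1", "chr1", 0, 1000))
```

A real client would fetch each returned URL with the given headers and concatenate the responses into a valid BAM or VCF stream; existing htsget clients already handle that part.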