I'm looking for a system to store human NGS data and metadata, and to retrieve data. We have a storage server with a proper distributed filesystem (Isilon OneFS).
There are some other posts discussing this topic, for example:
- A: How Do You Store And Share Your Bioinformatics Data? (Fasta, Fastq, Sff, Etc.)
- A: Using Hdf5 To Store Bio-Data
- Storage Solutions For Genomic Research Centers
But I wanted to make a new post because (1) those posts are several years old, and I imagine practices are different today, and (2) they discuss file formats and distributed file systems a lot, while I'm more interested in ways to access data.
I would like to have a system, preferably with a GUI (browser is also fine), where I can search for an individual (pseudonym ID), and retrieve their data:
- Raw NGS data (FASTQ)
- Aligned reads (BAM)
- Variants (VCF)
- Metadata, for example whether the individual is part of a trio, whether the individual was sequenced more than once, how the individual was sequenced, etc.
I also want to be able to retrieve data (VCF, BAM, or whatever is specified) for a list of individual IDs.
- Retrieving variants for a list of individuals, restricted to specified gene(s), loci, or types of variation.
- Integration with a variant browser such as the ExAC browser.
- Or with a different kind of genome browser, such as IGV.
- Familial relationships, for example as in Family Genome Browser (FGB)
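To make the kind of lookup I have in mind more concrete, here is a minimal sketch of a file catalog using Python's built-in sqlite3 module. It is only an illustration, not a proposal for a finished system: the table names, column names, IDs, and file paths are all hypothetical, and a real setup would need access control, auditing, and a proper interface on top.

```python
import sqlite3


def build_catalog(conn):
    """Create a hypothetical two-table catalog: individuals and their files."""
    conn.executescript("""
        CREATE TABLE individual (
            pseudonym_id TEXT PRIMARY KEY,
            family_id    TEXT,   -- shared by members of the same family/trio
            trio_role    TEXT    -- e.g. 'proband', 'mother', 'father', or NULL
        );
        CREATE TABLE data_file (
            pseudonym_id TEXT NOT NULL REFERENCES individual(pseudonym_id),
            run_id       TEXT NOT NULL,  -- one row set per sequencing run
            platform     TEXT,           -- how the individual was sequenced
            file_type    TEXT NOT NULL CHECK (file_type IN ('FASTQ', 'BAM', 'VCF')),
            path         TEXT NOT NULL   -- location on the storage server
        );
    """)


def find_files(conn, pseudonym_ids, file_type=None):
    """Return (pseudonym_id, run_id, file_type, path) rows for a list of IDs."""
    ids = list(pseudonym_ids)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = ("SELECT pseudonym_id, run_id, file_type, path FROM data_file "
           f"WHERE pseudonym_id IN ({placeholders})")
    params = ids
    if file_type is not None:
        sql += " AND file_type = ?"
        params = ids + [file_type]
    sql += " ORDER BY pseudonym_id, run_id, file_type"
    return conn.execute(sql, params).fetchall()


def times_sequenced(conn, pseudonym_id):
    """Count the distinct sequencing runs recorded for one individual."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT run_id) FROM data_file WHERE pseudonym_id = ?",
        (pseudonym_id,),
    ).fetchone()
    return row[0]


def demo_catalog():
    """Build an in-memory catalog with made-up example records."""
    conn = sqlite3.connect(":memory:")
    build_catalog(conn)
    conn.executemany(
        "INSERT INTO individual VALUES (?, ?, ?)",
        [("P001", "F01", "proband"), ("P002", "F01", "mother")],
    )
    conn.executemany(
        "INSERT INTO data_file VALUES (?, ?, ?, ?, ?)",
        [
            ("P001", "run1", "WGS", "FASTQ", "/ifs/ngs/P001/run1.fastq.gz"),
            ("P001", "run1", "WGS", "BAM", "/ifs/ngs/P001/run1.bam"),
            ("P001", "run1", "WGS", "VCF", "/ifs/ngs/P001/run1.vcf.gz"),
            ("P001", "run2", "WES", "BAM", "/ifs/ngs/P001/run2.bam"),
            ("P002", "run1", "WGS", "VCF", "/ifs/ngs/P002/run1.vcf.gz"),
        ],
    )
    return conn
```

With this, `find_files(conn, ["P001", "P002"], "VCF")` would return the VCF paths for both individuals, and `times_sequenced(conn, "P001")` would answer the "sequenced more than once" question. Filtering variants by gene or locus would still have to happen on the VCF files themselves, which this sketch does not cover.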
Some examples of software I am unsure of:
Any input on this topic would be greatly appreciated.