Storing large amounts of data will become a problem for bioinformatics sooner or later. I faced this problem recently, and a lot of questions I had never thought about before suddenly surfaced. The most obvious are: How do you choose a filesystem? How do you partition a large (TB-range) HD? When is a cheap solution (e.g. a bunch of low-end HDs) inappropriate?
These are pressing issues here in the Brazilian medical community. Everyone wants to buy an NGS machine, mass spec, or microarray, but no one anticipates the forthcoming data flood.
In practical terms, how do you store your data? A brief rationale for each decision would be great too.
I asked this question not so long ago, and things got HOT here. A whole facility dedicated to cancer has just been built. A lot of people have acquired NGS machines, and the TB scale already seems like a thing of the past. Now we are discussing what to keep and how to manage the process of data triage/filtering. So I really do need new tips from the community. Is anyone facing a similar problem (too much data)?
Well, things are pretty fast-paced these days. 4 TB HDDs are standard, SSDs are common, and servers with onboard InfiniBand abound. There are also projects with huge throughput (e.g. Genomics England and its presumed 300 GB per tumour sample). Annotation has gained far more layers. Outsourcing sequencing is now rather common. This question seems a bit obsolete at the moment.
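For anyone still doing this kind of capacity planning, a back-of-envelope estimate helps make the scale concrete. The sketch below is only illustrative: the 300 GB per sample comes from the figure mentioned above, while the samples-per-month and replication factor are assumptions you would replace with your own numbers.

    # Back-of-envelope storage estimate (illustrative numbers only).
    # GB_PER_SAMPLE follows the ~300 GB per tumour sample mentioned above;
    # SAMPLES_PER_MONTH and REPLICATION are hypothetical placeholders.

    GB_PER_SAMPLE = 300        # assumed raw data per sample
    SAMPLES_PER_MONTH = 50     # hypothetical sequencing throughput
    REPLICATION = 2            # primary copy + one backup

    def storage_needed_tb(months: int) -> float:
        """Total storage (TB) accumulated after `months` of sequencing."""
        total_gb = months * SAMPLES_PER_MONTH * GB_PER_SAMPLE * REPLICATION
        return total_gb / 1024  # GB -> TB

    for months in (6, 12, 24):
        print(f"{months:>2} months: ~{storage_needed_tb(months):.0f} TB")

Even with these modest assumptions, you land in the hundreds of TB within a year or two, which is why the what-to-keep and triage discussion matters more than the choice of any single disk.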