I would like to hear from anyone and everyone. You don't need to have built a data warehouse; perhaps you have simply worked on servers that were meticulously maintained and organised. As long as you can explain what you liked, what you disliked, and which features you would have liked, you are definitely qualified to comment on this post :-)
We are currently building a small data warehouse and could use advice from others, particularly on how they organised their data. The warehouse provides dedicated nodes with decent memory and layers of security to comply with numerous regulations. It will store GWAS data, such as raw genotype data and QC'ed data, and will also function as a workspace.
We previously stored our raw genotype data on servers without any real order or structure in place. Besides ending up with several duplicate copies of raw and QC'ed data, we also had severe problems with handing over information, for example when students finished or otherwise left. Usually, the folder contained only the data; information about how and what was done to the data, or where it was obtained from, was kept in someone's head or written down in a thesis. Some of this might be solvable just by adding a README file to each folder and requiring others to do the same (it would be a start). But in some cases it might also help to have other structures that cross-reference or help organise biomarkers, phenotype data, and cohorts.
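To make the README idea and the duplicate problem a bit more concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration under my own assumptions: the accepted README file names, the function names, and the choice of SHA-256 checksums to spot duplicates are all made up for this example and are not something we have in place.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed naming convention for README files (hypothetical, adjust as needed).
README_NAMES = {"readme", "readme.md", "readme.txt"}


def folders_missing_readme(root):
    """Return the folders under root (root included) that hold files but no README."""
    root = Path(root)
    missing = []
    for folder in [root, *(p for p in root.rglob("*") if p.is_dir())]:
        files = [f for f in folder.iterdir() if f.is_file()]
        # Folders that only contain sub-folders are not flagged.
        if files and not any(f.name.lower() in README_NAMES for f in files):
            missing.append(folder)
    return sorted(missing)


def duplicate_files(root, chunk_size=1 << 20):
    """Group files under root by SHA-256 digest and return groups of identical files."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.name.lower() in README_NAMES:
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in chunks so large genotype files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Something like `folders_missing_readme("/data/gwas")` (the path is just a placeholder) could run as a scheduled job and report folders that still lack documentation, while `duplicate_files` would only catch byte-identical copies, not the same data stored in a different format.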
Thanks for your time and input.