My small core lab needs to set up server space for storing sequencing and bioinformatics data. We currently have only a few TB of allocated server space from IT and are reluctant to store data on external hard drives or USB drives. We have one NextSeq and one high-performance workstation for analysis (no compute cluster) for one user.
Does anyone have any experience with cloud storage on Amazon S3/Google/Dropbox? If so, what has been your experience?
Any suggestions or input regarding NGS data storage are appreciated! Thank you!
I'm working with my institution's IT group to see about AWS S3 storage costs for sequencing and bioinformatics since they are only able to allocate 5TB for the NextSeq AND data analysis. I have not discussed tape options but will bring that up. Oiy - big data - what a headache! Thanks for your input!
Make sure you have good upstream network connectivity. With anything less than a full gigabit connection all the way from your server to Amazon/Google, transferring a TB of data takes a long while.
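To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbit/s = 10^6 bits/s), and the function name `transfer_hours` and the `efficiency` parameter are made up for illustration. Real transfers will be slower because of protocol overhead and shared links, so treat the result as a best case.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move size_tb terabytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency (0 < efficiency <= 1) is an assumed fraction of that speed
    you actually get; 1.0 is the theoretical best case.
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, and efficiency in (0, 1]")
    bits_to_send = size_tb * 1e12 * 8                    # terabytes -> bits
    bits_per_second = link_mbps * 1e6 * efficiency       # usable throughput
    return bits_to_send / bits_per_second / 3600         # seconds -> hours


if __name__ == "__main__":
    for mbps in (100, 1000):
        hours = transfer_hours(1, mbps)
        print(f"1 TB at {mbps} Mbit/s: about {hours:.1f} h (best case)")
```

Plug in your own uplink speed once you find it out from IT; if you only get a fraction of the nominal speed, lower `efficiency` accordingly.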
BTW: If you have only one NextSeq, 5 TB may be enough for "normal operation", depending on how often you expect to run the sequencer each week. You will be amazed by how quickly reagent costs add up, well before the data does :-)
Sorry for the late reply - I've since been asked to look at NAS devices to serve as (temporary) storage space for sequencing data and associated project analyses. As I'm not a sysadmin or IT guru, would you recommend this option? I'm worried about server maintenance and compatibility between the NextSeq (Windows OS) and most NAS devices (Linux OS). Also, I still don't know the connectivity type/speed.
There are NAS boxes and then there are NAS boxes: some are just cheap boxes with drives, while others are much more sophisticated, robust, and expensive. In general, a NAS will be fine for "storage". Inexpensive NAS boxes may not be fast enough to use for compute with a cluster (but you could certainly use one with just one or two servers). You have not said whether you are planning to use the onboard analysis software on the NextSeq (or BaseSpace) for analysis. I don't think there should be a compatibility issue between Windows and a NAS as far as storage goes. As long as Windows sees the network share the NAS exports, it will be able to mount it.
We're actually not using BaseSpace for analysis. We have one high-performance workstation for analysis. I was worried about moving sequencing data from the NextSeq to a NAS and then to the Linux workstation for analysis. I've never used the bcl2fastq conversion software, so I'm not sure if it will be time-consuming. We would only be using the NAS for storage, not for compute. We also do not have access to a compute cluster. Thanks for your input!
If the NAS is mounted on both the NextSeq and the workstation, then you would not need to move the data around. Just make sure you keep your derived data separate from the original raw data. Also back the data up regularly somewhere (e.g. to tape via your local IT) in case one of the pieces experiences hardware issues.
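One way to check that a backup copy of the raw data really matches the original is to compare checksums. The sketch below is only an illustration, not a required part of the setup: the function names (`sha256_of`, `build_manifest`, `changed_files`) and the example paths are hypothetical, and it assumes Python 3.9+ on the Linux workstation with both the raw-data folder and the backup copy readable from there.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(original: dict[str, str], copy: dict[str, str]) -> list[str]:
    """List files that are missing from the copy or whose checksum differs."""
    return sorted(name for name, digest in original.items() if copy.get(name) != digest)


if __name__ == "__main__":
    # Hypothetical mount points; replace with wherever the NAS share
    # and the backup copy are actually mounted.
    raw = Path("/mnt/nas/raw_runs")
    backup = Path("/mnt/backup/raw_runs")
    if raw.is_dir() and backup.is_dir():
        problems = changed_files(build_manifest(raw), build_manifest(backup))
        print("Backup matches." if not problems else f"Mismatched or missing: {problems}")
```

Pointing this only at the raw-data folder is another reason to keep derived data in a separate directory tree: raw run folders should never change after the run finishes, so any checksum mismatch there is a real problem rather than the result of re-analysis.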
Perfect - got it. Thanks so much!