Question: Cloud storage of sequencing and bioinformatics data
emblake (United States) wrote, 2.7 years ago:

My small core lab needs to set up server space for storing sequencing and bioinformatics data. We currently have only a few TB of server space allocated by IT and are reluctant to store data on external hard drives or USB drives. We have one NextSeq and one high-performance workstation for analysis (no compute cluster), used by a single person.

Does anyone have experience with cloud storage on Amazon S3, Google Cloud, or Dropbox? If so, how has it worked for you?

Any suggestions or input regarding NGS data storage are appreciated! Thank you!

Tags: next-gen
genomax (United States) wrote, 2.7 years ago:

Both Amazon and Google cloud storage would be excellent choices. Each offers several storage classes, with tiered pricing based on how often you expect to access the data.

That said, have you looked into tape-based storage from your local IT, if it is available? You will rarely need to keep the original data online for longer than 6 months (we have 10 times more machines and don't), and at that point it can be moved to tape for long-term storage (just archive a tarred copy of the original run folder, and keep the FASTQ files for immediate access). Tape is still the cheapest medium and very robust for long-term storage. If anyone tells you otherwise, ask them to pay your cloud storage bill :-)
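The "keep it online for about 6 months, then tar it for tape" routine can be scripted. The sketch below is a minimal, hypothetical example using only the Python standard library: it assumes each run sits in its own subfolder of a runs directory and treats the folder's modification time as its age. The function names and the 182-day cutoff are illustrative, not part of any existing tool.

```python
import tarfile
import time
from pathlib import Path

# Assumptions (hypothetical layout): every sequencing run is its own subfolder
# of runs_root, and a folder's modification time is a good-enough proxy for its age.
SIX_MONTHS_SECONDS = 182 * 24 * 3600


def runs_older_than(runs_root, max_age_seconds=SIX_MONTHS_SECONDS, now=None):
    """Return run folders under runs_root whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    return sorted(
        p
        for p in Path(runs_root).iterdir()
        if p.is_dir() and now - p.stat().st_mtime > max_age_seconds
    )


def tar_run(run_dir, dest_dir):
    """Write an uncompressed tar of run_dir into dest_dir and return the archive path."""
    run_dir = Path(run_dir)
    archive = Path(dest_dir) / f"{run_dir.name}.tar"
    # Mode "w" writes a plain tar; use "w:gz" instead if you want gzip compression
    with tarfile.open(archive, "w") as tar:
        # arcname stores paths relative to the run folder name, not the absolute path
        tar.add(run_dir, arcname=run_dir.name)
    return archive
```

For example, looping `for run in runs_older_than("/mnt/nas/raw"): tar_run(run, "/mnt/nas/to_tape")` (both paths hypothetical) would produce one tar file per old run, ready for IT to copy to tape.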


I'm working with my institution's IT group to find out AWS S3 storage costs for sequencing and bioinformatics data, since they can only allocate 5 TB for both the NextSeq AND data analysis. I haven't discussed tape options yet but will bring that up. Oy, big data, what a headache! Thanks for your input!

(reply by emblake, 2.7 years ago)

Make sure you have good upstream network connectivity. Transferring a TB of data takes a long time unless you have at least a full gigabit connection all the way from your server to Amazon/Google.
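A back-of-the-envelope calculation shows why the link speed matters. This small sketch converts a data size and a nominal link speed into transfer time; the `efficiency` parameter is an assumption standing in for protocol overhead, disk speed and shared bandwidth, and real transfers will be slower than the 1.0 best case.

```python
def transfer_hours(size_tb, link_gbps, efficiency=1.0):
    """Estimate wall-clock hours to move size_tb terabytes over a link_gbps link.

    efficiency is the fraction of the nominal line rate actually achieved;
    1.0 is the theoretical best case.
    """
    bits = size_tb * 1e12 * 8                       # decimal terabytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency  # nominal rate scaled down
    return bits / bits_per_second / 3600            # seconds -> hours


for gbps in (1.0, 0.1):
    print(f"1 TB over {gbps} Gbit/s: {transfer_hours(1, gbps):.1f} h")
# 1 TB over 1.0 Gbit/s: 2.2 h
# 1 TB over 0.1 Gbit/s: 22.2 h
```

Even in the ideal case a full gigabit link needs just over two hours per TB, and a 100 Mbit/s link needs most of a day.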

BTW: If you have only one NextSeq, 5 TB may be enough for "normal operation", depending on how often you expect to run the sequencer each week. You will be amazed at how quickly reagent costs add up, even before the data does :-)
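Whether 5 TB is "enough" comes down to simple arithmetic: free space divided by data generated per week. The sketch below assumes you plug in a per-run size measured from your own run folders (plus FASTQ and analysis output); the 100 GB used in the example is a hypothetical placeholder, not a NextSeq specification. `shutil.disk_usage` reads the free space of whatever volume holds the given path, such as a mounted share.

```python
import shutil


def weeks_of_capacity(free_gb, runs_per_week, gb_per_run):
    """Weeks until free_gb is used up at runs_per_week runs of gb_per_run each."""
    weekly_gb = runs_per_week * gb_per_run
    return float("inf") if weekly_gb <= 0 else free_gb / weekly_gb


def weeks_until_full(path, runs_per_week, gb_per_run):
    """Same estimate, reading the free space of the volume that holds `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return weeks_of_capacity(free_gb, runs_per_week, gb_per_run)


# 5 TB free, two runs a week, 100 GB per run (hypothetical figure)
print(weeks_of_capacity(5000, 2, 100))  # 25.0
```

Combined with the roughly 6-month retention window suggested above, this gives a quick sanity check on whether the allocation will keep up.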

(reply by genomax, 2.7 years ago)

Sorry for the late reply. I've since been asked to look at NAS devices as (temporary) storage space for sequencing data and the associated project analyses. As I'm not a sysadmin or IT guru, would you recommend this option? I'm worried about server maintenance and about compatibility between the NextSeq (Windows) and most NAS devices (Linux-based). Also, I still don't know the connection type or speed.

(reply by emblake, 2.7 years ago)

There are NAS devices and then there are other NAS devices: some are just cheap boxes with drives, while others are much more sophisticated, robust, and expensive. In general, a NAS will be fine for "storage". Inexpensive NAS boxes may not be fast enough to serve compute on a cluster, but you could certainly use one with one or two servers. You have not said whether you plan to use the onboard NextSeq software (or BaseSpace) for analysis. I don't think there should be a compatibility issue between Windows and a NAS as far as storage goes: as long as Windows can see the network share the NAS exports, it will be able to mount it.

(reply by genomax, 2.7 years ago)

We're actually not using BaseSpace for analysis; we have one high-performance workstation for that. I was worried about moving sequencing data from the NextSeq to a NAS and then to the Linux workstation for analysis. I've never used the bcl2fastq conversion software, so I'm not sure whether it will be time-consuming. We would only use the NAS for storage, not for compute, and we do not have access to a compute cluster. Thanks for your input!
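bcl2fastq is a command-line program, so on the Linux workstation it can read the run folder straight off a mounted share and write FASTQ files somewhere else. The sketch below only assembles and launches that command from Python; the NAS paths and the thread count are hypothetical, and runtime depends on run size and CPU cores, so timing one real run is the best way to answer the "is it slow?" question. `--runfolder-dir`, `--output-dir`, `--processing-threads`, `--sample-sheet` and `--no-lane-splitting` are bcl2fastq2 options; without `--sample-sheet` it uses the SampleSheet.csv inside the run folder.

```python
import subprocess
from pathlib import Path


def bcl2fastq_command(run_folder, output_dir, threads=8, sample_sheet=None):
    """Build the bcl2fastq argument list: raw run folder in, FASTQ folder out."""
    cmd = [
        "bcl2fastq",
        "--runfolder-dir", str(run_folder),
        "--output-dir", str(output_dir),
        "--processing-threads", str(threads),
        # Optional choice: one FASTQ per sample instead of one per lane
        "--no-lane-splitting",
    ]
    if sample_sheet is not None:
        cmd += ["--sample-sheet", str(sample_sheet)]
    return cmd


def run_bcl2fastq(run_folder, output_dir, **kwargs):
    """Create output_dir and run bcl2fastq; raises CalledProcessError on failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(bcl2fastq_command(run_folder, output_dir, **kwargs), check=True)


# Hypothetical NAS paths: raw run folders and derived output kept apart
print(" ".join(bcl2fastq_command("/mnt/nas/raw/RUN_ID", "/mnt/nas/derived/RUN_ID/fastq")))
```

Writing the output under a separate `derived` tree keeps the original run folder untouched, which also makes it easy to archive later.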

(reply by emblake, 2.7 years ago)

If the NAS is mounted on both the NextSeq and the workstation, you would not need to move the data around. Just make sure you keep derived data separate from the original raw data. Also back up the data regularly somewhere else (e.g. tape from your local IT) in case one of the pieces has hardware issues.
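Regular backups are only useful if the copy actually matches the original. One common way to check is to compare checksums. The sketch below is a minimal standard-library example, not a backup tool: it records a SHA-256 digest for every file under a raw-data folder and reports files that are missing or different in the backup copy. The folder names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large BCL/FASTQ files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(original, copy):
    """Return relative paths that are missing from, or differ in, the copy."""
    return sorted(k for k, v in original.items() if copy.get(k) != v)
```

Saving the manifest of `raw/` right after a run finishes, then checking `diff_manifests(raw_manifest, build_manifest(backup_dir))` after each backup, flags any file that did not copy cleanly.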

(reply by genomax, 2.7 years ago)

Perfect - got it. Thanks so much!

(reply by emblake, 2.7 years ago)