My group processes sequence data for a medium-sized research institute. Most of our data is RNA-seq, some ATAC-seq, occasionally other things. We handle ~1-2 billion short reads per week, generated by our in-house sequencing core on a NextSeq. We tend to do most of our secondary analysis and prototyping of workflows on in-house computing resources, and use cloud resources for faster / more scalable processing of the majority of our data. We have been running filtering/trimming, alignment, and counting steps on AWS resources, using Galaxy to manage workflows, with a supported interface managed by Globus/Navipoint. We are considering alternatives to this approach, and wanted to get a sense of how other folks in the field are approaching similar problems, and how well those approaches are working.
Are people using AWS / Google Cloud / Azure directly? Or are you using providers that facilitate running workflows/pipelines on those resources, such as Seven Bridges, DNAnexus, or similar? I'm mostly interested in hearing experiences and pros/cons that you've encountered, to determine if we could improve on our current approach.
Most cloud-savvy users likely use cloud resources directly, since platforms like DNAnexus carry additional costs. If you don't have access to IT expertise, those platforms may be attractive.
It looks like you are moving data back and forth between your in-house systems and the cloud, which may actually be costing you more (through data ingress/egress charges). Are you running your Galaxy instance locally or in the cloud?
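To put a rough number on the transfer question, here is a minimal Python sketch of the arithmetic. The function name, the data volumes, and both per-GB rates are hypothetical placeholders, not real prices from any provider; substitute the figures from your own bill or your provider's current price list.

```python
def estimate_transfer_cost(gb_in, gb_out, ingress_rate_per_gb, egress_rate_per_gb):
    """Estimate data transfer charges for one billing period.

    gb_in / gb_out: data moved into / out of the cloud, in GB.
    ingress_rate_per_gb / egress_rate_per_gb: price per GB in your
    billing currency (assumed flat here; real pricing is often tiered).
    """
    values = (gb_in, gb_out, ingress_rate_per_gb, egress_rate_per_gb)
    if any(v < 0 for v in values):
        raise ValueError("volumes and rates must be non-negative")
    return gb_in * ingress_rate_per_gb + gb_out * egress_rate_per_gb


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not actual prices.
    cost = estimate_transfer_cost(
        gb_in=500, gb_out=200,
        ingress_rate_per_gb=0.0, egress_rate_per_gb=0.05,
    )
    print(f"Estimated transfer cost: {cost:.2f}")
```

Comparing this figure against your compute charges for the same period should show whether the back-and-forth is a meaningful share of the total.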
I am moderately cloud-savvy and use AWS directly for a few tasks, but we also have an NCI-funded Seven Bridges partnership for our omics pipelines. My point is that the two approaches need not be mutually exclusive.
I also use an in-house HPC for omics processing, and I know a lab that uses AWS ParallelCluster instead because it needs more control over its HPC resources.
Each approach has pros and cons, but generally the trend is: local = more power (over the pipelines), more responsibility (over the infrastructure/resource usage); cloud = less power, less responsibility.
I get the feeling most DNAnexus/SBG users are top-down, i.e. someone higher up, or a grant funder, signed a contract and said "you're using this".