Question

Forum:Experiences with different cloud computing resources / providers

20 months ago

mjd79 ▴ 10

My group processes sequence data for a medium-sized research institute. Most of our data is RNA-seq, some ATAC-seq, occasionally other things. We handle ~1-2 billion short reads per week, generated by our in-house sequencing core on a NextSeq. We tend to do most of our secondary analysis and prototyping of workflows on in-house computing resources, and use cloud resources for faster / more scalable processing of the majority of our data. We have been running filtering/trimming, alignment, and counting steps on AWS resources, using Galaxy to manage workflows, with a supported interface managed by Globus/Navipoint. We are considering alternatives to this approach, and wanted to get a sense of how other folks in the field are approaching similar problems, and how well those approaches are working.

Are people using AWS / Google Cloud / Azure directly? Or are you using providers that facilitate running workflows/pipelines on those resources, such as Seven Bridges, DNAnexus, or similar? I'm mostly interested in hearing experiences and pros/cons that you've encountered, to determine if we could improve on our current approach.

cloud pipeline rna-seq alignment • 955 views

updated 20 months ago by Jeremy Leipzig 22k • written 20 months ago by mjd79 ▴ 10

Are people using AWS / Google Cloud / Azure directly?

Most cloud-savvy users likely use cloud resources directly, since platforms like DNAnexus come at additional cost. If you do not have access to IT expertise, those platforms may be attractive.

We tend to do most of our secondary analysis and prototyping of workflows on in-house computing resources

It looks like you are moving data back and forth between your in-house and cloud resources, which may actually be costing you more (through data ingress/egress charges). Are you running Galaxy locally or in the cloud?

20 months ago by GenoMax 142k

I am moderately cloud-savvy and use AWS directly for a few tasks, but we also have an NCI-funded Seven Bridges partnership for our omics pipelines. My point is that the two approaches need not be mutually exclusive.

I also use an in-house HPC for omics processing, and I know a lab that uses AWS ParallelCluster instead because it needs more control over the HPC resources.

Each approach has pros and cons, but generally the trend is: local = more power (over the pipelines), more responsibility (over the infrastructure/resource usage); cloud = less power, less responsibility.

20 months ago by Ram 43k

I get the feeling most DNAnexus/SBG users are top-down, i.e. someone higher up, or a grant funder, signed a contract and said "you're using this".

20 months ago by Jeremy Leipzig 22k