Forum: How Do You Use Cloud Computing For Bioinformatics In 2013?
7.7 years ago by
Quebec, Canada
Eric Normandeau10k wrote:

There have been a few questions about the cloud and its uses in bioinformatics on Biostar, but most of them date back a bit (see for example: Is Amazon's EC2 commonly used for bioinformatics? and Experiences with cloud computing in bioinformatics).

I work in an environment where a variety of computing resources are available, from my desktop to in-building servers and a country-deployed supercomputer infrastructure. Because of the variety of needs we face when doing data analysis, more options are often better. This is why we have recently started using EC2 from Amazon Web Services.

In this context, it would be nice to have an update on how Biostar members use the cloud for their computing needs. It would be nice if you could state which services you use, in what context and for what kind of projects/analysis. If you don't use the cloud, maybe you could also write about your experience and what turned you off.

I think new insights into how and when to use cloud computing could benefit a lot of small or medium-sized labs doing bioinformatics. It would surely help us!

NOTE: You do not have to be a big player to post an answer. Please share your experience!

bioinformatics forum cloud • 5.5k views
modified 7.7 years ago by Woa2.8k • written 7.7 years ago by Eric Normandeau10k
7.7 years ago by
Emily_Ensembl21k wrote:

At Ensembl we use the cloud to speed up our services around the world. This improves download speed for our users in the States and in Asia. Our main servers are in the UK, but we have cloud services on the East and West coasts of the USA and in Singapore. These are provided by Amazon EC2.

We produce a genome browser that integrates genome, gene, variation, regulation and comparative genomic data. We release this in pretty browser format, but also have a free-to-use Perl API. We have lots of shiny tools for accessing all this data (e.g. Variant Effect Predictor, BioMart, REST API).

The reason we use the cloud is that our American and Asian users weren't getting the same performance as our European users. By giving them local mirrors, we improved it for them.

Own-trumpet blowing alert: we're also mentioned in today's Nature.

modified 7.7 years ago • written 7.7 years ago by Emily_Ensembl21k

Hi Emily. Maybe you could add some info in your answer about what your group does, for example what services you provide.

modified 7.7 years ago • written 7.7 years ago by Eric Normandeau10k

Apparently, Emily is working for Ensembl. It seems that Ensembl uses EC2 primarily for data sharing.

written 7.7 years ago by lh332k

Hi, sorry, I thought my name gave that away (I figured it was best to be completely unsubtle and express my vested interest in my name). I'll edit my answer to explain a bit more about Ensembl.

modified 7.7 years ago • written 7.7 years ago by Emily_Ensembl21k

Thank you. It's what I gathered from your name too, but often explicit is better than implicit :)

written 7.7 years ago by Eric Normandeau10k
7.7 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith18k wrote:

We do not use cloud computing at our center (The Genome Institute - WashU Medical School) because we estimate that, overall, it is not cost-effective compared to using our own cluster, as long as we keep said cluster busy (>4000 CPUs, 15-20 PB of storage, etc.).
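The "as long as we keep said cluster busy" condition can be made concrete with a back-of-the-envelope break-even calculation. The sketch below is not from the original post: the function name, the cost model (a fixed yearly cluster cost against a flat per-core-hour cloud price) and every number in the example are hypothetical placeholders, and it ignores storage and data transfer costs.

```python
HOURS_PER_YEAR = 24 * 365


def breakeven_utilization(cluster_cost_per_year, n_cores, cloud_price_per_core_hour):
    """Return the fraction of available core-hours that must be used
    before a local cluster becomes cheaper than renting the same
    core-hours in the cloud.

    Assumed model (hypothetical, deliberately simple):
      cluster cost = fixed yearly amount, whatever the load
      cloud cost   = used core-hours * flat price per core-hour
    """
    if n_cores <= 0 or cloud_price_per_core_hour <= 0:
        raise ValueError("n_cores and cloud price must be positive")
    available_core_hours = n_cores * HOURS_PER_YEAR
    # The cluster wins when: fixed cost < utilization * available * price
    return cluster_cost_per_year / (available_core_hours * cloud_price_per_core_hour)


if __name__ == "__main__":
    # Placeholder figures chosen only to make the arithmetic easy to follow.
    u = breakeven_utilization(
        cluster_cost_per_year=876_000,
        n_cores=100,
        cloud_price_per_core_hour=2.0,
    )
    print(f"Cluster is cheaper above {u:.0%} utilization")
```

A result above 1.0 would mean the cluster can never pay for itself under this model, while a small value means even a lightly loaded cluster is cheaper than the cloud.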

Where I have found it very useful is in an educational context. We use the cloud (Amazon AWS) for hands-on tutorials associated with various workshops organized by the Canadian Bioinformatics Workshops series. We obtain access to a series of EC2 cloud instances a few weeks before each workshop starts. We spend this time installing software, timing exercises, and making sure everything works as expected. Then, a few days before the course, we freeze development, decide on the type of instances we will need (memory, number of CPUs, etc.), and create an Amazon Machine Image (AMI). When the students arrive, we spin up one instance for each student and assign it to them for the duration of the workshop. Data that is needed by all students during the exercises is stored in an S3 bucket that is mounted on all instances. This creates a very consistent and predictable environment for all students. We have not had serious problems with up to 40 students hammering the same S3 storage. Since each student has their own instance, they do not compete with each other for CPU cycles. We are able to perform alignments and assembly of NGS data (small to modest amounts) quickly enough to accommodate the flow of an educational setting.

There are many advantages to this approach in this setting. The main downside is that for cost reasons, we can only make the student instances available for the duration of the course.

Amazon has an AWS in education grant program that works well in this very modest, short term educational setting.
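The "one instance per student from a frozen image" step described above could be scripted roughly as follows. This is a minimal sketch, not the workshop's actual tooling (the post does not say what they use): it assumes a boto3-style EC2 client is passed in, for example the object returned by `boto3.client("ec2")`, and the function name, the `Student` tag key and all IDs are hypothetical.

```python
def launch_student_instances(ec2_client, ami_id, instance_type, student_names):
    """Start one EC2 instance per student from the same machine image.

    ec2_client is assumed to expose boto3's EC2 `run_instances` call;
    it is injected so the function can be exercised without AWS access.
    Returns a dict mapping each student name to the new instance ID.
    """
    launched = {}
    for name in student_names:
        response = ec2_client.run_instances(
            ImageId=ami_id,              # the frozen workshop image
            InstanceType=instance_type,  # decided when development is frozen
            MinCount=1,
            MaxCount=1,
            # Tag the instance so it can be matched to its student later.
            TagSpecifications=[
                {
                    "ResourceType": "instance",
                    "Tags": [{"Key": "Student", "Value": name}],
                }
            ],
        )
        launched[name] = response["Instances"][0]["InstanceId"]
    return launched
```

Launching one instance per call keeps the student-to-instance mapping explicit. Terminating everything at the end of the course would be a matching loop over the returned IDs.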

modified 7.7 years ago • written 7.7 years ago by Malachi Griffith18k

Thanks Malachi, this is very interesting information, including a lot of details about your setup! It is very useful to hear about the education grant program, which people may not be aware of (I was not).

modified 7.7 years ago • written 7.7 years ago by Eric Normandeau10k

Been a while since I saw this. We still find the cloud very useful for delivering hands-on bioinformatics workshops. This tutorial post might be useful to this thread: Introduction to AWS Cloud Computing.

written 5.9 years ago by Malachi Griffith18k
7.7 years ago by
JC12k wrote:

A few weeks ago I used EC2 to run Trinity for a large de novo RNA-seq assembly. I plan to use it more often for many other tasks if I have the resources and money.

written 7.7 years ago by JC12k
7.7 years ago by
Ryan Dale4.9k
Bethesda, MD
Ryan Dale4.9k wrote:

For things like routine ChIP-seq, RIP-seq, and RNA-seq analysis, I haven't found a reason to make the jump to the cloud yet.

With only a couple of new experiments a month to worry about, pipelines finish in a handful of days per experiment on a single dedicated machine (8 CPU, 24GB RAM). Sure, this could be sped up by the cloud or the cluster we have available, but this would require extra sysadmin-type work, data transfer time, and most importantly, data storage costs.

Keeping everything local works well and is efficient to maintain -- I'm willing to trade a day or two of compute time for simplicity and low cost. Most of the bioinformatics effort goes into downstream analysis of these data, which doesn't need that much horsepower (at least in terms of hardware).

I think that if cloud storage were cheaper, I would reconsider.

written 7.7 years ago by Ryan Dale4.9k

I also find EC2 storage a bit too pricey for many NGS applications in small labs, so I keep doing everything locally. So far so good :)

written 7.7 years ago by Leszek4.1k
7.7 years ago by
Richard Smith400
Cambridge, UK
Richard Smith400 wrote:

I've been using Amazon EC2 recently while building nowomics. I use it for data integration and hosting databases so probably have a different experience from those running analysis pipelines.

As I have no other servers, it's been a great way to get started on a new project. It took a while to find my way around (I found the documentation way too verbose), but once I got set up and had scripted some basic operations, the flexibility has been fantastic.

Getting started has been cheap, but the default storage on EBS is slow. There are options to pay more for EBS-optimised instances, provisioned IO and SSDs, which I've heard are much better. Getting higher-RAM servers for running databases effectively also gets expensive. If you need to run always-on services, getting reserved instances is essential. You can now buy and sell incomplete reserved terms (usually 1 or 3 years) in a marketplace, so if you don't know exactly what you need you can buy reserved instances for just a few months or sell reserved time you no longer need.

I've found network IO can be patchy, particularly on smaller instance types, and had to build in provision for e.g. dropped connections and timeouts when downloading files. S3 storage has been great and really simple to build into workflows.
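Building in provision for dropped connections and timeouts usually means wrapping each download in a retry loop with a timeout and a growing pause between attempts. The sketch below shows one way to do that with the Python standard library. It is an illustration under assumptions, not the code used for nowomics: the function name, default values and the injectable `opener` and `sleep` parameters are all hypothetical choices.

```python
import http.client
import time
import urllib.request


def download_with_retries(url, dest_path, max_attempts=5, timeout=30,
                          base_delay=1.0, opener=urllib.request.urlopen,
                          sleep=time.sleep):
    """Download url to dest_path, retrying on network failures.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts (exponential backoff). Returns the number of attempts used
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Opening with "wb" truncates any partial file left by a failed attempt.
            with opener(url, timeout=timeout) as response, open(dest_path, "wb") as out:
                while True:
                    chunk = response.read(1 << 20)  # read 1 MiB at a time
                    if not chunk:
                        break
                    out.write(chunk)
            return attempt
        except (OSError, http.client.HTTPException):
            # OSError covers URLError, socket timeouts and connection resets.
            # It also covers local disk errors, which are retried too.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Restarting from scratch on each retry is the simplest option. For very large files, resuming with HTTP range requests would avoid re-downloading data, at the cost of more bookkeeping.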

If I were at an institution with good infrastructure and some existing hardware I would see no compelling reason to make the jump for data integration and hosting databases. However, if you're starting something new where requirements will change over time I think EC2 is a great option.

written 7.7 years ago by Richard Smith400
7.7 years ago by
akislyuk30 wrote:

At DNAnexus, we use cloud computing to enable bioinformaticians to solve their problems instead of worrying about the underlying infrastructure.

Our experience is thoroughly documented on our wiki and on the DNAnexus Answers forum.

Please ask away about our experience and what we provide. I'll update this post and respond!

written 7.7 years ago by akislyuk30
7.7 years ago by
United States
Woa2.8k wrote:

A few cloud-based examples on this page:

  • Cloud Computing for Protein-Ligand Binding Site Comparison
  • A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
  • Streaming Support for Data Intensive Cloud-Based Sequence Analysis
  • Exploiting GPUs in Virtual Machine for BioCloud
  • wFReDoW: A Cloud-Based Web Environment to Handle Molecular Docking Simulations of a Fully Flexible Receptor Model
  • GPU-Based Cloud Service for Smith-Waterman Algorithm Using Frequency Distance Filtration Scheme
  • Translational Biomedical Informatics in the Cloud: Present and Future
written 7.7 years ago by Woa2.8k