Question: Experiences With Cloud Computing In Bioinformatics
9.4 years ago by
Biostar User1.0k
Biostar User1.0k wrote:

In recent years, cloud computing services such as Amazon's Elastic Compute Cloud (EC2) seem to have emerged as a recommended alternative for providing high-performance computing.

What are your experiences when it comes to bioinformatics in the cloud?

cloud general • 13k views
modified 2.2 years ago by Sundeep Kumar10 • written 9.4 years ago by Biostar User1.0k
9.4 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

We have been somewhat early adopters of cloud computing, having evaluated it for our bioinformatics needs more than two years ago. We are also what you could call early abandoners: after using it for a year, we compared it against the high-performance computing services at our university (Penn State HPC) and found it substantially under-performing.

This is not to say that cloud computing is not a fantastic idea; it's just that nearly all universities and science-oriented organizations have far more powerful computing facilities to begin with.

Cloud computing is probably ideal for satisfying the temporary needs of a small lab with minimal funds and resources, as it allows them to perform computations that would otherwise be out of reach. Yet as soon as the lab has continuous computational needs, cloud-based solutions become not only more expensive but also a lot less powerful than comparable "traditional" computing services.

I have come up with my own "rule of thumb" estimate: if, over an entire year, you only need to run your computers less than 30% of the time, then cloud computing may be worth it.
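The 30% figure is an empirical rule of thumb, but the reasoning behind it can be sketched as a break-even calculation. The snippet below is only an illustration: the function names and the prices are hypothetical, chosen so that the break-even point lands at 30%, and real costs such as storage, data transfer and staff time are ignored.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year


def break_even_utilization(owned_annual_cost, cloud_hourly_rate):
    """Fraction of the year above which owning hardware is cheaper than renting.

    Both arguments are hypothetical inputs in the same currency.
    """
    return owned_annual_cost / (cloud_hourly_rate * HOURS_PER_YEAR)


def cheaper_option(utilization, owned_annual_cost, cloud_hourly_rate):
    """Return "cloud" or "owned" for a given fraction of the year in use."""
    cloud_cost = utilization * HOURS_PER_YEAR * cloud_hourly_rate
    return "cloud" if cloud_cost < owned_annual_cost else "owned"


if __name__ == "__main__":
    # Assumed numbers: 2628 per year to own a machine, 1.0 per hour to rent one.
    owned, hourly = 2628.0, 1.0
    print("break-even utilization:", round(break_even_utilization(owned, hourly), 2))
    for used in (0.1, 0.2, 0.5, 0.9):
        print(f"{used:.0%} of the year in use ->", cheaper_option(used, owned, hourly))
```

With these assumed prices, renting wins below the break-even fraction and owning wins above it; plugging in your own institution's numbers will move the threshold.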


I agree with Michael. I work at a small bioinformatics company, and thanks to EC2 (and AWS in general) we can handle large projects that we otherwise couldn't (we don't have a cluster).

written 8.4 years ago by Marina Manrique1.3k

Agree with the point about small labs. I work at a small lab and there is no way that we could afford a large cluster. We can, however, afford to spin up EC2 instances whenever we need them.

written 9.4 years ago by Michael Barton1.8k
9.4 years ago by
Manuel Corpas650
Manuel Corpas650 wrote:

Cloud computing is becoming mature enough for use in genome research. The large datasets, highly demanding algorithms and sudden need for computational resources make large-scale sequencing experiments an attractive test case for cloud computing. So far I have seen cloud computing demonstrated using R. However, a rigorous comparison of its performance on a BLAST search, and of its ability to cope with ever-increasing databases and open-source frameworks such as BioPerl or Bioconductor, remains to be seen.

Cloud computing claims to be a resource where IT power is delivered over the Internet as you need it, rather than drawn from a desktop computer, in a fashion seemingly similar to having your own virtual servers available over the Internet. Some of the most important aspects of cloud computing are:

  • Software as a Service (SaaS): where you buy a software license for a determined period of time.
  • Utility Computing: storage and virtual servers that IT can access on demand.
  • Web Services.

My first exposure to cloud computing came from an email from Matt Wood, a newly established group leader at the Sanger Institute, announcing the Cloud Computing Group in Cambridge, UK. At that point I had no idea what it meant. When I attended the meeting at Cambridge University’s Centre for Mathematical Sciences, I was surprised to find a very select audience, including the director of IT at Sanger, Phil Butcher; one of the Ensembl software coordinators, Glenn Proctor; and quite a few local start-up companies.

Among the presenters was Simone Brunozzi, from Amazon’s Cloud Computing. I think he had an interesting story to tell: how Amazon, a well-known company, is now in the business of selling cloud computing. Apparently, the technology they sell was developed for Amazon’s own business. One of their main challenges was coping with the capricious shopping habits of customers, with orders peaking around Christmas and staying quite flat the rest of the year. These trends required computational resources that could adapt rapidly. The idea of cloud computing fitted well with their e-commerce business model: you don’t need to care about where your computation is done; the only thing you care about is that you have the resources you need and do not have to pay for them when you don’t need them. One of the things that struck me about Amazon’s presentation was that they would not tell us the number of processors they had at their disposal.

When it comes to using cloud computing for genomics research, costs may be quite high when they add up. The bioinformatics field, greatly influenced by the open-source movement, is not likely to rush to join Amazon’s cloud. Private efforts to make money out of human genome technology have remained rather unsuccessful to date: think of Celera Genomics or Lion Bioscience. I am skeptical of the bioinformatics community adopting cloud computing unless open-source ideals are embraced:

  • allowing people to develop and contribute to the technology if and when they want to,
  • allowing total openness in terms of its achievements and pitfalls and
  • making it free to use for everyone.

I do not think that making it free means there is no margin for profit. Think of the profitability of free-to-use technologies such as Java or MySQL, both components of Sun Microsystems’ business.

Despite the promise of potential benefits for the bioinformatics community, the way the cloud is being portrayed does not conform to the ideals of free access and openness. Unless these ideals are implemented to some extent, I find it hard to see the cloud taking root in the bioinformatics field and becoming a new standard platform for genome research.

modified 9.4 years ago by Jeroen Van Goey2.2k • written 9.4 years ago by Manuel Corpas650
9.1 years ago by
Cassj1.3k wrote:

We're a small lab, but we periodically generate next gen seq datasets. We could have access to a cluster on campus to do alignments etc, but then we have to modify code to get it running on the cluster, jump through hoops to get software we need installed and leave our jobs in a queue behind more important people.

It's nice to be able to spin up a few EC2 machines when you need extra processing power, but what really sells it for me is that I can have an image with whatever I like installed and I know exactly what versions of everything I have on there. I can save data and image easily for future use and I can give collaborators access via ssh or a webpage without having to beg my way through the university firewall.

Has anyone had any experience with Eucalyptus or similar? An internal university EC2-alike would be great. Looks like the National Grid Service is looking into cloud services too.

9.4 years ago by
Jeroen Van Goey2.2k
Ghent, Belgium
Jeroen Van Goey2.2k wrote:

The J. Craig Venter Institute has released the JCVI Cloud BioLinux image, which "enables scientists to quickly provision computation infrastructures supporting bioinformatics using cloud computing platforms such as Amazon EC2 and Eucalyptus. Upon deployment users will have instant access to a host of software including BLAST, glimmer, hmmer, phylip, rasmol, genespring, clustalw, the Celera Assembler, and the EMBOSS collection of utilities. JCVI Cloud BioLinux is built on a 64-bit instance of Ubuntu virtual server customized with bioinformatics packages from the BioLinux repository, and will be updated periodically."

They give as their motivation for releasing this image "cloud computing can provide researchers with the ability to perform computations using a practically unlimited pool of virtual machines, without facing the burden of owning or maintaining any hardware infrastructure. (...) This Science as a Service model (ScaaS) will allow JCVI to incorporate, develop and optimize life science software as well as supporting data sets on compute clouds. This project is driven by the observation that commonly-used bioinformatics tools are hard to build and maintain, require high amounts of resources, or just too numerous to choose from."

9.4 years ago by
Simon Cockell7.3k
Simon Cockell7.3k wrote:

We've had a couple of Amazon education grants to try out EC2 here, and the service is very impressive. However, it would be extremely expensive to use it as a long-term replacement for our local grid service (which does have its own limitations, but is at least effectively free at the point of delivery) or clusters (in which a considerable amount of capital has already been invested). I think for the amount of grunt work we do, and particularly for the amount of data that needs to be shunted around, cloud computing is not quite there yet.
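The cost of shunting data around can be estimated with simple arithmetic. A minimal sketch, assuming a hypothetical 1 TB dataset and a sustained 100 Mbit/s uplink (both numbers are illustrative and not taken from our own setup; the function name is made up for this example):

```python
def transfer_hours(dataset_gb, bandwidth_mbps):
    """Hours needed to move a dataset at a sustained bandwidth.

    dataset_gb: size in gigabytes (decimal, 10**9 bytes)
    bandwidth_mbps: sustained throughput in megabits per second
    """
    total_bits = dataset_gb * 8 * 1000**3
    seconds = total_bits / (bandwidth_mbps * 1000**2)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed inputs: 1000 GB of sequencing data over a 100 Mbit/s link.
    print(f"{transfer_hours(1000, 100):.1f} hours")
```

Real transfers rarely sustain the nominal line rate, so a figure like this should be read as a lower bound.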

9.1 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

This article may be of interest to you: Lincoln D Stein, 'The case for cloud computing in genome informatics', Genome Biology, May 2010.

9.4 years ago by
United States
Will4.5k wrote:

I've started using PiCloud. It is a super simple library for Python that facilitates running your code in the cloud. The client copies your interpreter's state and then runs the code on their Amazon EC2 cluster. They then charge you based on your program's run time. They're currently doing beta trials, so it's actually free (for now).

The only disadvantage is that they abstract everything away from you, so it's actually impossible to run on your own Amazon EC2 cluster. It's also virtually impossible to run anything that's not Python.

But overall I've found it really easy to implement some of my algorithms using their client.

9.3 years ago by
Cedric Dalmasso30 wrote:

At ActiveEon, we have provided support for several users in biotech (IPMC, INRA, ...). We develop a software solution that federates your own resources (clusters, servers, ...) with cloud resources (EC2, ...). It eases access to and use of cloud resources.

4.6 years ago by
racklodge0 wrote:

An interesting discussion is worth a comment. I think that you should write more on cloud computing implementation.

2.2 years ago by
Sundeep Kumar10 wrote:

Hi Respected forum members,

The topic in itself is very interesting. I stumbled upon this when I was searching for how cloud computing can be used in bioinformatics.

Currently there are other solutions as well, such as OpenStack and Cloud Foundry, which help one provision a private cloud on commodity hardware, and that certainly reduces the cost.

I am currently learning about cloud computing and trying to find a way to make it useful in the area of bioinformatics research.

Could I ask you to post some of your challenges or feedback about using cloud computing?

my personal email id is
