Question: Is Amazon's EC2 Commonly Used For Bioinformatics?
15
7.3 years ago by
Blunders1.1k
Blunders1.1k wrote:

I've been looking into a variety of computation platforms (CPU clusters, GPUs, cloud) and was wondering whether anyone is aware of bioinformatics systems deployed on the Amazon Elastic Compute Cloud (Amazon EC2).

EDIT: Made a community wiki.

cloud • 9.3k views
ADD COMMENTlink modified 7.3 years ago by audrius.meskauskas90 • written 7.3 years ago by Blunders1.1k
7

Look at Deepak's wiki for the answer to your question: https://github.com/mndoci/mndoci.github.com/wiki

ADD REPLYlink written 7.3 years ago by Brad Chapman9.2k
3

Similar question: http://biostar.stackexchange.com/questions/132/experiences-with-cloud-computing-in-bioinformatics. Also, search this site for "ec2" or "aws".

ADD REPLYlink written 7.3 years ago by Neilfws48k

@neilfws: Thanks, main reason I asked was to get a response from http://biostar.stackexchange.com/users/72/mndoci who I noticed appears to be a rep for Amazon EC2

ADD REPLYlink written 7.3 years ago by Blunders1.1k

I'm sure he'll be happy to advise.

ADD REPLYlink written 7.3 years ago by Neilfws48k

+1 @Brad Chapman: Thanks, most interesting thing I ran across looking at his blog is: "Note that Elastic MapReduce apps can be run on a roll-it-yourself Hadoop cluster on EC2 as well" - if true, that's pretty important info to me. Thanks for linking to his page!

ADD REPLYlink written 7.3 years ago by Blunders1.1k

EMR is Apache Hadoop compatible, so as long as you are writing Apache Hadoop compatible (or appropriate Hive, Pig, etc.) applications, your code will be completely portable.

ADD REPLYlink written 7.3 years ago by Mndoci1.2k
8
7.3 years ago by
Casey Bergman17k
Athens, GA, USA
Casey Bergman17k wrote:

Galaxy has an EC2 instance set-up described here.

EagleGenomics has developed an Ensembl EC2 cloud instance described here.

Ensembl hosts their data on Amazon as well, see here.

ADD COMMENTlink written 7.3 years ago by Casey Bergman17k
3

Plus the Ensembl US East and Asia mirror sites are cloud-based.

ADD REPLYlink written 7.3 years ago by Bert Overduin3.6k

I do not currently have access to our Amazon AWS account, but my colleague mentioned that the Ensembl data sets are only listed in North America?

ADD REPLYlink written 7.3 years ago by Andrea_Bio2.5k

Yes, you have EBS snapshots for both MySQL and FASTA dumps, but only in the US region. MySQL: http://aws.amazon.com/datasets/2315 FASTA: http://aws.amazon.com/datasets/3841

The good thing is that they're keeping this up to date; the last update is dated March 21, 2011 7:16 PM GMT (in stark contrast with the GenBank dataset, with a fairly embarrassing last-updated date of December 9, 2009).

ADD REPLYlink written 7.3 years ago by Eduardo Pareja Tobes0
6
7.3 years ago by
Pablo Pareja1.6k
Granada, Spain
Pablo Pareja1.6k wrote:

Hi all,

in our company (Era7 Bioinformatics) we use EC2 (as well as S3, EBS... and recently CloudFormation) on a daily basis.

In our case it's vital for semiautomatic genome annotation processes and transcriptomics, among others.

Another example would be the open-source project Bio4j ("a graph based DB including most data available in UniProt (SwissProt + Trembl) plus Gene Ontology (GO) and UniRef(50,90,100).") which is being developed by our R & D department (oh no sequences!).

I'd dare say it's only a matter of time before it is widely used, given its scalability and low prices.

ADD COMMENTlink written 7.3 years ago by Pablo Pareja1.6k
2

Depending on how much bandwidth you have, you can use something like FDT (http://monalisa.cern.ch/FDT/), Tsunami (http://tsunami-udp.sourceforge.net/), or Aspera (on the commercial side), and also leverage S3's multipart upload. The key is parallelization.
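As a rough illustration of the parallelization point, here is a minimal Python sketch that splits a large file into byte ranges and hands them to a thread pool. It is not the API of FDT, Tsunami, Aspera, or any S3 client: `plan_parts`, `send_part`, and `upload_in_parallel` are hypothetical names, and `send_part` is a placeholder where a real transfer call would go. The 5 MiB minimum part size and 10,000-part ceiling reflect the documented S3 multipart upload limits as I understand them; check the current documentation before relying on them.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed S3 multipart limits: every part except the last must be
# at least 5 MiB, and an upload may have at most 10,000 parts.
MIN_PART_SIZE = 5 * 1024 * 1024
MAX_PARTS = 10_000


def plan_parts(total_size, part_size):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the assumed 5 MiB minimum")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


def send_part(part):
    """Hypothetical placeholder: a real version would read the byte
    range from disk and transfer it, then return a receipt."""
    number, _offset, length = part
    return number, length


def upload_in_parallel(total_size, part_size, workers=8):
    """Send all parts concurrently; results come back in part order."""
    parts = plan_parts(total_size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_part, parts))


if __name__ == "__main__":
    size = 25 * 1024 * 1024 + 1  # a small stand-in for a sequencing file
    for number, length in upload_in_parallel(size, 10 * 1024 * 1024):
        print(f"part {number}: {length} bytes")
```

Threads suit this kind of I/O-bound work; how many workers actually help depends on the available bandwidth mentioned above.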

ADD REPLYlink written 7.3 years ago by Mndoci1.2k
1

@Mikael Huss: I think the Import/Export service (as David Quigley commented) is a pretty good option. You can calculate the price of uploading a lane of HiSeq with the Import/Export calculator: http://awsimportexport.s3.amazonaws.com/aws-import-export-calculator.html

ADD REPLYlink written 7.3 years ago by Marina Manrique1.3k

+1 @Pablo Pareja: Thanks for sharing!

ADD REPLYlink written 7.3 years ago by Blunders1.1k

@blunders you're welcome, don't hesitate to contact us if you have any questions about our services/projects ;)

ADD REPLYlink written 7.3 years ago by Pablo Pareja1.6k

Your company looks very interesting! Just wondering how you deal with actually getting your data into Amazon. Uploading, say, a lane of HiSeq data seems to take forever (yes, I have tried). Of course, there are EBS snapshots and S3 volumes for many useful public genomic datasets.

ADD REPLYlink written 7.3 years ago by Mikael Huss4.6k

One way is to mail them a hard drive: http://aws.amazon.com/importexport/

ADD REPLYlink written 7.3 years ago by David Quigley11k

@Mikael Huss as David Quigley points out, Amazon provides this import/export service. We use it whenever datasets are too large for a standard upload to be practical (besides, there's a cool Import/Export price calculator available, as my colleague Marina says...)

ADD REPLYlink written 7.3 years ago by Pablo Pareja1.6k

Thanks, everyone.

ADD REPLYlink written 7.3 years ago by Mikael Huss4.6k
4
7.3 years ago by
Chris Evelo9.9k
Maastricht, The Netherlands
Chris Evelo9.9k wrote:

Yes, I happened to see a demo by Mike Cariaso at the IB2011 conference in Wageningen 2 weeks ago that showed you can install a complete Galaxy server on the Amazon cloud in minutes. "Install" might be the wrong word here; it is actually a pre-installed Galaxy server that you just have to size in terms of CPU and memory needs. What he showed is available at http://www.runblast.com (check the video first).

ADD COMMENTlink modified 7.3 years ago • written 7.3 years ago by Chris Evelo9.9k

+1 @Chris Evelo: Thanks for the link to the video, I'll have a look!

ADD REPLYlink written 7.3 years ago by Blunders1.1k
4
7.3 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

One of the best (and one of the few?) examples to date is Crossbow, a short-read mapping tool that uses Elastic MapReduce. See their website and publication.

ADD COMMENTlink written 7.3 years ago by Neilfws48k
3
7.3 years ago by
lh331k
United States
lh331k wrote:

As others have said, a lot of progress has been made in adopting EC2. Many believe this is an irreversible trend, and I tend to agree. On the other hand, if you ask whether EC2 is "commonly used for bioinformatics", I would say "not yet". Some friends of mine are actively exploring EC2, but none of them are actually using it for real work. Many people on this Q&A site are also talking about EC2, but from their answers to previous questions, it seems to me that only a couple of them are using it for daily data processing.

ADD COMMENTlink written 7.3 years ago by lh331k

@lh3: Agree, I'm mainly interested in just that -- how the use of cloud computing is evolving within bioinformatics. It appears some only roll internal systems due to regulatory compliance issues, though it's hard to tell.

ADD REPLYlink written 7.3 years ago by Blunders1.1k

A friend of mine has set up a private cloud and runs Galaxy on it. He is very happy with that.

ADD REPLYlink written 7.3 years ago by lh331k
3
7.3 years ago by
Mndoci1.2k
Issaquah, WA
Mndoci1.2k wrote:

People did all the answering for me. The wiki (https://github.com/mndoci/mndoci.github.com/wiki) covers all the existing apps.

Also, check out Matt Wood's tutorial and video on how to get a high perf cluster provisioned on EC2: http://aws.typepad.com/aws/2011/03/build-a-cluster-computing-environment-in-under-10-minutes.html

ADD COMMENTlink modified 7.3 years ago • written 7.3 years ago by Mndoci1.2k

+1 @mndoci: Agree, I was really surprised by the volume and quality of the answers. Thanks for the link to the video, I'll check it out. Cheers!

ADD REPLYlink written 7.3 years ago by Blunders1.1k
2
7.3 years ago by
Benm20
Benm20 wrote:

Bela Tiwari of the NEBC Bio-Linux team has written an excellent introduction to Amazon EC2 and CloudBioLinux.

ADD COMMENTlink modified 7.3 years ago by Benm710 • written 7.3 years ago by Benm20
2

@blunders: here it is, from NERC, http://nebc.nerc.ac.uk/tools/bio-linux/bio-linux-6.0

ADD REPLYlink written 7.3 years ago by Benm710
1

@BENM: Is the introduction online, and if so, have a link? Thanks!

ADD REPLYlink written 7.3 years ago by Blunders1.1k


It's also available under the link 'Getting started with CloudBioLinux' at http://cloudbiolinux.com/

ADD REPLYlink written 7.3 years ago by Brad Chapman9.2k
2
7.3 years ago by
Andreas2.4k
Singapore
Andreas2.4k wrote:

To add to the above:

ADD COMMENTlink written 7.3 years ago by Andreas2.4k
1
7.3 years ago by
Benm710
Benm710 wrote:

@blunders: Bio-Linux is based on Ubuntu 10.04

ADD COMMENTlink written 7.3 years ago by Benm710
1
7.3 years ago by
Tim320
Nijmegen, the Netherlands
Tim320 wrote:

QIIME offers its pipeline for microbial community analysis as an EC2 image.

ADD COMMENTlink written 7.3 years ago by Tim320
1
6.0 years ago by
audrius.meskauskas90 wrote:

Big heresy, of course, but I think that if the service is heavily used, it costs more or less the price of the equipment in about a year. Multiply these amazingly cheap prices "per hour" by the number of hours in a year and you will see. And uploading gigabytes of sequences to the cloud is also not exactly very fast and not exactly for free.
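The multiplication suggested above can be sketched in a few lines of Python. The hourly price and hardware price below are made-up placeholders, not actual EC2 or vendor quotes, and the sketch ignores data transfer, storage, power, and administration costs; plug in your own numbers.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def yearly_cloud_cost(price_per_hour, utilization=1.0):
    """Cost of one instance for a year, running the given fraction of hours."""
    return price_per_hour * HOURS_PER_YEAR * utilization


def break_even_hours(hardware_price, price_per_hour):
    """Hours of rental after which renting has cost as much as buying."""
    return hardware_price / price_per_hour


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    price_per_hour = 0.50    # rental price per instance-hour
    hardware_price = 4000.0  # purchase price of a comparable machine

    hours = break_even_hours(hardware_price, price_per_hour)
    print(f"Full-time for a year: {yearly_cloud_cost(price_per_hour):.2f}")
    print(f"Break-even after {hours:.0f} hours "
          f"({hours / HOURS_PER_YEAR:.2f} years of continuous use)")
```

With these placeholder numbers, continuous use reaches the hardware price in a bit under a year, while a machine that is busy only 10% of the time would take roughly ten times as long, which is why utilization matters so much to the comparison.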

ADD COMMENTlink written 6.0 years ago by audrius.meskauskas90