Question: Amazon EC2 CloudBioLinux Performance Underwhelms
6.4 years ago by
Verrus50 wrote:

Hi Biostars,

At our small lab we only have a couple of heavy machines to run BLAST and similar jobs, so we were looking into Amazon EC2 with the CloudBioLinux AMI.

I wanted to blastx some sequences against uniref50 on the cloud. By running small files and checking how fast they went, we estimated how fast the full job would be on the cloud and whether it would be worth our money.

Our computational machine: 2 x Intel quad-core E5506 2.13 GHz, 12 GB RAM, 1 x 1 TB SATA hard drive.

Amazon machine: 3.8xlarge, with 60 GiB of memory, 32 vCPUs, 108 EC2 Compute Units, 640 GB of SSD-based local instance storage, 64-bit platform.

We compared the performance on the same dataset and DB:

Our local machine: 1500 sequences in 120 min, BLOSUM62, 8 threads.

Amazon with CloudBioLinux AMI: 1500 sequences in 60 min, BLOSUM62, 100 threads on the 3.8xlarge.

**Local**: 750seqs:60min
**Amazon**: 1500seq:60min

We also compared an 'empty VM versus the cloud biolinux':

CloudBioLinux AMI: 1500 sequences, pam32, 100 threads, in 20 min.

Amazon Ubuntu AMI (sudo apt-get install ncbi-blast+, then downloaded and created the uniref50 DB): 1500 sequences, pam32, 100 threads, in 10 min.

cloudbiolinux: 1500seqs:20min
amazonUbuntu: 1500seqs:10min
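
To put the numbers above on a common footing, here is a minimal sketch that converts each run to sequences per hour and compares the ratios. The timings are the ones quoted in the post; the core counts (8 local cores from the two quad-core CPUs, 32 vCPUs on the instance) are read off the machine descriptions, and treating a vCPU as comparable to a physical core is an assumption that may well not hold.

```python
def seqs_per_hour(sequences: int, minutes: float) -> float:
    """Throughput in query sequences per hour."""
    return sequences * 60 / minutes


# BLOSUM62 runs quoted in the post.
local_rate = seqs_per_hour(1500, 120)
amazon_rate = seqs_per_hour(1500, 60)
speedup = amazon_rate / local_rate

# Assumption: 2 x quad-core = 8 local cores versus 32 vCPUs on the instance.
# A vCPU is not necessarily equivalent to a physical core.
core_ratio = 32 / 8
scaling_efficiency = speedup / core_ratio

# pam32 runs: CloudBioLinux AMI versus the plain Ubuntu AMI.
cloudbiolinux_rate = seqs_per_hour(1500, 20)
ubuntu_rate = seqs_per_hour(1500, 10)
ami_ratio = ubuntu_rate / cloudbiolinux_rate

print(f"local: {local_rate:.0f} seqs/h, amazon: {amazon_rate:.0f} seqs/h")
print(f"speedup: {speedup:.1f}x with {core_ratio:.0f}x the cores "
      f"(efficiency {scaling_efficiency:.0%})")
print(f"plain Ubuntu vs CloudBioLinux: {ami_ratio:.1f}x")
```

Seen this way, the instance delivers twice the throughput with four times the nominal core count, which is the kind of gap you would expect when something other than CPU is limiting the job.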

We were not impressed with the performance of EC2. Has anybody had similar experiences with using EC2 services?

Maybe some of you could give us some pointers on where we went wrong?

6.4 years ago by
Istvan Albert ♦♦ 84k
University Park, USA
Istvan Albert ♦♦ 84k wrote:

The speed of read/write is the primary bottleneck on all cloud systems. The physical file system is networked and shared across instances.

Cloud computing works really well for processes that need number crunching and less reading/writing.


So basically the reading of the uniref database to and from memory is the bottleneck... Wow, I did not expect that. Thanks for the insight. How would you explain the difference between an empty installation and the CloudBioLinux AMI?

written 6.4 years ago by Verrus50

For IO-intensive jobs, the best approach is to use the local machine (ephemeral) storage. In this case it's an SSD, so it should be quite fast. You'll then need to copy the final results you want to save back to EBS or S3, since local storage goes away once the machine terminates. As for CloudBioLinux versus a fresh AMI, it could be the version of blast+ available; we badly need to update the Amazon AMI with the latest version. If the versions are the same, it's likely fluctuation in EBS read/write throughput, which can vary.

written 6.4 years ago by Brad Chapman 9.5k
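
The workflow described in the reply above (stage inputs on the instance's local storage, run there, copy results back to durable storage) could be sketched roughly as follows. This is only an illustration: `run_on_local_scratch`, `blastx_command`, the file layout, and the thread count are hypothetical, and the scratch directory is whatever path the ephemeral volume is mounted at on your instance. The blastx options used (`-query`, `-db`, `-num_threads`, `-out`) are standard BLAST+ flags.

```python
import shutil
import subprocess
from pathlib import Path


def run_on_local_scratch(inputs, scratch_dir, results_dir, build_command, output_name):
    """Copy inputs to local scratch, run a command there, copy the output back."""
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)

    # Stage every input (query file, database files) on the fast local disk.
    staged = []
    for src in inputs:
        dst = scratch / Path(src).name
        shutil.copy2(src, dst)
        staged.append(dst)

    # Run the job so that it reads and writes only on local scratch.
    output = scratch / output_name
    subprocess.run(build_command(staged, output), check=True)

    # Ephemeral storage disappears with the instance, so save the result
    # to a durable location (for example an EBS-backed directory).
    results = Path(results_dir)
    results.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(output, results / output_name))


def blastx_command(staged, output, db_name="uniref50", threads=32):
    """Hypothetical layout: staged[0] is the query FASTA, the rest are DB files."""
    query = staged[0]
    db_prefix = query.parent / db_name
    return ["blastx", "-query", str(query), "-db", str(db_prefix),
            "-num_threads", str(threads), "-out", str(output)]
```

Passing `blastx_command` as `build_command` would run the search against the staged copy of the database. The default of 32 threads simply mirrors the vCPU count quoted in the question and is a guess, not a tuned value.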

Did not know that. Some other interesting tidbits on that page:

"Because of the way that Amazon EC2 virtualizes disks, the first write to any location on a standard instance store volume performs more slowly than subsequent writes."

"...we recommend that you initialize your drives by writing once to every drive location before production use."

"Initialization can take a long time (about 8 hours for an extra large instance)."

modified 6.4 years ago • written 6.4 years ago by Istvan Albert ♦♦ 84k
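
One rough way to see whether the first-write penalty quoted above matters on a given volume is to time a first write and a rewrite of the same file. The sketch below does that; `time_write_passes` is a hypothetical helper, and rewriting a file in place only approximates "writing to the same drive location", since the file system decides where blocks actually land.

```python
import os
import time


def time_write_passes(path, total_bytes, chunk_bytes=1 << 20, passes=2):
    """Write total_bytes of zeros to path `passes` times; return seconds per pass."""
    chunk = b"\0" * chunk_bytes
    timings = []
    for i in range(passes):
        # First pass creates the file; later passes overwrite it in place.
        mode = "wb" if i == 0 else "r+b"
        start = time.perf_counter()
        with open(path, mode) as f:
            written = 0
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(chunk[:n])
                written += n
            # Force the data to the device so the page cache does not hide the cost.
            f.flush()
            os.fsync(f.fileno())
        timings.append(time.perf_counter() - start)
    return timings


# Example (hypothetical mount point for the instance store volume):
# first, second = time_write_passes("/mnt/probe.bin", 1 << 30)
```

If the first pass is much slower than the second, pre-initializing the volume as the quoted documentation recommends is probably worth the time for long-running jobs.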

Thanks Brad, I just had the same realization about using the internal storage. I rechecked, and we were on EBS.

The BLAST version in Ubuntu 12.04 (the blank install) is 2.2.25+, from Jan 3 2012.

modified 6.4 years ago • written 6.4 years ago by Verrus50