At our small lab we only have a couple of heavy machines to run BLAST and similar tools, so we were looking into Amazon EC2 with the CloudBioLinux AMI (https://aws.amazon.com/amis/cloudbiolinux-ubuntu-13-04-2013-08-28).
I wanted to blastx some sequences against UniRef50 on the cloud. By running small files and checking how long they took, we tried to estimate how fast a full run would be on the cloud and whether it would be worth our money.
Our computational machine: 2 x Intel quad-core E5506 2.13 GHz, 12 GB RAM, 1 x 1 TB SATA hard drive.
Amazon machine: 3.8xlarge with 60 GiB of memory, 32 vCPUs, 108 EC2 Compute Units, 640 GB of SSD-based local instance storage, 64-bit platform.
We compared the performance on the same dataset and DB:
Our local machine: 1500 sequences in 120 min, BLOSUM62, 8 threads.
Amazon with the CloudBioLinux AMI: 1500 sequences in 60 min, BLOSUM62, 100 threads on the 3.8xlarge.
**Local**: 750 seqs / 60 min **Amazon**: 1500 seqs / 60 min
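To make the comparison above explicit, here is a small sketch of the arithmetic. It only uses the numbers already given (1500 sequences, 120 vs 60 minutes, 8 local cores from the 2 x quad-core box, 32 vCPUs on Amazon); the function and variable names are made up for illustration.

```python
def seqs_per_hour(n_seqs, minutes):
    """Convert a benchmark run into sequences processed per hour."""
    return n_seqs * 60.0 / minutes


# Numbers taken from the two BLOSUM62 runs described above.
local = seqs_per_hour(1500, 120)   # local machine, 8 threads
amazon = seqs_per_hour(1500, 60)   # EC2 instance, 100 threads

speedup = amazon / local           # how much faster the cloud run was
cpu_ratio = 32 / 8                 # 32 vCPUs vs 2 x 4 physical cores
scaling = speedup / cpu_ratio      # speedup gained per unit of extra CPU

print(f"local:   {local:.0f} seqs/hour")
print(f"amazon:  {amazon:.0f} seqs/hour")
print(f"speedup: {speedup:.1f}x with {cpu_ratio:.0f}x the CPUs ({scaling:.2f} scaling)")
```

So the cloud run is about twice as fast while having four times as many (virtual) CPUs available, which is roughly why we were disappointed.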
We also compared an 'empty' VM versus CloudBioLinux:
CloudBioLinux AMI: 1500 sequences, pam32, 100 threads, in 20 min.
Amazon Ubuntu AMI (sudo apt-get install ncbi-blast+, then downloaded UniRef50 and created the database): 1500 sequences, pam32, 100 threads, in 10 min.
**CloudBioLinux**: 1500 seqs / 20 min **Amazon Ubuntu**: 1500 seqs / 10 min
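For anyone who wants to repeat this kind of timing, a run could be scripted roughly like this. It is only a sketch and not the exact commands we used: the file names, the database name `uniref50` and the helper names `build_blastx_cmd` and `time_run` are placeholders, while `-query`, `-db`, `-matrix`, `-num_threads` and `-out` are standard BLAST+ options.

```python
import shutil
import subprocess
import time


def build_blastx_cmd(query, db, matrix, threads, out):
    """Return the blastx argument list for one benchmark run."""
    return [
        "blastx",
        "-query", query,              # FASTA file with the nucleotide sequences
        "-db", db,                    # protein database name (placeholder)
        "-matrix", matrix,            # scoring matrix, e.g. BLOSUM62
        "-num_threads", str(threads),
        "-out", out,
    ]


def time_run(cmd):
    """Run cmd and return wall-clock minutes, or None if the program is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / 60.0


# Hypothetical input and output names, shown only to illustrate the call.
cmd = build_blastx_cmd("sample.fasta", "uniref50", "BLOSUM62", 8, "sample.blastx.out")
print(" ".join(cmd))
```

Calling `time_run(cmd)` on each machine with the same query file and database would give directly comparable wall-clock numbers.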
We were not impressed with the performance of EC2. Has anybody had similar experiences with EC2 services?
Maybe some of you could give us some pointers on where we went wrong?