Tutorial: Introduction to AWS Cloud Computing
Posted 5.8 years ago by Malachi Griffith, Washington University School of Medicine, St. Louis, USA:

We have increasingly been using cloud computing environments to teach bioinformatics analysis skills. For example, we have been using AWS to run tutorials for hands-on bioinformatics workshops at CBW, NYGC, and CSHL. For many students learning bioinformatics in this way, "The Cloud" is a very abstract black box. When they do start to dig into the details, they encounter considerable jargon. Much of the existing documentation is written by system administration experts and is inaccessible to many biologists. We created the following tutorial to demystify some of this jargon and to direct the reader to the documentation we found most useful. A table of contents for this tutorial is provided below to give you a sense of the topics covered. If there are other cloud computing tutorials that you have found useful, please link to them here.

Intro to AWS Cloud Computing

Cloud computing allows users to quickly access an arbitrary amount of compute resources remotely, without the need to buy or maintain hardware themselves. There are many cloud computing services. This tutorial describes the use of the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) resource. However, the fundamental concepts covered here generally apply to other cloud computing services such as Google Cloud, DigitalOcean, etc., though each provider uses substantially different jargon.

Table of Contents

  1. Preamble
  2. Acknowledgements
  3. Glossary and abbreviations
  4. What do I need to perform this tutorial
    1. Creating an account
    2. Logging into the AWS console
  5. What is a Region?
  6. How much does it cost to use AWS EC2 resources?
    1. How does billing work?
  7. Necessary steps for launching an instance
    1. Step 1. Choosing an AMI
    2. Step 2. Choosing an instance type
    3. Step 3. Configuring instance details
    4. Step 4. Adding storage
      1. Storage volume options
    5. Step 5. Tagging the instance
    6. Step 6. Configuring a security group
    7. Step 7. Reviewing the instance before launch
    8. Step 8. Assigning a key pair
    9. Step 9. Reviewing launch status
    10. Step 10. Examining a new instance in the EC2 console
    11. Step 11. Logging into an instance
  8. Troubleshooting and advanced topics
    1. I can't log in to my EC2 instance - what might have gone wrong?
    2. How do storage volumes appear within a Linux instance on Amazon EC2?
    3. Taking stock of compute resources within an Ubuntu Linux instance
    4. Basic setup and administration of an Ubuntu Linux instance
    5. Setting up an Apache web server
    6. What is the difference between the start, stop, reboot and terminate instance states?
    7. How do I create my own AMI, publish it as a Community AMI, and what is a snapshot?
    8. Tidying up and shutting down AWS resources
    9. Further reading and preparing for more advanced AWS cloud computing concepts


Citation for this resource:

Malachi Griffith*, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith*. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud. PLoS Comput Biol. 2015. Aug 6;11(8):e1004393. doi: 10.1371/journal.pcbi.1004393. eCollection 2015 Aug. PubMed PMID: 26248053.

The following existing threads also seem relevant to this topic:

Using Amazon Web Services?
How To Install & Use R/Bioconductor In Amazon Ec2
Making A Bioinformatics Application Available In The Amazon Ec2 Cloud
Amazon Ec2 Cloud Biolinux Performance Underwhelms
How Do You Use Cloud Computing For Bioinformatics In 2013?

Tags: computing, cloud, ec2, tutorial, aws

If we stop the instance, we do not lose data, right? What are some scenarios where we might lose the data on our instance?

written 5.8 years ago by geek_y11k

It depends on the type of storage your data is on. If it is on an ephemeral (Instance Store) volume, you WILL lose your data when you either stop or terminate your instance. If your data is on an EBS volume, it will persist until the EBS volume is deleted. Be aware that your EC2 instance might be set up to delete its EBS volumes when the instance is terminated (not usually when it is stopped). Otherwise, deleting an EBS volume requires direct action. You should read the sections on "Adding storage", "Storage volume options", and "What is difference between the start, stop, reboot and terminate instance states?".
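
The rules above can be sketched as a small Python function. This is only an illustrative model of the behaviour described in this reply, not an AWS API; the function name `data_survives` and its arguments are hypothetical.

```python
def data_survives(storage, action, delete_on_termination=False):
    """Return True if data on a volume is expected to survive an action.

    storage: "instance-store" (ephemeral) or "ebs"
    action: "stop" or "terminate"
    delete_on_termination: whether the EBS volume is set to be deleted
        when the instance is terminated (ignored for instance store)
    """
    if storage not in ("instance-store", "ebs"):
        raise ValueError("unknown storage type: %r" % storage)
    if action not in ("stop", "terminate"):
        raise ValueError("unknown action: %r" % action)

    if storage == "instance-store":
        # Ephemeral volumes are lost on both stop and terminate.
        return False
    if action == "terminate" and delete_on_termination:
        # The EBS volume is removed together with the instance.
        return False
    # Otherwise an EBS volume persists until it is deleted directly.
    return True


for storage, action, flag in [
    ("instance-store", "stop", False),
    ("ebs", "stop", False),
    ("ebs", "terminate", True),
    ("ebs", "terminate", False),
]:
    print(storage, action, flag, data_survives(storage, action, flag))
```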

written 5.8 years ago by Obi Griffith

If we install any tools, they generally go to /usr/bin or /usr/local/bin. Will the installed programs be deleted once we stop the instance? Should we install all programs on EBS only, or should we set up the root storage as EBS?

modified 5.8 years ago • written 5.8 years ago by geek_y11k

Yes. If the root volume is an ephemeral Instance Store volume, you should not stop or terminate the instance unless you have already created a snapshot or AMI (stored on EBS) to save its state. This issue can be avoided by using an EBS volume for the root storage when you create your instance. You can check the type of storage used for the root volume of your current instances in the EC2 console. More detailed discussion of these concepts is provided in the tutorial sections Obi indicated above.
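
Besides the EC2 console, the same check can be scripted. Below is a minimal Python sketch that inspects one instance description, assuming a dict shaped like the entries the boto3 EC2 client returns from `describe_instances` (fields `RootDeviceType`, `RootDeviceName`, `BlockDeviceMappings`). The helper name `describe_root_volume` and the IDs in the example are made up for illustration.

```python
def describe_root_volume(instance):
    """Summarise the root volume of one instance description.

    `instance` is assumed to look like one entry of
    describe_instances()["Reservations"][i]["Instances"].
    """
    root_name = instance.get("RootDeviceName")
    report = {
        "root_type": instance["RootDeviceType"],  # "ebs" or "instance-store"
        "volume_id": None,
        "delete_on_termination": None,
    }
    for mapping in instance.get("BlockDeviceMappings", []):
        # Only EBS volumes appear here with an "Ebs" entry.
        if mapping.get("DeviceName") == root_name and "Ebs" in mapping:
            report["volume_id"] = mapping["Ebs"]["VolumeId"]
            report["delete_on_termination"] = mapping["Ebs"]["DeleteOnTermination"]
    return report


# Hypothetical example data; real values would come from, for example:
#   import boto3
#   boto3.client("ec2").describe_instances()
example_instance = {
    "InstanceId": "i-0123456789abcdef0",
    "RootDeviceType": "ebs",
    "RootDeviceName": "/dev/sda1",
    "BlockDeviceMappings": [
        {
            "DeviceName": "/dev/sda1",
            "Ebs": {
                "VolumeId": "vol-0123456789abcdef0",
                "DeleteOnTermination": True,
            },
        }
    ],
}

print(describe_root_volume(example_instance))
```

A `root_type` of "ebs" means installed programs on the root volume are kept across a stop; `delete_on_termination` shows whether that volume would be removed when the instance is terminated.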

written 5.8 years ago by Malachi Griffith

Have you considered shortening your presentation down to a 'Quick Start' guide?

written 5.7 years ago by mcc80

That's a thought. We have been doing a very brief intro to cloud computing and then jumping right into analysis tasks at some workshops. However, we found that students find this very abstract, like a black box. This tutorial was created to really try to explain what is going on. If you just want to get rolling fast, you can log into AWS, hit the launch button and follow the steps in their wizard. You can mostly just pick the default options and you will be fine. We also wrote a much shorter tutorial that covers the same concepts, for a publication that will be online soon: https://github.com/genome/gms/wiki/Beginners-Guide-to-installing-the-GMS-on-an-Amazon-Web-Services-%28AWS%29-Instance. All that being said, I think that if a beginner is going to work with AWS, they should do themselves a favor and devote a couple of hours to understanding the basics. That investment will probably save you headaches pretty quickly.
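
For readers who prefer a script to the wizard, here is a rough Python sketch of how the wizard steps listed in the table of contents map onto the parameters of the boto3 EC2 client's `run_instances` call. The helper name `build_launch_request` and every value passed to it are hypothetical placeholders, and the root device name `/dev/sda1` is an assumption that depends on the AMI.

```python
def build_launch_request(ami_id, instance_type, key_name,
                         security_group_id, root_size_gib, name_tag,
                         root_device_name="/dev/sda1"):
    """Collect the wizard choices into keyword arguments for run_instances."""
    return {
        "ImageId": ami_id,                # Step 1. Choosing an AMI
        "InstanceType": instance_type,    # Step 2. Choosing an instance type
        "MinCount": 1,                    # Step 3. Instance details: launch exactly one
        "MaxCount": 1,
        "BlockDeviceMappings": [          # Step 4. Adding storage
            {
                "DeviceName": root_device_name,
                "Ebs": {
                    "VolumeSize": root_size_gib,
                    "DeleteOnTermination": True,
                },
            }
        ],
        "TagSpecifications": [            # Step 5. Tagging the instance
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": name_tag}],
            }
        ],
        "SecurityGroupIds": [security_group_id],  # Step 6. Security group
        "KeyName": key_name,              # Step 8. Assigning a key pair
    }


# All identifiers below are placeholders, not real resources.
request = build_launch_request(
    ami_id="ami-PLACEHOLDER",
    instance_type="t2.micro",
    key_name="my-key-pair",
    security_group_id="sg-PLACEHOLDER",
    root_size_gib=20,
    name_tag="tutorial-instance",
)
print(sorted(request))

# With credentials configured, the request could then be submitted with:
#   import boto3
#   boto3.client("ec2").run_instances(**request)
```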

written 5.7 years ago by Malachi Griffith

One more short question: do you explain how to add resources to an instance? For example, I would like to grow my disk space and processing power by adding more CPUs, etc.

BTW, great job.

written 5.7 years ago by mcc80

Good question. I believe that, for the most part, the hardware configuration (CPUs and memory) must be set when the instance is created and cannot be adjusted. However, you can create an AMI of your instance and then launch a new instance using that AMI. At that point you can choose an instance type with more CPUs and memory. Similarly, if you want to expand the disk storage for an instance, you can create a snapshot of the instance's volume, stop the instance, and attach a larger volume by following this procedure. If you just want to attach a new volume to a running instance, you can follow this procedure.
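
The AMI route described above could look roughly like the following Python sketch. It assumes `ec2` is a boto3 EC2 client (or any object with the same `create_image`, `get_waiter` and `run_instances` methods); the function name `relaunch_with_larger_type` is hypothetical, and the IDs in the usage comment are placeholders.

```python
def relaunch_with_larger_type(ec2, instance_id, new_instance_type, image_name):
    """Create an AMI from an existing instance, then launch a new,
    larger instance from that AMI. Returns the new instance ID."""
    # Create an AMI from the existing instance. Note that by default
    # create_image reboots the instance while the image is taken.
    image_id = ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]

    # Wait until the AMI is available before launching from it.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Launch one new instance from the AMI with the larger instance type.
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=new_instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]


# Example usage (requires boto3 and configured AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   new_id = relaunch_with_larger_type(
#       ec2, "i-PLACEHOLDER", "m5.xlarge", "my-instance-backup")
```

The original instance keeps running (and billing) after this, so it would still need to be stopped or terminated separately once the new one is confirmed to work.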

written 5.7 years ago by Malachi Griffith