We have increasingly been using cloud computing environments to teach bioinformatics analysis skills. For example, we have used AWS to run tutorials for hands-on bioinformatics workshops at CBW, NYGC, and CSHL. For many students learning bioinformatics this way, "The Cloud" is a very abstract black box. When they do start to dig into the details, they encounter considerable jargon. Much of the existing documentation is written by system administration experts and is inaccessible to many biologists. We created the following tutorial to demystify some of this jargon and to direct the reader to the documentation we found most useful. A table of contents for this tutorial is provided below to give you a sense of the topics covered. If there are other cloud computing tutorials that you have found useful, please link to them here.
Cloud computing allows users to quickly and remotely access an arbitrary amount of compute resources without the need to buy or maintain hardware themselves. There are many cloud computing services. This tutorial describes the use of the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) service. However, the fundamental concepts covered here generally apply to other cloud computing services such as Google Cloud, DigitalOcean, etc., although each provider uses substantially different jargon.
Table of Contents
- Glossary and abbreviations
- What do I need to perform this tutorial?
- What is a Region?
- How much does it cost to use AWS EC2 resources?
- Necessary steps for launching an instance
  - Step 1. Choosing an AMI
  - Step 2. Choosing an instance type
  - Step 3. Configuring instance details
  - Step 4. Adding storage
  - Step 5. Tagging the instance
  - Step 6. Configuring a security group
  - Step 7. Reviewing the instance before launch
  - Step 8. Assigning a key pair
  - Step 9. Reviewing launch status
  - Step 10. Examining a new instance in the EC2 console
  - Step 11. Logging into an instance
- Troubleshooting and advanced topics
- I can't log in to my EC2 instance - what might have gone wrong?
- How do storage volumes appear within a Linux instance on Amazon EC2?
- Taking stock of compute resources within an Ubuntu Linux instance
- Basic setup and administration of an Ubuntu Linux instance
- Setting up an Apache web server
- What is the difference between the start, stop, reboot and terminate instance states?
- How do I create my own AMI and publish it as a Community AMI, and what is a snapshot?
- Tidying up and shutting down AWS resources
- Further reading and preparing for more advanced AWS cloud computing concepts
Citation for this resource:
Malachi Griffith*, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith*. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud. PLoS Comput Biol. 2015. Aug 6;11(8):e1004393. doi: 10.1371/journal.pcbi.1004393. eCollection 2015 Aug. PubMed PMID: 26248053.
The following existing threads also seem relevant to this topic:
Using Amazon Web Services?
How To Install & Use R/Bioconductor In Amazon Ec2
Making A Bioinformatics Application Available In The Amazon Ec2 Cloud
Amazon Ec2 Cloud Biolinux Performance Underwhelms
How Do You Use Cloud Computing For Bioinformatics In 2013?