How-to: Get started with Amazon EC2

Amazon cloud skills are in high demand. This easy, step-by-step guide will help start you on your path to cloud mastery.

If your company hasn't ventured into an Amazon cloud deployment already, the day may be fast approaching. Amazon's pay-as-you-go cloud is no longer "just" a popular playground for developers, a magnet for technology startups, and the clandestine home of "shadow IT" projects. It's also increasingly a component of official IT operations.

Working with the Amazon EC2 cloud isn't especially difficult, but it is different. This quick guide will get you up and running and on your way to cloud mastery. When your company finally embarks on that Amazon deployment or the next stop in your career requires cloud skills, you'll be ready to answer the call.

Learning your way around Amazon

A first look at the Amazon Web Services dashboard reveals a bewildering array of services. Where to start? In truth, a few of these services will do almost everything you need; others you may use little or not at all. The following are the ones that will loom largest on your radar.

EC2 (Elastic Compute Cloud). EC2 instances are the servers on which you run your workload. Although you use a Web interface or API call to provision the servers and bring them into your collection, ultimately they are real computers with CPUs, memory, and access to physical storage.

S3 (Simple Storage Service). S3 provides persistent and very cheap storage. It integrates with CloudFront, Amazon's content delivery solution. If you have website content such as images and CSS files, these would typically be stored in S3 and fetched by your Web server at delivery time.

EBS (Elastic Block Store). EBS is essentially a virtualized storage area network (SAN) that serves all of your servers. Slice out chunks of storage for use by your instances as root or additional volumes. You can then take snapshots of them to use for backups -- just as you would with Linux's LVM (Logical Volume Manager).

RDS (Relational Database Service). RDS is Amazon's managed relational database solution, with MySQL, Oracle, or SQL Server under the hood. When you launch a database instance, you choose the database engine you want.

ElastiCache. This is an Amazon-managed Memcached solution. You can add and remove nodes easily, and with CloudWatch monitoring, you can have Amazon replace nodes for you if they fail.

Route 53. Route 53 is an Amazon-hosted DNS solution that allows you to associate names to your provisioned computing resources. Because instances in Amazon change their IP addresses whenever they are stopped and started again, reaching those boxes via names can be much more convenient and easier to support than relying on IP addresses.

VPC (Virtual Private Cloud). VPC is a superb addition to the Amazon portfolio of services, one that may very well benefit your enterprise. VPC essentially allows you to dynamically scale your existing data center using Amazon resources. By connecting the Amazon cloud to your data center via VPN, VPC allows your existing network to route to Amazon instances privately, as though they were physical machines in your data center. You get the benefits of the cloud with far fewer of the security headaches.

There are of course many other Amazon services available, including email sending, message queueing, workflow, search, NoSQL, MapReduce, and alternative authentication solutions. But the above are the main services to understand.

In addition to these core services, you're sure to encounter a number of Amazon vocabulary terms again and again. Before you get started, it will pay to be familiar with the following concepts.

EC2 Instance. An instance is a unit of computing power, with CPUs, memory, and attached storage.

Amazon Machine Images. An Amazon Machine Image (AMI) is essentially a snapshot of a root volume. It may initially be difficult to wrap your head around this idea, but imagine the Linux Logical Volume Manager. Like LVM, an AMI allows you to snapshot your root volume and create a block-by-block copy of everything stored on the disk. That includes the master boot record, the kernel image, and so forth. The hypervisor layer in EC2 allows you to boot from these images on generic commodity servers in the Amazon data centers.

EBS Volumes. Volumes are the virtual disks you attach to and mount on your server instances, and you can take snapshots of them as backups. EBS volumes persist independently of the instances themselves.

Security Groups. Amazon doesn't go with traditional perimeter security unless you're using the Virtual Private Cloud services. That means each server is its own universe, governed by security roles enforced by the hypervisor layer. This is real security, though the new paradigm may take some getting used to. Think of putting servers in groups by role, such as a database tier group, a Web server tier group, and so forth. You might even spin up a t1.micro instance and use it as a jump box. Make this instance the only machine in your environment with SSH access allowed, then grant access to all your servers' port 22 (for SSH) only from this jump box.
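
The jump box arrangement can be sketched with the EC2 API tools that are installed later in this guide. This is a minimal sketch under stated assumptions, not a definitive setup: the group names, the admin network, and the account number are all hypothetical, and the flags follow the ec2-create-group and ec2-authorize syntax as commonly documented, so check each command's --help output before relying on them. The function only prints the commands so you can review them first.

```shell
# All names and values below are hypothetical; substitute your own.
ADMIN_CIDR="203.0.113.0/24"   # network you administer from (documentation range)
ACCOUNT_ID="111122223333"     # placeholder AWS account number

# Print the commands instead of running them, so they can be reviewed first.
jump_box_rules() {
  echo "ec2-create-group jump -d 'SSH entry point'"
  echo "ec2-create-group web -d 'Web server tier'"
  # Only the admin network may reach the jump box over SSH.
  echo "ec2-authorize jump -P tcp -p 22 -s $ADMIN_CIDR"
  # Web servers accept SSH only from members of the jump group.
  echo "ec2-authorize web -P tcp -p 22 -o jump -u $ACCOUNT_ID"
}

jump_box_rules
```

Granting access by group name rather than by IP address means the rule keeps working when the jump box is stopped and started and picks up a new address.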

Load balancers. A load balancer in AWS becomes another facility that you can configure in a completely virtual way. Here's where you start to see the real power of the AWS environment. You can associate your instances to the load balancer by instance ID even if they are in different availability zones. You can configure the listener and cookie stickiness policies as well.

Availability Zones. Availability Zones are distinct data centers in the Amazon environment, grouped into geographic regions, but deployment is nevertheless transparent. Resources can be deployed easily whether the region is on the East Coast, the West Coast, or the other side of the world. Storing mission-critical resources in multiple Availability Zones is your hedge against the inevitable Amazon outage.

Install the Amazon EC2 API Tools

Now that you're familiar with the core offerings and vocabulary, let's try out some of the services. You'll need to create an AWS account before we can go any further. Note that a free usage tier is available for new users.

First, we'll want to install the API tools. These Java-based tools allow you to issue Amazon commands from any terminal window, whether it be your local laptop, another server, or even an instance hosted in Amazon itself. Bootstrapping indeed!

The first step is to download the tools from Amazon. Next you'll set up a couple of environment variables:

export JAVA_HOME=/usr

export EC2_HOME=/home/sean/api-tools

These are examples of the commands for Linux and Unix. For more detail on these and for the corresponding commands on Windows, see Amazon's documentation for the EC2 API tools.
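
The tools themselves live in a bin directory under EC2_HOME, so that directory also needs to be on your PATH. A minimal sketch, assuming the same example paths as above, sets all three variables and reports whether the tools can be found:

```shell
# Example paths from above; adjust to where Java and the tools live on your machine.
export JAVA_HOME=/usr
export EC2_HOME=/home/sean/api-tools

# Put the tools' bin directory on the PATH so commands such as
# ec2-describe-instances resolve from any directory.
export PATH="$PATH:$EC2_HOME/bin"

# Report whether the tools are reachable; this only prints a message.
if command -v ec2-describe-instances >/dev/null 2>&1; then
  echo "EC2 API tools found"
else
  echo "EC2 API tools not found under $EC2_HOME/bin"
fi
```

Adding these lines to your shell profile saves retyping them in every new terminal window.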

Create your access keys

The Amazon dashboard provides an easy way to set up your keys.

1. Go to aws.amazon.com and log in.

2. Under your account name in the upper right, click the menu and select Security Credentials.

3. Click the first link, Access Credentials.

4. Click "Create new access key" and follow the instructions.

5. The last step will involve downloading two .pem files. Save these locally.

6. So that your Amazon tools can locate these .pem files, set these two environment variables:

export EC2_PRIVATE_KEY=/home/sean/keys/pk-A5X4ZTZRLDEMYVHGXCQHU2HW3HALFS3T.pem

export EC2_CERT=/home/sean/keys/cert-A5X4ZTZRLDEMYVHGXCQHU2HW3HALFS3T.pem
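
A mistyped path here tends to surface later as a confusing authentication error, so it can help to check the two variables up front. The helper below is a hypothetical convenience, not part of Amazon's tools; it only confirms that each variable is set and points at a readable file.

```shell
# Hypothetical helper: confirm that both variables point at readable files.
check_ec2_credentials() {
  status=0
  for name in EC2_PRIVATE_KEY EC2_CERT; do
    # Look up the value of the variable whose name is in $name.
    eval "path=\${$name:-}"
    if [ -z "$path" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -r "$path" ]; then
      echo "$name points at an unreadable file: $path"
      status=1
    fi
  done
  return $status
}

check_ec2_credentials || echo "Fix the settings above before running any ec2-* command."
```

It is also worth restricting the private key file to your own user, for example with chmod 600 on the pk-*.pem file.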

Choose an Availability Zone and Region

Availability Zones are distinct data centers. It is incredible that we can distill a data center down to a short identifier such as us-east-1a or us-west-1c, but that is the beauty of cloud computing and Amazon Web Services. As you build more complex applications with more resilient architecture, you'll pay more attention to which Availability Zone you deploy components in. For now, pick the one that's physically closest to your location.

You'll find the menu for selecting your region right next to your account name in the upper-right corner of the EC2 dashboard. You choose the Availability Zone within that region when you launch an instance.

Choose an Amazon Machine Image

Next stop on your Amazon tour is to decide which AMI to use. There are nearly 1,000 AMIs to choose from, and you can easily browse or search for what you need.

At this stage I wouldn't spend an inordinate amount of time deciding. Go with an Ubuntu image as a default. Also be sure to pick an EBS root AMI. There are very few use cases for Instance Store now that EBS is mature. I'm personally partial to Eric Hammond's images, which are well maintained, well supported, and well respected in the community.

A note on 32-bit versus 64-bit images: Only micro, small, and medium instances are available in 32-bit. As a general rule, it's best to go with 64-bit for everything unless you have a particular and compelling reason to require 32-bit. With 64-bit, your images will work on all instance types, and you can vertically scale easily.

Spin up your EC2 instance

You have your tools installed, you have your keys, you've picked an AMI and availability zone. Now you're finally ready to create a real Amazon instance. At the command line, enter:

$ ec2-run-instances ami-31814f58 -k my-keypair -t t1.micro -z us-east-1a

Notice I chose a micro instance. Micro instances are covered by the free usage tier for new accounts, within its usage limits, so they're a great option for trying out the tools.
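
The -k my-keypair option assumes an SSH key pair by that name already exists in your account; it is separate from the .pem access credentials set up earlier. One way to create it is the ec2-add-keypair command, which prints a KEYPAIR summary line followed by the private key. The sketch below, with a hypothetical helper name and made-up sample output, keeps only the key itself so it can be saved to a file.

```shell
# Hypothetical helper: keep only the PEM block from ec2-add-keypair output,
# dropping the leading KEYPAIR summary line.
extract_private_key() {
  sed -n '/-----BEGIN RSA PRIVATE KEY-----/,/-----END RSA PRIVATE KEY-----/p'
}

# Real usage (needs the tools and credentials configured):
#   ec2-add-keypair my-keypair | extract_private_key > my-keypair.pem
#   chmod 600 my-keypair.pem

# Demonstration on made-up output, so the sketch runs without an AWS account.
sample='KEYPAIR my-keypair 00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33
-----BEGIN RSA PRIVATE KEY-----
(fake key material)
-----END RSA PRIVATE KEY-----'
printf '%s\n' "$sample" | extract_private_key
```

The chmod step matters because ssh refuses to use a private key file that other users can read.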

Connect to your instance

Now that you have a running instance in EC2, you'll want to connect. Let's find out its name:

$ ec2-describe-instances

RESERVATION r-d1a71cc1 046997127105 default

INSTANCE i-17086273 ami-31814f58 ec2-64-21-210-168.compute-1.amazonaws.com ip-10-44-61-104.ec2.internal running my-keypair 0 t1.micro 2012-06-15T13:11:05+0000 us-east-1a aki-417d2539 monitoring-disabled 64.21.210.168 10.46.63.204 ebs paravirtual xen sg-65f4ec0a default

BLOCKDEVICE /dev/sda1 vol-3f1ac253 2012-06-15T13:11:32.000Z
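
Output like this is easy to pick apart with awk once you have more than one instance. The following is a rough sketch rather than a robust parser: the function names are hypothetical, and it assumes whitespace-separated fields in the order shown in the sample (instance ID second, public DNS name fourth, state sixth). Field positions can shift when a column is empty, for example on a stopped instance, so verify against your own output.

```shell
# Sample text in the same layout as the listing above, trimmed for brevity.
sample_output='RESERVATION r-d1a71cc1 046997127105 default
INSTANCE i-17086273 ami-31814f58 ec2-64-21-210-168.compute-1.amazonaws.com ip-10-44-61-104.ec2.internal running my-keypair 0 t1.micro
BLOCKDEVICE /dev/sda1 vol-3f1ac253 2012-06-15T13:11:32.000Z'

# Print "instance-id public-dns" for each running instance.
# Assumes whitespace-separated fields in the order shown in the sample.
running_instances() {
  awk '$1 == "INSTANCE" && $6 == "running" { print $2, $4 }'
}

# Print the volume ID from each BLOCKDEVICE line.
volume_ids() {
  awk '$1 == "BLOCKDEVICE" { print $3 }'
}

printf '%s\n' "$sample_output" | running_instances
printf '%s\n' "$sample_output" | volume_ids
```

Against a live account you would pipe the real command into the same functions, as in ec2-describe-instances | running_instances.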

Once you know the IP address of the box, go ahead and connect:

$ ssh -i my-keypair ec2-user@64.21.210.168
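
The login account depends on the image you launched: Amazon Linux images use ec2-user, while Ubuntu images use ubuntu. As a small illustration, the hypothetical helper below maps an image family to its usual account and prints the matching ssh command rather than running it. The file name my-keypair.pem is an assumption about where you saved the private key.

```shell
# Hypothetical helper: map an image family to its usual login account.
login_user_for() {
  case "$1" in
    ubuntu)       echo "ubuntu" ;;
    amazon-linux) echo "ec2-user" ;;
    *)            echo "unknown image family: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the connection command for the instance above.
# my-keypair.pem is an assumed file name for the saved private key.
user=$(login_user_for ubuntu)
echo "ssh -i my-keypair.pem $user@64.21.210.168"
```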

A few routine tasks

Folks familiar with Linux Volume Manager know that you can easily snapshot a disk volume. In Amazon, snapshots are a powerful facility for creating backups, protecting you from instance failure, and even creating new AMIs from your custom server setups. Look at the BLOCKDEVICE line above. You'll see the volume ID. That's all you need:

$ ec2-create-snapshot vol-3f1ac253

A few details to keep in mind: Although you can snapshot a running server, some tools will stop your instance in order to snapshot the root volume. This is for extra protection against corruption of the file system. If you're using a journaling file system such as ext3, ext4, or xfs, snapshotting a running system will leave your volume in a state similar to a crashed server. Upon startup, incomplete blocks will be repaired. In the case of a database mount such as MySQL, however, you should issue these additional commands from the MySQL shell:

mysql> FLUSH TABLES WITH READ LOCK;

mysql> system xfs_freeze -f /data

For an in-depth explanation of how to do this, see the article "Autoscaling MySQL on Amazon EC2."
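
The lock and the freeze both have to be released once the snapshot has been started, and the read lock only holds while the MySQL session that took it stays open. A sketch of the whole sequence, assuming the /data mount point and the volume ID used above, is shown here as a hypothetical helper that prints the statements in order for a single mysql client session:

```shell
# Hypothetical helper: print the statements for one mysql client session.
# $1 is the XFS mount point holding the database, $2 is the EBS volume ID.
snapshot_sql() {
  mount_point=$1
  volume_id=$2
  cat <<EOF
FLUSH TABLES WITH READ LOCK;
system xfs_freeze -f $mount_point
system ec2-create-snapshot $volume_id
system xfs_freeze -u $mount_point
UNLOCK TABLES;
EOF
}

snapshot_sql /data vol-3f1ac253
```

Keeping every step in the same session is the point of the design: if the session that holds the lock exits before the snapshot starts, writes resume and the snapshot may catch the database mid-update.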

When instances are started, Amazon automatically assigns a new IP address to them. Dynamic addresses are fine for playing around, but you'll undoubtedly want static, public IP addresses for some machines eventually. That's where Elastic IP addresses enter the picture; you can allocate a limited number of these to your AWS account. Once an address is allocated, you can assign it to your new instance with a simple command-line call:

$ ec2-associate-address 10.20.30.40 -i i-17086273
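
The address in that command has to be one that is already allocated to your account, which is what ec2-allocate-address does. The sketch below chains the two steps; the helper name is hypothetical, and it assumes the allocation command prints a line of the form "ADDRESS <ip>", so check the output of your version of the tools. The demonstration uses a made-up address and prints the final command instead of running it.

```shell
# Hypothetical helper: pull the address out of ec2-allocate-address output,
# assuming a line of the form "ADDRESS <ip>".
allocated_ip() {
  awk '$1 == "ADDRESS" { print $2; exit }'
}

# Real usage (needs the tools and credentials configured):
#   ip=$(ec2-allocate-address | allocated_ip)
#   ec2-associate-address "$ip" -i i-17086273

# Demonstration on made-up output; 203.0.113.10 is a documentation address.
ip=$(printf 'ADDRESS 203.0.113.10\n' | allocated_ip)
echo "ec2-associate-address $ip -i i-17086273"
```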

You're all set.

Now that you've had a taste of Amazon, you'll want to explore more. With the command-line tools installed and your security keys set up, you have everything you need to go further -- and get comfortable with different instance types, various AMIs, the Availability Zones your instances and volumes are stored in, how load balancers work, and beyond. The further you go, the more you'll appreciate that Amazon's documentation is as copious as its services.