Tuesday, 26 January 2016

Use Vagrant to set up a CentOS 7 VM in AWS EC2


Everyone tells us that Infrastructure as Code is the way to go, right? So when I was recently asked to set up a continuous integration service for a development project, the obvious option was an approach that allows us to script the server setup and deployment, and put the whole lot under version control, particularly since a colleague had already assembled a Vagrantfile that allowed him to deploy a Jenkins server in a VM on his workstation.

However, there were a number of gotchas, which cost me a couple of days of unexpected work. I am recording them here in case anyone else can benefit from my experience.

One of the requirements was to run the CI server under CentOS 7, the same as the target environment, so that all tests would run under near-identical production conditions and deployment would be very simple using a tar file containing all the dependencies. But CentOS comes with enterprise-grade security features, such as SELinux and a sudo configuration that requires a TTY, which sometimes get in the way of what you want to do. Read on...

Set up Vagrant

To use Vagrant, set up a folder in your project (e.g. called "CI") and in it, create a Vagrantfile. If you have not used Vagrant before, please work through the quick Getting Started exercise to familiarise yourself with the concepts.

The Vagrantfile is basically a Ruby script, so it is common practice to start it with these two mode lines, which tell Emacs and Vim respectively to treat it as Ruby:

# -*- mode: ruby -*-
# vi: set ft=ruby :

To use Vagrant with AWS, first install the AWS provider plugin locally and add the dummy box that it uses:

vagrant plugin install vagrant-aws
vagrant box add dummy \

The README for the Vagrant AWS provider helpfully provides a starter Vagrantfile for you to copy and extend. However, we need to add a number of parameters to its AWS provider configuration and change the values of all the ones supplied:

aws.access_key_id = "**"
aws.secret_access_key = "****"
aws.session_token = "***"
aws.keypair_name = "*****"

aws.ami = "ami-e68f82fb" # CentOS 7 64-bit
aws.instance_type = "t2.medium"
aws.region = "eu-central-1" # Pick the appropriate region
aws.security_groups = [ 'sg-e430808d' ]
aws.block_device_mapping = [{ 'DeviceName' => '/dev/sda1', 'Ebs.VolumeSize' => 50 }]
aws.associate_public_ip = true
aws.subnet_id = "subnet-f7569b8c"
aws.ssh_host_attribute = :dns_name
aws.tags = { 'Name' => 'Continuous Integration' }

aws.user_data = File.read("boothook.sh")
override.ssh.username = "centos"
override.ssh.pty = true
override.ssh.private_key_path = "~/.ec2/*****.pem"

The private key file for your instance (or cluster) is generated for you by AWS EC2 when you create a key pair, e.g. while launching an instance through the EC2 management console. The recommendation is to put it in ~/.ec2 along with any other EC2 private keys. Do not share it via version control, or it soon won't be secret any more.

The aws.tags hash allows you to set any tag values. Name is a typical example: the EC2 management console shows it in the Name column, which lets you identify the instance. Tag keys are case-sensitive, so it must be spelled exactly "Name".

Set aws.subnet_id to the ID of the VPC subnet the instance should run in, i.e. the same subnet you would choose when launching an EC2 instance manually.

aws.block_device_mapping is only needed if you want to allocate a non-default volume size (Ebs.VolumeSize is given in GiB, so 50 GiB in the example above). For CI purposes, the default volume size is probably too small if any significant amount of build history is to be kept.

The aws.security_groups setting should list one or more security groups that you have created via the EC2 management console. Make sure that at least SSH (port 22) and HTTP/HTTPS (ports 80/443) are permitted inbound. I set up an nginx server as a reverse proxy on port 80 (see below) to allow client browsers to access multiple back-end services through the standard HTTP port.

The aws.instance_type allows you to choose the size of virtual machine. You may find that t2.small is sufficient for your needs, but if it runs out of CPU credits, it will be throttled severely (which can actually cause builds to fail due to timeouts). While trying to perfect the Vagrantfile, however, you may wish to specify t2.micro to minimise AWS usage charges.
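Switching between t2.micro for trial runs and a larger size for real use means editing the Vagrantfile each time. Because the Vagrantfile is plain Ruby, one option is to read the size from an environment variable. A minimal sketch; the helper and the CI_INSTANCE_TYPE variable are hypothetical names, not part of Vagrant or the AWS provider:

```ruby
# Hypothetical helper for the Vagrantfile: take the instance type from the
# (assumed) CI_INSTANCE_TYPE environment variable, falling back to t2.medium.
def ci_instance_type(env = ENV)
  env.fetch('CI_INSTANCE_TYPE', 't2.medium')
end

# Inside the provider block you would then write:
#   aws.instance_type = ci_instance_type
puts ci_instance_type('CI_INSTANCE_TYPE' => 't2.micro') # prints t2.micro
puts ci_instance_type({})                               # prints t2.medium
```

A trial deployment would then be started with CI_INSTANCE_TYPE=t2.micro vagrant up --provider=aws, while a plain vagrant up --provider=aws keeps the default.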

Notice the "boothook.sh" reference. This is based on a widely shared fix for a commonly encountered issue with Vagrant and certain AWS AMIs. The contents of the file are:

# CentOS 7 normally requires a TTY for sudo,
# which kills Vagrant's rsync command.
# By loading this sequence of commands into
# aws.user_data, that problem is defeated.
echo "Defaults:centos !requiretty" > $SUDOERS_FILE
echo "Defaults:root !requiretty" >> $SUDOERS_FILE
chmod 440 $SUDOERS_FILE

The Vagrantfile has a number of AWS access parameters that need to be configured (shown by asterisks above). For aws.keypair_name, insert the name of the EC2 key pair whose private key file is referenced by override.ssh.private_key_path. Next, navigate to the Identity and Access Management (IAM) section of the AWS console, create an IAM user for running Vagrant and generate an access key for it. Then grant that user the AmazonEC2FullAccess policy. This allows Vagrant to provision the virtual machine.

Confusingly, this user's access key is not inserted into the Vagrantfile at all. Instead, a session token is required. This is how to obtain it:

  1. Download and install the AWS Command Line Interface (CLI). On Mac OS X with Python and pip already installed, this turned out to be a one-line command:
    sudo pip install awscli
  2. Configure the command line interface:
    aws configure
    (Enter the IAM user's access key ID and secret access key when prompted; leave the default region name and default output format as "None")
  3. Request the session token with a duration of 36 hours (maximum):
    aws sts get-session-token --duration-seconds 129600

Update the Vagrantfile with the session token and with the temporary access key ID and secret access key returned alongside it. After 36 hours the token expires, so if you want to deploy using this Vagrantfile again, you will have to repeat the procedure.
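Rather than pasting the three values into the Vagrantfile every 36 hours, one option is to save the command's output to a file outside version control, e.g. aws sts get-session-token --duration-seconds 129600 > ~/.ec2/session.json, and read it from the Vagrantfile. A minimal sketch, assuming the CLI's JSON output (the default when no output format is configured); session.json and both helper names are hypothetical:

```ruby
require 'json'
require 'time'

# Hypothetical helper: extract the temporary credentials from the JSON that
# `aws sts get-session-token` prints under its "Credentials" key.
def session_credentials(json_text)
  creds = JSON.parse(json_text).fetch('Credentials')
  {
    access_key_id:     creds.fetch('AccessKeyId'),
    secret_access_key: creds.fetch('SecretAccessKey'),
    session_token:     creds.fetch('SessionToken'),
    expiration:        Time.iso8601(creds.fetch('Expiration'))
  }
end

# True once the token is no longer valid and a new one must be requested.
def expired?(creds, now = Time.now)
  creds[:expiration] <= now
end

# In the provider block (the file location is an assumption):
#   creds = session_credentials(File.read(File.expand_path('~/.ec2/session.json')))
#   abort 'Session token expired, request a new one' if expired?(creds)
#   aws.access_key_id     = creds[:access_key_id]
#   aws.secret_access_key = creds[:secret_access_key]
#   aws.session_token     = creds[:session_token]
```

The expiry check turns a confusing authentication failure during vagrant up into an explicit message.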

Following the AWS provider configuration in the Vagrantfile (just above the final "end" statement in the file), specify any further configuration steps required, e.g. synchronized folders and custom software installations (see next section).

After this, deploying the box should be straightforward (make sure you are in the same current working directory as the Vagrantfile):

vagrant up --provider=aws

Note the public IP and host FQDN shown for the virtual machine in the AWS console. This is the address you will need to access your CI application from a browser. For example, if your machine FQDN is ec2-54-93-105-248.eu-central-1.compute.amazonaws.com, your Jenkins dashboard (assuming you went on to install Jenkins behind the Nginx reverse proxy, as shown below) will be at http://ec2-54-93-105-248.eu-central-1.compute.amazonaws.com/jenkins/.

Continue configuring your CI server manually, and add these configuration steps to the Vagrantfile wherever possible so that the next deployment reproduces them.

To terminate the machine, use

vagrant destroy

Synchronize Folders

Between the end of the config.vm.provider configuration and the end of the Vagrantfile, you can insert further configuration instructions. Configure SSH for folder synchronization using the same parameters as above:

config.ssh.username = "centos"
config.ssh.pty = true
config.ssh.private_key_path = "~/.ec2/*****.pem"

Then specify which folders you want to synchronize to the server. Because folder synchronization precedes any shell scripts run on the target VM, I find it best to synchronize mostly to /tmp subfolders on the target VM and then copy or move the contents from there during the subsequent software installation. For example:

config.vm.synced_folder "./nginx", "/tmp/nginx", \
type: "rsync", create: true, owner: 'root', group: 'root'

where the local folder "nginx" contains a subfolder "default.d", which in turn contains "jenkins.conf" to specify the reverse proxy configuration for Nginx to access the Jenkins server on port 8080 (see below). Once the Nginx software has been installed, the .conf files in the synchronised "default.d" folder are copied from /tmp/nginx/default.d to /etc/nginx/default.d.
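If several folders are staged this way (the Jenkins jobs described below also travel via /tmp/jenkins), the synced_folder lines can be generated from a list. A small sketch; STAGED_FOLDERS and staged_folder_pairs are hypothetical names, and the rsync options are the ones used above:

```ruby
# Hypothetical list of local folders to stage under /tmp on the VM. The
# provisioning scripts later copy or move their contents into place.
STAGED_FOLDERS = %w[nginx jenkins].freeze

# Returns [local_path, vm_path] pairs for config.vm.synced_folder.
def staged_folder_pairs(names)
  names.map { |name| ["./#{name}", "/tmp/#{name}"] }
end

# Inside Vagrant.configure(2) do |config| ... end:
#   staged_folder_pairs(STAGED_FOLDERS).each do |src, dest|
#     config.vm.synced_folder src, dest,
#       type: "rsync", create: true, owner: 'root', group: 'root'
#   end
p staged_folder_pairs(STAGED_FOLDERS)
# prints [["./nginx", "/tmp/nginx"], ["./jenkins", "/tmp/jenkins"]]
```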

Install Software


So-called "here documents" are a neat way to separate bits of installation script into identifiable blocks that can be invoked from the configuration section. However, there is a "gotcha" here too: in an ordinary (unquoted) Ruby here document, Ruby processes backslash escapes, so any backslash ("\") meant for the shell must be doubled ("\\"). This caught me out several times when developing sed or awk scripts in a shell and pasting them into the Vagrantfile. (In the pieces of Vagrantfile shown in this blog post, please interpret a single backslash at the end of a line to mean a soft line wrap. Join with the following line and delete the backslash after copying! And don't insert any spaces, particularly in the middle of sed or awk scripts!)

Place your here documents one after the other directly above the Vagrant.configure(2) block.
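To see the backslash gotcha in isolation, here is a small sketch comparing an ordinary here document with a single-quoted one. Quoting the identifier (<<'SCRIPT') switches off both escape processing and #{...} interpolation, so a snippet tested in the shell can be pasted in unchanged; the trade-off is that you can no longer interpolate Ruby values into it. The constant names are illustrative only:

```ruby
# Unquoted here document: Ruby processes backslash escapes (and #{...}
# interpolation), so every backslash meant for the shell is written twice.
Escaped = <<SCRIPT
awk '$1 ~ /\\[base\\]/ {print}' /etc/yum.repos.d/CentOS-Base.repo
SCRIPT

# Single-quoted here document: no escape processing and no interpolation,
# so the shell snippet appears exactly as typed.
Verbatim = <<'SCRIPT'
awk '$1 ~ /\[base\]/ {print}' /etc/yum.repos.d/CentOS-Base.repo
SCRIPT

puts Escaped == Verbatim # prints true
print Verbatim           # the exact text the shell provisioner would run
```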

Set up tools

The first thing is to install some tools that will be used by the subsequent installations. The time zone should of course be set to whatever is appropriate for you.

Here Document

sudo yum -y update
sudo yum -y groupinstall 'Development Tools'
sudo yum -y install epel-release
sudo yum -y install nano byobu bzip2 wget
sudo yum -y install fontforge # Not required for production
sudo timedatectl set-timezone Europe/London


config.vm.provision "shell", inline: BaseBox

Set up Nginx

Note the copy command, which makes use of the folder synchronisation shown earlier.

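The setsebool command is required because SELinux otherwise stops Nginx from making the outbound connections it needs to proxy HTTP or HTTPS requests to local TCP ports, such as Jenkins on port 8080. The -P flag makes the setting persist across reboots.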

Here Document

sudo yum -y install epel-release
sudo yum -y install nginx
sudo cp -r /tmp/nginx/default.d/*.conf /etc/nginx/default.d/
sudo setsebool -P httpd_can_network_connect 1
sudo systemctl enable nginx
sudo systemctl restart nginx


config.vm.provision "shell", inline: Nginx

Set up Jenkins

This is slightly complicated by the fact that we need Jenkins to have a URL prefix - otherwise reverse-proxying becomes next to impossible (see the sed script below). The jobs folder is relocated to /home/jenkins and symbolically linked, which should make it easier to upgrade Jenkins later without losing build configurations and histories. The initial set of build configurations is stored in version control and synchronised to /tmp/jenkins/jobs by Vagrant before this installation occurs.
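The sed script inserts Jenkins's --prefix=/jenkins option at the start of the PARAMS value, which is what makes Jenkins answer under /jenkins/. To check what that edit does, the same substitution can be sketched in Ruby. This is only an illustration: the sample PARAMS line is hypothetical, so check the real file on your VM:

```ruby
# Same edit as the sed command, expressed in Ruby: capture "PARAMS=" plus
# the character after it (the opening quote) and insert --prefix after it.
def add_jenkins_prefix(text, prefix = '/jenkins')
  text.gsub(/^(PARAMS=.)/) { "#{Regexp.last_match(1)}--prefix=#{prefix} " }
end

# Illustrative input line (hypothetical content).
line = %(PARAMS="--logfile=/var/log/jenkins/jenkins.log --daemon"\n)
print add_jenkins_prefix(line)
# prints PARAMS="--prefix=/jenkins --logfile=/var/log/jenkins/jenkins.log --daemon"
```

Using a block for the replacement sidesteps any backslash escaping in the Ruby version; in the Vagrantfile's here document, the sed form still needs its backslashes doubled.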

The installation of plugins is separated into a second block. You may of course require a different selection. The best way I have found to determine the names of the plugins to install is to install them manually once, while using the "list-plugins" command before and afterwards to find which new plugin names have appeared. You can do this after the Vagrant machine has been deployed by means of the following commands:

vagrant ssh

wget http://localhost:8080/jenkins/jnlpJars/jenkins-cli.jar
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \

NB: if you have already enabled security on the Jenkins instance, you must log in to the Jenkins CLI before you can list the plugins.

java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
--username **** --password ****
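The before-and-after comparison can be automated. A minimal sketch, assuming you have saved the output of the list-plugins command to two files (hypothetical names before.txt and after.txt) and that each line of that output begins with the plugin's short name:

```ruby
# Hypothetical helper: given two saved outputs of the list-plugins command
# (before and after installing plugins by hand), return the short names that
# are new. Assumes each non-blank line starts with the plugin's short name.
def new_plugin_names(before_text, after_text)
  short_names = ->(text) { text.each_line.map { |line| line.split.first }.compact }
  short_names.call(after_text) - short_names.call(before_text)
end

# Usage, once both outputs have been saved:
#   puts new_plugin_names(File.read('before.txt'), File.read('after.txt'))
```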

Here Document

# Install Java 8 update 65 because there's a bug
# in the Jenkins update module that makes
# signature checks fail in Java 8 > u65
sudo yum -y install java-1.8.0-openjdk-
sudo curl -sLo /etc/yum.repos.d/jenkins.repo \
sudo rpm --import \
sudo yum -y install jenkins
sudo sed -i 's/^\\(PARAMS=.\\)/\\1--prefix=\\/jenkins /' \
# Enable Jenkins to upgrade itself automatically
sudo chgrp -R jenkins /usr/lib/jenkins
sudo chmod -R g+w /usr/lib/jenkins
sudo systemctl daemon-reload
sudo systemctl enable jenkins.service
sudo systemctl restart jenkins.service
sudo mv /tmp/jenkins /home/jenkins
sudo chown -R jenkins:jenkins /home/jenkins
sudo ln -s /home/jenkins/jobs /var/lib/jenkins/jobs
sudo chown jenkins:jenkins /var/lib/jenkins/jobs

wget http://localhost:8080/jenkins/jnlpJars/jenkins-cli.jar
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin authentication-tokens
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin copyartifact
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin ghprb
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin git
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin git-changelog
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin git-client
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin git-parameter
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin git-tag-message
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin github
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin github-api
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin github-pullrequest
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \
     install-plugin nodejs
java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/ \


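The repetitive plugin block could equally be generated from a list, which makes adding or removing a plugin a one-word change. A sketch for the top of the Vagrantfile; JENKINS_PLUGINS and JENKINS_CLI are illustrative names, and the list reproduces the plugin names above except the final, cut-off entry:

```ruby
# Short names of the plugins to install (add any further ones you need).
JENKINS_PLUGINS = %w[
  authentication-tokens copyartifact ghprb git git-changelog git-client
  git-parameter git-tag-message github github-api github-pullrequest nodejs
].freeze

JENKINS_CLI = 'java -jar jenkins-cli.jar -s http://localhost:8080/jenkins/'.freeze

# Build the same sequence of shell commands as the hand-written block:
# fetch the CLI jar, then one install-plugin call per plugin.
JenkinsPlugins = (
  ['wget http://localhost:8080/jenkins/jnlpJars/jenkins-cli.jar'] +
  JENKINS_PLUGINS.map { |name| "#{JENKINS_CLI} install-plugin #{name}" }
).join("\n") + "\n"

print JenkinsPlugins
```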
Insert some other software installations between these two provisioning steps, to give Jenkins time to initialise itself before it is called via the CLI.

config.vm.provision "shell", inline: Jenkins
config.vm.provision "shell", inline: JenkinsPlugins
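Rather than relying on other installations taking long enough, an explicit wait step could be provisioned between the two Jenkins steps. A sketch, assuming that Jenkins refuses the connection or answers with an HTTP error status (typically 503) until it has finished starting; WaitForJenkins is a hypothetical name and the five-minute limit is arbitrary:

```ruby
# Hypothetical extra provisioning step: poll until Jenkins serves the CLI jar,
# giving up after 60 attempts 5 seconds apart. curl -f makes HTTP errors
# count as failures. The quoted 'SCRIPT' keeps $(...) and backslashes literal.
WaitForJenkins = <<'SCRIPT'
for attempt in $(seq 1 60); do
  if curl -sf -o /dev/null http://localhost:8080/jenkins/jnlpJars/jenkins-cli.jar; then
    exit 0
  fi
  sleep 5
done
echo "Jenkins did not come up in time" >&2
exit 1
SCRIPT

# Provisioning order:
#   config.vm.provision "shell", inline: Jenkins
#   config.vm.provision "shell", inline: WaitForJenkins
#   config.vm.provision "shell", inline: JenkinsPlugins
```

A non-zero exit stops the Vagrant run, so a Jenkins that never starts is reported instead of producing a string of failed plugin installations.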

Set up NodeJS

The installation of NodeJS (or node.js) under CentOS 7 is fairly straightforward, but I needed to specify the exact versions of node, grunt and bower in order to comply with the technical policy of the project. If you don't need to do that, just omit the version details (e.g. sudo yum install -y nodejs).

Here Document

curl -sL https://rpm.nodesource.com/setup_4.x | sudo -E bash -
sudo yum install -y nodejs-4.2.3-1nodesource.el7.centos.x86_64
sudo npm install -g grunt-cli@0.1.13 bower@1.7.0


config.vm.provision "shell", inline: Node423
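If the mandated versions change, it helps to record them in one place. Because an unquoted here document interpolates #{...}, the versions can be kept in Ruby constants and inserted into the script. A sketch; NODE_RPM, NPM_GLOBALS and NPM_PACKAGES are hypothetical names holding the versions used above:

```ruby
# Hypothetical constants recording the versions required by the project.
NODE_RPM    = 'nodejs-4.2.3-1nodesource.el7.centos.x86_64'.freeze
NPM_GLOBALS = { 'grunt-cli' => '0.1.13', 'bower' => '1.7.0' }.freeze

# Joins to "grunt-cli@0.1.13 bower@1.7.0" (hashes keep insertion order).
NPM_PACKAGES = NPM_GLOBALS.map { |name, version| "#{name}@#{version}" }.join(' ')

# Unquoted here document, so #{...} is interpolated by Ruby before the
# script is handed to the shell provisioner.
Node423 = <<SCRIPT
curl -sL https://rpm.nodesource.com/setup_4.x | sudo -E bash -
sudo yum install -y #{NODE_RPM}
sudo npm install -g #{NPM_PACKAGES}
SCRIPT

print Node423
```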

Set up PostgreSQL

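Here again an exact version (PostgreSQL 9.5) was needed; otherwise installation could have been much more straightforward. Because the CentOS base and updates repositories ship their own, older PostgreSQL packages, the awk script adds exclude=postgresql* to those two sections, so that the postgresql95 packages come from the repository installed by the localinstall step. Note the use of double backslashes in the awk and sed scripts.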

Here Document

sudo cp /etc/yum.repos.d/CentOS-Base.repo /tmp
sudo awk '{print}; $1 ~/\\[base\\]/ || $1 ~/\\[updates\\]/'\
' {print "exclude=postgresql*"}' /tmp/CentOS-Base.repo \
> /etc/yum.repos.d/CentOS-Base.repo
sudo yum -y localinstall \
sudo yum -y install postgresql95-server postgresql95-contrib
sudo /usr/pgsql-9.5/bin/postgresql95-setup initdb
sudo sed -i \
"s/#listen_addresses = 'localhost'/listen_addresses = '*'/" \
sudo sed -i \
's/^host [^:]*$/host    all             all'\
'             all                     md5/' \
sudo systemctl restart postgresql-9.5.service
sudo systemctl enable postgresql-9.5.service
sudo su -c "createdb mydatabase" - postgres
sudo su -c "createdb mydatabase_test" - postgres
cat <<EOF | sudo su -c "psql mydatabase" - postgres
CREATE USER myapp_test WITH PASSWORD 'myapp';
GRANT USAGE ON SCHEMA public TO myapp_test;
EOF


config.vm.provision "shell", inline: PostgreSQL
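To check what the awk edit does before letting it loose on /etc/yum.repos.d, its logic can be reproduced in Ruby and run against sample text. This is only an illustration, not part of the provisioning; the sample repository text is made up:

```ruby
# Ruby rendering of the awk one-liner: print every line and, directly after
# a line whose first field contains [base] or [updates], add an exclude line.
def exclude_postgresql(repo_text)
  out = []
  repo_text.each_line do |raw|
    line = raw.chomp
    out << line                                   # awk: {print}
    first_field = line.split.first.to_s           # awk: $1
    if first_field.match?(/\[base\]/) || first_field.match?(/\[updates\]/)
      out << 'exclude=postgresql*'                # awk: {print "exclude=..."}
    end
  end
  out.map { |l| "#{l}\n" }.join                   # awk ends every record with \n
end

sample = "[base]\nname=CentOS Base\n\n[updates]\nname=CentOS Updates\n\n[extras]\nname=CentOS Extras\n"
print exclude_postgresql(sample)
```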

Set up Redis

Redis is provided by the EPEL repository, which was enabled in the tools step above.

Here Document

sudo yum -y install redis
sudo systemctl enable redis.service
sudo systemctl restart redis.service


config.vm.provision "shell", inline: Redis


I hope this has helped you in your quest for DevOps Nirvana. Please drop me a line to tell me about your experiences. I can't promise to help, but I'll lend a sympathetic ear!