Intro
On almost every cloud provider today, you can run a script at instance startup to do whatever setup tasks you need. Most cloud providers have transitioned to using Ubuntu’s cloud-init package for instance startup management, at least on some of their images. This package handles the basics, such as setting up SSH keys, mounting drives, expanding root disks, and the other normal tasks required to make a VM appear out of nowhere.
What you may not know is that passing it a specially formatted YAML file with a specific header unlocks the full capabilities of cloud-init. Start your user-data block with #cloud-config and you are good to go.
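Since everything hinges on that header, a quick local sanity check before uploading can save a boot cycle. Here is a minimal sketch; the function name is_cloud_config is made up for illustration, and the check is deliberately stricter than cloud-init needs to be (it insists that the very first line is exactly the header).

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the first line of the file
# is exactly "#cloud-config".
is_cloud_config() {
  file="$1"
  first_line=$(head -n 1 -- "$file")
  [ "$first_line" = "#cloud-config" ]
}

# Usage: is_cloud_config user-data.yaml || echo "missing #cloud-config header"
```

On recent cloud-init releases, running cloud-init schema --config-file against the file should give a fuller validation, but check that your installed version has that subcommand.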
So what can you do with this? Let’s start by setting up a typical application.
First add our header, then tell it to update the system image:
#cloud-config
package_update: true
package_upgrade: true
Then we’ll add packages required to operate our application:
packages:
- iptables-services
- tc
Next, drop a couple of config files so this all starts up correctly:
write_files:
  - owner: root:root
    path: /usr/local/bin/trafficShape.sh
    permissions: "0755"
    content: |
      #!/bin/bash
      # Shape egress traffic on $INTERFACE with an HTB tree:
      # 1:10 is the high-priority class, 1:12 is the default class.
      INTERFACE=eth0
      LIMIT=$(printf "%.0fkbit" "$((10**3 * ${MaxBandwidth}))")
      LIMIT20=$(printf "%.0fkbit" "$((10**6 * ${MaxBandwidth} * 2))e-4")
      LIMIT40=$(printf "%.0fkbit" "$((10**6 * ${MaxBandwidth} * 4))e-4")
      LIMIT80=$(printf "%.0fkbit" "$((10**6 * ${MaxBandwidth} * 8))e-4")
      tc qdisc add dev $INTERFACE root handle 1: htb default 12
      tc class add dev $INTERFACE parent 1: classid 1:1 htb rate $LIMIT ceil $LIMIT burst 10k
      tc class add dev $INTERFACE parent 1:1 classid 1:10 htb rate $LIMIT20 ceil $LIMIT40 prio 1 burst 10k
      tc class add dev $INTERFACE parent 1:1 classid 1:12 htb rate $LIMIT80 ceil $LIMIT prio 2
      # IP protocol 0x11 is UDP; send it to the high-priority class
      tc filter add dev $INTERFACE protocol ip parent 1:0 prio 1 u32 match ip protocol 0x11 0xff flowid 1:10
      tc qdisc add dev $INTERFACE parent 1:10 handle 20: sfq perturb 10
      tc qdisc add dev $INTERFACE parent 1:12 handle 30: sfq perturb 10
  - owner: root:root
    path: /etc/sysctl.d/85-ip-forward.conf
    permissions: "0644"
    content: |
      net.ipv4.ip_forward = 1
  - owner: root:root
    path: /etc/systemd/system/traffic_shaping.service
    permissions: "0644"
    content: |
      [Unit]
      Description=Enable traffic shaping queueing
      After=network-online.target

      [Service]
      Type=oneshot
      ExecStart=/usr/local/bin/trafficShape.sh

      # Without an [Install] section, "systemctl enable" has nothing to hook into
      [Install]
      WantedBy=multi-user.target
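The printf lines in trafficShape.sh are a little cryptic, so here is a small sketch that runs the same arithmetic on its own. It assumes ${MaxBandwidth} is a template placeholder that gets replaced with a whole number before the script reaches the instance, and that the number is in Mbit/s (the arithmetic implies this, since the total limit is 1000 times the value in kbit). The function name shape_limits is invented for the example, and 10**3 and 10**6 are written out as 1000 and 1000000.

```shell
#!/bin/bash
# Hypothetical helper: print the total limit and the 20%, 40% and 80% limits
# for a bandwidth given in whole Mbit/s.
shape_limits() {
  max_bandwidth="$1"   # stands in for the ${MaxBandwidth} placeholder
  limit=$(printf "%.0fkbit" "$((1000 * max_bandwidth))")
  # Appending "e-4" makes printf read the integer as a float scaled by 1/10000,
  # which gives 20%, 40% and 80% of the total without needing bc.
  limit20=$(printf "%.0fkbit" "$((1000000 * max_bandwidth * 2))e-4")
  limit40=$(printf "%.0fkbit" "$((1000000 * max_bandwidth * 4))e-4")
  limit80=$(printf "%.0fkbit" "$((1000000 * max_bandwidth * 8))e-4")
  echo "$limit $limit20 $limit40 $limit80"
}

shape_limits 10   # prints: 10000kbit 2000kbit 4000kbit 8000kbit
```

So for a 10 Mbit/s cap, the high-priority class is guaranteed 2 Mbit/s and may borrow up to 4, while the default class is guaranteed 8 and may borrow up to the full 10.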
And finally, we’ll run a few commands to make sure everything is installed and set to start automatically:
runcmd:
  - |
    sysctl -w net.ipv4.ip_forward=1
    INTERFACE=eth0
    iptables -t nat -A POSTROUTING -s ${VpcCIDR} -o $INTERFACE -j MASQUERADE
    /usr/local/bin/trafficShape.sh
    # Send the relevant protocols to the high-priority class
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p tcp -m tos --tos Minimize-Delay -j CLASSIFY --set-class 1:10
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p icmp -j CLASSIFY --set-class 1:10
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p tcp --sport 53 -j CLASSIFY --set-class 1:10
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p tcp --dport 53 -j CLASSIFY --set-class 1:10
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p tcp --sport 123 -j CLASSIFY --set-class 1:10
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p tcp --dport 123 -j CLASSIFY --set-class 1:10
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p tcp --sport 22 -j CLASSIFY --set-class 1:10
    iptables -t mangle -A POSTROUTING -o $INTERFACE -p tcp --dport 22 -j CLASSIFY --set-class 1:10
    # A couple more chains to tag short TCP signalling packets correctly
    iptables -t mangle -N ack
    iptables -t mangle -A ack -m tos ! --tos Normal-Service -j RETURN
    iptables -t mangle -A ack -p tcp -m length --length 0:128 -j TOS --set-tos Minimize-Delay
    iptables -t mangle -A ack -p tcp -m length --length 128: -j TOS --set-tos Maximize-Throughput
    iptables -t mangle -A ack -j RETURN
    iptables -t mangle -A POSTROUTING -p tcp -m tcp --tcp-flags SYN,RST,ACK ACK -j ack
    iptables -t mangle -N tosfix
    iptables -t mangle -A tosfix -p tcp -m length --length 0:512 -j RETURN
    iptables -t mangle -A tosfix -j TOS --set-tos Maximize-Throughput
    iptables -t mangle -A tosfix -j RETURN
    # Persist the rules and enable both services for subsequent boots
    service iptables save
    systemctl daemon-reload
    systemctl enable iptables.service
    systemctl enable traffic_shaping.service
And now, when you add this to the instance’s user-data field and boot it, you’ll get a fully functioning NAT Gateway with traffic shaping in less than 20 seconds.
Now, this just scratches the surface. Cloud-init has a rich module lineup that can handle a lot of operations that would be annoying to script with just bash.
Contraindications
If any of the following apply, you should probably pre-bake your system image:
- Your application setup takes longer than your startup budget. If you need new instances online in about a minute and it takes 10 minutes to install the app, well, this just won’t work.
- The setup downloads a lot of data to do the install, and the instance gets deployed a lot. It may just be cheaper to bake it all into an image that you won’t get charged for downloading.
- The setup process is complicated or finicky. You might be better off building offline and getting alerts when things break, rather than getting a surprise in prod. You also have other problems, but we’ll ignore those for now.
Debugging
So this is all well and good, but like anything you build, you have probably built yourself a bunch of problems. So, let’s figure out how to debug things.
First, the logs are generally wherever the rest of your logs are kept: /var/log on most distros today, and C:\Logs on Windows (Note: No major cloud provider uses cloud-init on Windows). Output from your scripts is in the cloud-init-output.log file, and general output from cloud-init is in cloud-init.log.
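When cloud-init.log gets long, a quick filter for the lines that look like trouble is a reasonable first pass. This is only a sketch: the function name scan_cloud_init_log is hypothetical, and the pattern assumes the log marks problems with the usual level names or a Python traceback, so adjust it to whatever your log actually contains.

```shell
#!/bin/bash
# Hypothetical helper: print (with line numbers) the lines of a cloud-init log
# that mention a warning, an error, or a Python traceback.
# Returns non-zero when nothing matches, like grep itself.
scan_cloud_init_log() {
  log_file="${1:-/var/log/cloud-init.log}"
  grep -nE 'WARNING|ERROR|CRITICAL|Traceback' "$log_file"
}

# Usage: scan_cloud_init_log            # scans /var/log/cloud-init.log
#        scan_cloud_init_log /path/to/copied.log
```

The output of your own scripts in cloud-init-output.log is free-form, so this filter is less useful there; reading the tail of that file is usually quicker.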
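Once you have some idea of what broke from looking over those files, you can go to the scripts folder in /var/lib/cloud/instance/scripts to see what was actually run. You can run that script directly as root, or re-run that part of the cloud-init process with the “single” command. Note that the runcmd module only writes your commands out as a script; the scripts-user module is the one that executes it. Here’s an example that regenerates the script and then re-runs it:
cloud-init single --name cc_runcmd --frequency always
cloud-init single --name cc_scripts_user --frequency always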
To find the module names, look at the documentation at https://cloudinit.readthedocs.io/
Tips
Let’s take a look at a few things you can do to make developing the startup script easier.
Fail Fast
While you are writing your startup scripts, they will fail often (or maybe not; maybe you’re actually good at this?). Add the following to your script so that it stops at the first failure instead of carrying on and causing all kinds of problems: set -eo pipefail
This tells bash to exit as soon as a command fails, including a command in the middle of a pipeline. You can add an ‘x’ to the mix (set -exo pipefail) and it will print each command before executing it, for even more visibility into what’s going on.
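To see the difference for yourself, here is a small demonstration that runs the same failing pipeline with and without the options. The helper name run_snippet is just for this example; it runs each snippet in a child bash so a failure doesn't kill the shell you are typing in.

```shell
#!/bin/bash
# Hypothetical helper: run a snippet in a child bash process.
run_snippet() {
  bash -c "$1"
}

# Without the options, the failure of 'false' is hidden by the pipeline
# and the script carries on.
run_snippet 'false | cat; echo "still running"'

# With them, the failed pipeline aborts the script before the echo.
run_snippet 'set -eo pipefail; false | cat; echo "never printed"' \
  || echo "aborted with status $?"
```

The first call prints "still running"; the second prints only "aborted with status 1".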
Learn to pronounce Idempotent
Next, try to be as idempotent as possible. Basically, your code should be able to run multiple times and produce the same result. For instance, if I run this code several times, I’m going to get multiple entries in the hosts file:
#!/bin/bash
set -eo pipefail
PRIMARY=$(aws ssm get-parameters --with-decryption --names "cluster-primary-ip" \
--query 'Parameters[*].Value' --output text)
echo "${PRIMARY} node0" >> /etc/hosts
Not great. Instead, I should either skip the step if the entry is already there:
#!/bin/bash
set -eo pipefail
PRIMARY=$(aws ssm get-parameters --with-decryption --names "cluster-primary-ip" \
--query 'Parameters[*].Value' --output text)
grep -q 'node0' /etc/hosts || echo "${PRIMARY} node0" >> /etc/hosts
Or, I should rebuild the file without the line I’m adding, and then add it back:
#!/bin/bash
set -eo pipefail
PRIMARY=$(aws ssm get-parameters --with-decryption --names "cluster-primary-ip" \
--query 'Parameters[*].Value' --output text)
cp /etc/hosts /tmp/hosts
grep -v 'node0$' /tmp/hosts > /etc/hosts
echo "${PRIMARY} node0" >> /etc/hosts
If you choose to skip steps when you see they have already been done, you will likely make subsequent debugging runs go much faster.
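If you find yourself doing this for more than one entry, the rebuild approach wraps up nicely in a function. This is a sketch rather than a drop-in: ensure_host_entry is a name I made up, the host name is used as part of a grep pattern (so keep it to plain names), and the third argument exists only so you can try it against a scratch file instead of the real /etc/hosts.

```shell
#!/bin/bash
# Hypothetical helper: make sure the file ends up with exactly one line
# mapping <name>, pointing at <ip>, no matter how many times it runs.
ensure_host_entry() {
  ip="$1"
  name="$2"
  file="${3:-/etc/hosts}"
  tmp=$(mktemp)
  # Copy everything except existing lines for this name. grep -v exits
  # non-zero when it outputs nothing, so don't let "set -e" treat that as fatal.
  grep -v "[[:space:]]${name}\$" "$file" > "$tmp" || true
  echo "${ip} ${name}" >> "$tmp"
  # Write back in place so the original file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage: ensure_host_entry "${PRIMARY}" node0
```

A side benefit over the skip-if-present version is that a changed IP address gets picked up on the next run.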
Work it like you stole it
Finally, before you get a fully working deployment, don’t just terminate and try again for your next revision. Edit and re-run your scripts as many times as it takes so you can fix multiple errors per spin-up cycle, rather than one. Most of the time, you can comment out the parts of the script that did run well, and execute the remainder to see how it goes.
You can test small sections at a time by using exit 0 in your bash scripts to stop running, and then remove or move it before the next rerun.
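If commenting sections in and out gets tedious, one alternative is to have each step leave a marker file when it succeeds, so a re-run skips straight to whatever failed last time. The sketch below is one way to do that; run_once, STAMP_DIR, and the default directory are all names I picked for illustration, not anything cloud-init provides.

```shell
#!/bin/bash
# Hypothetical location for the marker files; override it as needed.
STAMP_DIR="${STAMP_DIR:-/var/tmp/startup-stamps}"

# Hypothetical helper: run a named step only if it has not already succeeded.
# Usage: run_once <step-name> <command> [args...]
run_once() {
  step="$1"
  shift
  mkdir -p "$STAMP_DIR"
  if [ -e "$STAMP_DIR/$step" ]; then
    echo "skip: $step"
    return 0
  fi
  # Only leave a marker when the command succeeds, so failures are retried.
  "$@" && touch "$STAMP_DIR/$step"
}

# Usage: run_once install-app /usr/local/bin/install-app.sh
```

Delete a marker file (or the whole directory) when you want a step to run again.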