CloudFormation for Poets

  • Sunday, Feb 20, 2022

CloudFormation is AWS’s Infrastructure as Code service. Many engineers prefer Terraform (and increasingly, Pulumi) for its more concise format and its flexibility. However, CloudFormation templates (which I will mercifully abbreviate to “CFN” from now on) have a few very compelling advantages:

  • One-click deployment - Yes, you can set up deployment of your infra stack from a single web button. The CloudFormation console gives you the link, or you can build your own.
  • Not your problem - AWS commits to handling state data, storing the templates, and dealing with multiple users doing deploys. If you are pushing out a lot of infra with a lot of moving parts, not having to deal with Terraform’s shared state and potential concurrency issues is a blessing in some use cases.
  • Trivial handoff - Handing off CFN to a customer new to the IaC space goes something like: “Go to the CloudFormation console, click ‘Create Stack’, and upload this file. Make sensible choices for the parameters”. The same for Terraform generally involves handing them a full CI/CD system, building out a deploy container/instance, or at least walking them through the process of installing everything.

However, developing these stacks can quickly turn to pain if you’re not ready for it, so let’s go over your workflow and optimize it. In this article I want to help you, the template designer, design and refine your template quickly so you don’t spend a lot of time staring at the console. Here’s what we’re going to cover:

  1. Build it, then code it
  2. Configure VSCode for CFN
  3. Deploy Minimally at first
  4. Use VSCode tasks to speed up deployment
  5. Embrace Failure
  6. Misc tips

Build it, then code it

Do you have a starting template that gets you some of the way to your goal? No? OK, then under no circumstances should you fire up your text editor. Get it working manually in the console before you screw around with writing it down.

This is not to say “don’t use a template”. Not at all. There are many templates published by AWS and by very good third parties that will get you started with the basics of what you need, and they generally carry a lot of learned info that you wouldn’t think of on the first pass. If what you’re doing is new to you, start with one of those and not a blank page.

There is nothing slower than trying to get the syntax exactly right for a startup command on an instance and continually re-deploying to get it right. I would propose that you instead spin up an instance, iteratively write the script on that instance, and then copy it into your template when it’s ready.

Set Up Visual Studio Code for Editing CloudFormation

For optimal experience you’ll need to do the following:

The settings you need to add to support AWS’s extended YAML format are:

{
    // Custom tags for the parser to use
    "yaml.customTags": [
        "!And sequence",
        "!If sequence",
        "!Not sequence",
        "!Equals sequence",
        "!Or sequence",
        "!FindInMap sequence",
        "!Base64 mapping",
        "!Cidr sequence",
        "!Ref",
        "!Sub sequence",
        "!Sub",
        "!GetAtt sequence",
        "!GetAtt",
        "!GetAZs",
        "!ImportValue",
        "!Select",
        "!Select sequence",
        "!Split sequence",
        "!Join sequence",
        "!And",
        "!If",
        "!Not",
        "!Equals",
        "!Or",
        "!FindInMap",
        "!Base64",
        "!Join",
        "!Cidr",
        "!ImportValue sequence",
        "!Split"
    ],
    // Enable/disable default YAML formatter (requires restart)
    "yaml.format.enable": true
}

Paste those into an appropriate place in your settings.json file, and save.

Deploy Minimally

CFN has a… curious design choice: if the very first deployment of a stack fails and rolls back, the stack ends up in a state where you cannot replace or update it. You must delete it before you can do anything else with it. This means that to avoid a “Deploy, fail, rollback, cry, delete” loop, you need to get a version running first. So deploy a minimal config. Either of the following will work:

  • Your base template
  • A single, nearly impossible to fail, resource

So what does a minimal CFN template look like? Here ya go:

AWSTemplateFormatVersion: 2010-09-09
Description: (WIP) Minimal Template

Resources:
    NullResource:
        Type: AWS::CloudFormation::WaitConditionHandle
        Properties: {}

Once this stack has been created successfully, every later deployment is an update to an existing stack, and a failed deploy won’t force you to delete it.
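To make the delete-or-update decision concrete, here is a small Python sketch. It assumes you already have the stack's status string (for example from `aws cloudformation describe-stacks`). The helper name `next_action` is hypothetical, and only `ROLLBACK_COMPLETE` is treated as delete-only here; other failure states may need their own handling.

```python
# Hypothetical helper: decide what to do with a stack based on its status.
# Assumption: ROLLBACK_COMPLETE is the state left behind by a failed first
# create, and a stack in that state can only be deleted.
DELETE_ONLY_STATUSES = {"ROLLBACK_COMPLETE"}


def next_action(stack_status):
    """Return 'create', 'wait', 'delete-then-create', or 'update'.

    Pass None when the stack does not exist yet.
    """
    if stack_status is None:
        return "create"
    if stack_status.endswith("_IN_PROGRESS"):
        # Another operation is still running; don't start a new one yet.
        return "wait"
    if stack_status in DELETE_ONLY_STATUSES:
        return "delete-then-create"
    return "update"


if __name__ == "__main__":
    for status in (None, "ROLLBACK_COMPLETE", "CREATE_COMPLETE", "UPDATE_IN_PROGRESS"):
        print(status, "->", next_action(status))
```

The point of the minimal template is to get the stack out of the "create" row and into the "update" row as early as possible.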

Use VSCode Tasks

We can add a tasks.json file to our VSCode workspace to allow us to easily deploy our templates right from VS Code:

{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "2.0.0",
    "tasks": [
        {
            "label": "validate-template",
            "type": "shell",
            "command": "aws cloudformation validate-template --template-body file://${file}",
            "group": {
                "kind": "test",
                "isDefault": true
            },
            "problemMatcher": [],
            "presentation": {
                "panel": "new"
            }
        },
        {
            "label": "deploy",
            "type": "shell",
            "command": "aws cloudformation deploy --template-file ${file} --stack-name ${fileBasenameNoExtension} --parameter-overrides file://./parameters-testing.json --disable-rollback --capabilities CAPABILITY_IAM",
            "group": {
                "kind": "build",
                "isDefault": true
            },
            "problemMatcher": [],
            "presentation": {
                "panel": "new"
            }
        },
        {
            "label": "delete-stack",
            "type": "shell",
            "command": "aws cloudformation delete-stack --stack-name ${fileBasenameNoExtension}",
            "problemMatcher": [],
            "presentation": {
                "panel": "new",
                "reveal": "never"
            }
        }
    ]
}

To fully utilize this, we’ll need to do a few things first:

  • Make sure you’re running in an AWS security context that allows you to deploy the stack. This could mean making sure your default user can deploy, using aws-vault or awsume to set your active session before starting VSCode, or setting a --profile option on the above tasks.
  • Name your template file after the stack you want. By default, the tasks deploy to a stack with the same name as the file you’re editing, minus its extension.
  • Create a parameters-testing.json file with all the deployment parameters you want to supply, in the folder the tasks run from (the workspace root, unless you change it).

Making the parameters file is a matter of converting the Parameters required by your template to a very wordy format:
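If it helps to see what the deploy task is doing with those variables, here is a rough Python sketch that assembles the same command. The function name `build_deploy_command` and the example paths are made up for illustration; the flags are the ones used in the task above. It also checks the derived stack name against CloudFormation's naming rule (letters, digits, and hyphens, starting with a letter, at most 128 characters), since a file name with an underscore would produce a stack name the CLI rejects.

```python
import re
from pathlib import Path

# Stack names: start with a letter, then letters, digits, or hyphens; max 128 chars.
STACK_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def build_deploy_command(template_path, params_file="parameters-testing.json", profile=None):
    """Assemble the 'aws cloudformation deploy' call used by the deploy task."""
    # Path.stem mirrors VS Code's ${fileBasenameNoExtension}
    stack_name = Path(template_path).stem
    if not STACK_NAME_RE.fullmatch(stack_name):
        raise ValueError(f"'{stack_name}' is not a valid stack name; rename the template file")
    cmd = [
        "aws", "cloudformation", "deploy",
        "--template-file", str(template_path),
        "--stack-name", stack_name,
        "--parameter-overrides", f"file://./{params_file}",
        "--disable-rollback",
        "--capabilities", "CAPABILITY_IAM",
    ]
    if profile:
        # Optional: pin the task to a named AWS CLI profile
        cmd += ["--profile", profile]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_deploy_command("templates/web-tier.yaml", profile="dev")))
```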

[
  {
    "ParameterKey": "param1",
    "ParameterValue": "value1"
  },
  {
    "ParameterKey": "param2",
    "ParameterValue": "value2"
  }
]

This is kind of a pain, so here’s a utility to help you:
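A minimal sketch of such a helper in Python, not a polished tool. It assumes you have the template's Parameters section available as a Python dict (for example from a JSON template loaded with `json.load`); the function names are hypothetical. Parameters without a Default get an empty string that you fill in by hand.

```python
import json


def defaults_from_template(template):
    """Collect Default values from an already-parsed template's Parameters section.

    Parameters without a Default get "" as a placeholder to fill in by hand.
    """
    params = template.get("Parameters", {})
    return {name: spec.get("Default", "") for name, spec in params.items()}


def to_cli_parameters(values):
    """Convert {"name": "value"} into the ParameterKey/ParameterValue list format."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


if __name__ == "__main__":
    # Toy template fragment; the parameter names are made up for illustration.
    template = {
        "Parameters": {
            "InstanceType": {"Type": "String", "Default": "t3.micro"},
            "KeyName": {"Type": "String"},
        }
    }
    values = defaults_from_template(template)
    values["KeyName"] = "my-test-key"
    print(json.dumps(to_cli_parameters(values), indent=2))
```

Redirect the output into parameters-testing.json and edit from there.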

Embrace Failure

Deployments provide feedback slowly, which means they also fail slowly, and that is frustrating for most of us. It’s OK. This happens and there will be delays… BUT that doesn’t mean we can’t take steps to work through it.

AWS’s relatively new disable rollback feature makes the deployment stop at the point of failure and allows you to try again without a full rollback/redeploy cycle. You can take this halted state and use it to manually fix issues in your startup files, fix template errors, or update parameters to something that is allowed (and edit your validation conditions for those parameters, right?)

Finally, make sure you make the most of every broken deployment. Fix all the issues you can, and port those fixes to your templates.

My biggest issue when developing templates is getting the startup scripts to work. Usually, when performing a deployment where you’re heavily depending on user-data, you should be setting a wait condition on your stack, so you can signal back when something has happened, and relay any info. These wait conditions should not have huge timeouts, as these will hang deployments for far too long in the event of a failure. Make sure your scripts fail fast and signal that failure up the stack quickly.
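As a rough illustration of "signal that failure up the stack", here is a Python sketch that builds the JSON body a wait condition handle accepts (Status, Reason, UniqueId, Data) and PUTs it to the handle's presigned URL. The function names are hypothetical, and the empty Content-Type header is my assumption about what the presigned URL expects; on most instances you would simply call cfn-signal instead.

```python
import json
import urllib.request
import uuid


def build_signal(success, reason, data="", unique_id=None):
    """Build the JSON body for a wait condition signal."""
    return {
        "Status": "SUCCESS" if success else "FAILURE",
        "Reason": reason,
        # Each signal sent to the same handle needs its own UniqueId
        "UniqueId": unique_id or str(uuid.uuid4()),
        "Data": data,
    }


def send_signal(wait_handle_url, payload, timeout=10):
    """PUT the signal to the wait handle URL and return the HTTP status code."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        wait_handle_url,
        data=body,
        method="PUT",
        headers={"Content-Type": ""},  # assumption: presigned URL expects an empty type
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Fail fast: report the failure right away instead of letting the timeout expire.
    payload = build_signal(False, "setup_command exited non-zero", unique_id="setup-1")
    print(json.dumps(payload, indent=2))
```

A FAILURE signal makes the wait condition fail immediately, which is exactly what you want when the alternative is waiting out the full timeout.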

Misc Tips

In no particular order:

  • Need some starter templates? Here are some places to get validated templates:

  • CFN-Lint is a formatting and syntax checker for CloudFormation files with a few neat features. It needs pydot installed to make pretty diagrams, but otherwise it’s a nice way to sanity test your templates as you go.

  • CloudFormation-Guard is a static security analyzer for templates that checks for common issues, and provides a language tailored to defining rules for checking CFN templates.

  • Got a bunch of stacks that need to work together? As an alternative to importing values exported by other stacks, store the shared values as SSM parameters and reference them with a {{resolve:ssm:/parameter/name}} dynamic reference. The AWS documentation has more info, but a quick example is:

    MyIAMUser:
      Type: AWS::IAM::User
      Properties:
        # uses the latest version of the IAMUserName SSM Parameter
        UserName: '{{resolve:ssm:IAMUserName}}'
        LoginProfile:
          # uses version 10 of the IAMUserPassword, and does not store it with the template parameters
          Password: '{{resolve:ssm-secure:IAMUserPassword:10}}'
    
  • Most instances these days use cloud-init to handle startup scripts, which adds a lot of flexibility for quickly starting up fully configured machines. You can also use CloudFormation’s cfn-init helper scripts to do many of the same things. The helper scripts can also signal CloudFormation when the work is done and whether it succeeded, and send messages back.

    Resources:
      SomeWaitHandle:
        Type: AWS::CloudFormation::WaitConditionHandle
    
      SomeWaitCondition:
        Type: AWS::CloudFormation::WaitCondition
        Properties:
          Handle: !Ref SomeWaitHandle
          Count: 1
          Timeout: 600
    
      SomeServer:
        Type: AWS::EC2::Instance
        Metadata:
          AWS::CloudFormation::Init:
            config:
              files:
                /etc/cron.daily/reboot:
                  content: |
                    #!/bin/bash -e
                    (sleep 120 && reboot > /dev/null 2>&1) &                
                  mode: "000755"
                  owner: root
                  group: root
              commands:
                InstallJq:
                  command: yum install jq -y
        Properties:
          SecurityGroupIds:
            - !Ref SomeSecurityGroup
          ImageId: !FindInMap
            - AWSRegionMap
            - !Ref 'AWS::Region'
            - AMI
          SubnetId: '{{resolve:ssm:/vpcInfo/privateSubnet1}}'
          InstanceType: !Ref 'SomeInstanceType'
          IamInstanceProfile: !Ref 'SomeInstanceProfile'
          KeyName: !Ref 'KeyName'
          BlockDeviceMappings:
            - DeviceName: /dev/xvda
              Ebs:
                VolumeType: gp3
                DeleteOnTermination: true
                VolumeSize: 35
                Encrypted: true
          # UserData is presented as a simple bash script. You might also consider a cloud-init script
          UserData:
            Fn::Base64: !Sub
              - |
                #!/bin/bash
                # protect us from ourselves
                set -euo pipefail
    
                # redirect I/O
                exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
    
                # Do any initial setup here
                setup_command
    
                # Pass control to cfn-init, which runs the operations listed under AWS::CloudFormation::Init.
                # --resource takes the logical ID of the resource that holds the metadata
                /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource SomeServer --region ${AWS::Region}
    
                # Signal we're done. '-e 0' means success, and '-d' sends data back with the signal.
                # WHUrl is the wait handle's presigned URL; quote it because it contains '&'
                /opt/aws/bin/cfn-signal -e 0 -d "All Set" '${WHUrl}'
              - WHUrl: !Ref SomeWaitHandle
          Tags:
            - Key: Name
              Value: !Ref SomeName
    
      SomeOtherServer:
        Type: AWS::EC2::Instance
        DependsOn: SomeWaitCondition
        Properties:
          SecurityGroupIds:
            - !Ref SomeSecurityGroup
          ImageId: !FindInMap
            - AWSRegionMap
            - !Ref 'AWS::Region'
            - AMI
          SubnetId: '{{resolve:ssm:/vpcInfo/privateSubnet1}}'
          InstanceType: !Ref 'SomeInstanceType'
          IamInstanceProfile: !Ref 'SomeInstanceProfile'
          KeyName: !Ref 'KeyName'
          BlockDeviceMappings:
            - DeviceName: /dev/xvda
              Ebs:
                VolumeType: gp3
                DeleteOnTermination: true
                VolumeSize: 35
                Encrypted: true
          # UserData is presented as a simple bash script. You might also consider a cloud-init script
          UserData:
            Fn::Base64: |
                #!/bin/bash
                # protect us from ourselves
                set -euo pipefail
    
                # redirect I/O
                exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
    
                # Do any initial setup here
                setup_command            
          Tags:
            - Key: Name
              Value: !Ref SomeOtherName
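Back to the SSM dynamic references from a couple of tips ago: if you generate templates or parameter files from a script, a tiny helper keeps the reference strings consistent. This is only a sketch, and `ssm_reference` is a made-up name. It follows the pattern used in the IAM user example, where the version is optional and ssm-secure is used for SecureString parameters.

```python
def ssm_reference(name, version=None, secure=False):
    """Build a {{resolve:ssm:...}} dynamic reference string.

    name    -- the SSM parameter name, e.g. "/vpcInfo/privateSubnet1"
    version -- optional parameter version; omit it to get the latest
    secure  -- use the ssm-secure service for SecureString parameters
    """
    service = "ssm-secure" if secure else "ssm"
    parts = ["resolve", service, name]
    if version is not None:
        parts.append(str(version))
    return "{{" + ":".join(parts) + "}}"


if __name__ == "__main__":
    print(ssm_reference("/vpcInfo/privateSubnet1"))
    print(ssm_reference("IAMUserPassword", version=10, secure=True))
```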
    

Photo by Markus Spiske on Unsplash