Hassy Veldstra

Building Artillery.io • Interested in OSS, SRE, product design, SaaS • always up for coffee ☕ • h@veldstra.org • @hveldstra


Meet Chaos Llama

I’m excited to release v1 of Chaos Llama, my latest open-source project under the banner of Shoreditch Ops.


                 V
                /'>>>
               /*/  _____ _____ _____ _____ _____
              / /  |     |  |  |  _  |     |   __|
             /*/   |   --|     |     |  |  |__   |
            / /    |_____|__|__|__|__|_____|_____|
    -------/*/      __    __    _____ _____ _____
 --/  *  * */      |  |  |  |  |  _  |     |  _  |
  /* * *  */       |  |__|  |__|     | | | |     |
  -  --- -/        |_____|_____|__|__|_|_|_|__|__|
   H    H
   H    H
   --   --

What is Chaos Llama?

Chaos Llama is a small utility for testing the resilience of AWS architectures to random failures.

Once deployed, Chaos Llama picks and terminates instances at random at a configurable interval. The idea is to constantly test your system’s ability to keep running despite the partial failure of some components, making the system more resilient to outages overall.

If this sounds familiar, that’s because Chaos Llama is inspired by Netflix’s notorious Chaos Monkey. The main difference between Chaos Monkey and Chaos Llama is simplicity. Whereas Chaos Monkey requires an EC2 instance to be created, configured and maintained to run, Chaos Llama takes advantage of AWS Lambda and can be installed and deployed in a matter of minutes. The flipside of that is that Chaos Llama has a smaller feature set and only runs on AWS.

How Chaos Llama Works

There are two parts to Chaos Llama: the CLI that lets you deploy and configure Llama, and the AWS Lambda function which picks and terminates an instance when it’s run.

  1. The CLI: the llama-cli package is a Node.js CLI application (built with the excellent yargs library) that uses the AWS SDK for Node.js to create and update the lambda function, and to set up an invocation schedule for it with CloudWatch Events.
  2. The lambda function: contains the logic for selecting and terminating EC2 instances.
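The selection step in the lambda can be sketched roughly as follows. This is an illustrative sketch, not Llama’s actual implementation: the `pickVictim` helper and the instance objects are made up for the example. In the real lambda, the chosen instance would then be terminated via the AWS SDK’s `ec2.terminateInstances` call.

```javascript
// Sketch of the lambda's selection logic (hypothetical names).
// Filter candidate instances by ASG rules, then pick one at random.
function pickVictim(instances, { enableForASGs = [], disableForASGs = [] } = {}) {
  let candidates = instances;
  if (enableForASGs.length > 0) {
    // If enableForASGs is set, only instances in those ASGs are fair game
    // (it takes precedence over disableForASGs).
    candidates = instances.filter((i) => enableForASGs.includes(i.asg));
  } else if (disableForASGs.length > 0) {
    // Otherwise, leave the listed ASGs alone.
    candidates = instances.filter((i) => !disableForASGs.includes(i.asg));
  }
  if (candidates.length === 0) return null;
  return candidates[Math.floor(Math.random() * candidates.length)];
}

// The real lambda would follow up with something like:
//   ec2.terminateInstances({ InstanceIds: [victim.id] }, callback);
module.exports = { pickVictim };
```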

Getting Started With Chaos Llama

Installation

Install the CLI with:

npm install -g llama-cli

(If you don’t have Node.js/npm installed, grab an installer for your platform from nodejs.org.)

AWS Config

To deploy Llama, you’ll need an IAM User (for the CLI to run as) and an IAM Role (for the lambda).

Set up an IAM user (if you don’t have one already):

  1. Log into the AWS Console
  2. Navigate to IAM -> Users -> Create New Users
    • Name the new user something like chaos_llama
    • Copy the Access Key ID and Secret Access Key into ~/.aws/credentials:
     [llama]
     aws_access_key_id=YOUR_KEY_ID_HERE
     aws_secret_access_key=YOUR_ACCESS_KEY_HERE
    

Then, create a Role for Llama’s lambda function:

  1. Navigate to Roles -> Create New Role
  2. Name the new role something like chaos_llama
  3. Select ‘EC2’ under ‘AWS Service Roles’
  4. Select AmazonEC2FullAccess in the list of policies
  5. Take note of the Role ARN

Deploy the llama

Once the IAM User is set up and you have the Role ARN, run:

AWS_PROFILE=llama llama deploy -r $LAMBDA_ROLE_ARN

This will deploy Chaos Llama to your AWS environment, but it won’t actually do anything by default.

Configure Chaos Llama

To configure termination rules, run deploy with a Llamafile:

AWS_PROFILE=llama llama deploy -c Llamafile.json

Llama Configuration

A Llamafile is a JSON file that configures your Chaos Llama:

{
  "interval": "60",
  "enableForASGs": [
  ],
  "disableForASGs": [
  ]
}

The options are:

  • interval: how often the lambda is invoked, and hence how often an instance gets terminated
  • enableForASGs: a list of Auto Scaling Groups in which Llama may terminate instances
  • disableForASGs: a list of Auto Scaling Groups that Llama will leave alone

If both enableForASGs and disableForASGs are specified, then only enableForASGs rules are applied.
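For example, a Llamafile that restricts Llama to a couple of Auto Scaling Groups could look like this (the ASG names here are made up for illustration):

```json
{
  "interval": "60",
  "enableForASGs": [
    "web-frontend-asg",
    "api-workers-asg"
  ],
  "disableForASGs": []
}
```

Because enableForASGs is non-empty, only instances in those two groups are candidates for termination.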

Further Plans

Would you like to contribute? The GitHub Issues page is a good place to start for ideas. Feel free to email me at h@veldstra.org if you have any questions.


P.S. Only 90s kids will understand

Nobody whips this llama’s ass.