🛠 Netflix Chaos Monkey as Chaos Engineering
Greetings,
The holiday festivities are over, and it's time for the waste disposal facilities to get back to work. But for you, I'll keep the New Year's logo a little longer to keep things warm.
And we’ll start the year with Chaos Monkey, remembering the past:
An open source application (detailed documentation in addition to the post), developed by Netflix, which deliberately terminates random instances in a production environment to show how the system copes with failure. It places high demands on the maturity of the company's processes and on the competence of whoever administers the tool. For testing to work correctly, both the tool itself and the target applications must be managed through Spinnaker, the continuous delivery platform that Netflix itself uses. License: Apache License 2.0.
Main features
- Simulation of infrastructure service failure
- Identifying bottlenecks in architecture
- Checking that system nodes have sufficient redundancy
- Monitoring the failure of each server and detailing how it affects the system
- Enforcing controls and health checks, including A/C timemaps
- Simulating load and stress tests, including handling of HTTP 429 (Too Many Requests) responses, etc.
How the tool works
- Guided by its configuration, the tool first takes an inventory of instances
- Next, it selects the instances available for testing (defined through policy / configuration files) and builds a list of the instances it is allowed to interact with
- The tool does not run as a service. Instead, you set up a cron job that calls it once per working day to create a schedule of terminations
- When a schedule is created, the tool creates another cron job that carries out the terminations at random times during working hours
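The steps above can be sketched roughly like this. It is a simplified Python illustration, not Chaos Monkey's actual code (the tool itself is written in Go). The `Instance` type, the `chaos_enabled` flag, the 09:00-15:00 window and the `/path/to/terminate-wrapper.sh` command are all assumptions made up for the example.

```python
import random
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass(frozen=True)
class Instance:
    instance_id: str
    cluster: str
    chaos_enabled: bool  # hypothetical opt-in flag taken from configuration


def eligible(instances):
    """Keep only the instances the configuration allows us to touch."""
    return [i for i in instances if i.chaos_enabled]


def build_schedule(instances, day, start=time(9, 0), end=time(15, 0), rng=None):
    """Pick one instance per cluster and a random time inside working hours."""
    if day.weekday() >= 5:  # Saturday or Sunday: nothing is scheduled
        return []
    rng = rng or random.Random()
    by_cluster = {}
    for inst in eligible(instances):
        by_cluster.setdefault(inst.cluster, []).append(inst)
    window_start = datetime.combine(day, start)
    window = datetime.combine(day, end) - window_start
    window_minutes = int(window.total_seconds() // 60)
    schedule = []
    for cluster in sorted(by_cluster):
        victim = rng.choice(by_cluster[cluster])
        when = window_start + timedelta(minutes=rng.randrange(window_minutes))
        schedule.append((when, victim))
    return sorted(schedule, key=lambda entry: entry[0])


def cron_line(when, victim, command="/path/to/terminate-wrapper.sh"):
    """Render one entry in /etc/cron.d style; the command is a placeholder."""
    return (f"{when.minute} {when.hour} {when.day} {when.month} * root "
            f"{command} {victim.instance_id}")
```

Calling `build_schedule(instances, date.today())` once per working day and writing each `cron_line(...)` result to a cron file mirrors the two-stage setup: one cron job builds the schedule, a second set of cron entries performs the terminations.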
How to try it?
$ go get github.com/netflix/chaosmonkey/cmd/chaosmonkey
# Install Halyard
$ curl -O https://raw.githubusercontent.com/spinnaker/halyard/master/install/debian/InstallHalyard.sh
$ sudo bash InstallHalyard.sh
$ hal config features edit --chaos true
$ hal config version edit --version $VERSION
# Check the connection to Spinnaker
$ chaosmonkey config spinnaker
# On an already installed Spinnaker, enable the feature in Deck's settings
$ sudo nano /opt/deck/html/settings.js
# In that file, set the chaosEnabled variable to true
var chaosEnabled = true;
# Manually terminate an instance matching the given parameters
$ chaosmonkey terminate <app> <account> [--stack=] [--cluster=] [--leashed]
Summary:
- The tool only supports Spinnaker-compatible platforms: AWS, Google Compute Engine, Azure, Kubernetes, Cloud Foundry
- The tool is quite old. It was released back in 2011 and can shut down only one instance at a time. Given the current scale of infrastructure, chaos testing needs to be conducted on a much larger scale. Its logical successor, Chaos Kong, can run fault tolerance tests at that scale, for example by cutting off an entire AWS region. We'll look at Chaos Kong itself later
#appsec #toolchain #reco #techsolution #paper #specialty
