We’ve recently made some big strides at Dyn in implementing a more modern configuration management platform (CFEngine 3) to replace an internally developed system that wasn’t meeting our needs anymore.
Reading about the available options inevitably leads you to comparisons of the many tools that fill this need. Along the way, I also ran into several myths about implementing configuration management.
In our case, seeing past some of these myths was the difference between a successful deployment and an abandoned effort.
Myth 1: Servers are disposable. Rebuild them if they break.
Reading best practices for configuration management, I got the impression that this idea is used as a crutch for when your configuration changes break existing systems. We’d prefer not to fix broken machines by rebuilding them from blank installs, and I don’t think we’re alone in that thinking.
I suspect most system administrators would rather have the existing machine continue to work than deploy a new one. With particular system architectures and the modern capabilities of virtual machines, servers may be more disposable than ever, but that doesn’t mean they have to be.
Some of our machines are a little too reliant on the hardware performance they get from dedicated storage, CPU time or dedicated network interface cards. Redeploying physical machines from scratch across our distributed infrastructure isn’t a trivial problem to solve. It’s also a problem that rarely comes up on its own.
The promise/action arrangement of CFEngine configurations works well for us in this regard. We can be confident that the promises we make with CFEngine will come out true, and that the operations that correct them shouldn’t fail on any machine, because each promise is atomic in the first place.
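The promise/action idea can be sketched in a few lines of Python. This is not CFEngine policy syntax, only a minimal illustration under stated assumptions: the `Promise` class, the `converge` and `set_key` functions, the example keys and values, and the dictionary standing in for a machine’s state are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a machine's state; not a CFEngine data structure.
State = Dict[str, str]


@dataclass
class Promise:
    """One small, self-contained statement about the desired state."""
    name: str
    is_kept: Callable[[State], bool]  # check: does the state already match?
    repair: Callable[[State], None]   # action: make the state match


def converge(state: State, promises: List[Promise]) -> List[str]:
    """Repair only the promises that are not kept; return their names."""
    repaired = []
    for promise in promises:
        if not promise.is_kept(state):
            promise.repair(state)
            repaired.append(promise.name)
    return repaired


def set_key(key: str, value: str) -> Promise:
    """Build a promise that a single key holds a single value."""
    return Promise(
        name=f"{key}={value}",
        is_kept=lambda s: s.get(key) == value,
        repair=lambda s: s.update({key: value}),
    )


promises = [
    set_key("motd", "Managed by configuration management"),
    set_key("ntp_server", "ntp.example.com"),
]

# Two machines in different starting states end up in the same place.
fresh: State = {}
drifted: State = {"motd": "old banner", "ntp_server": "ntp.example.com"}

first_fresh = converge(fresh, promises)      # repairs both promises
first_drifted = converge(drifted, promises)  # repairs only the banner
second_run = converge(fresh, promises)       # nothing left to repair
```

Because each promise checks before it acts, a second run repairs nothing. That property is what makes it reasonable to apply the same promises to machines that start out in different states.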
Myth 2: Configuration Management is an all or nothing endeavor.
The way we’ve arranged our promises in CFEngine has allowed us to overlay centrally controlled configurations on top of existing systems in production usage. We were able to build the promises/actions for managing users, then the promises/actions for managing groups, etc. We proceeded through each of these “features” that we needed one at a time.
This approach just means that as one feature in our old configuration management tool is re-engineered, we can remove it from the old tool and let CFEngine handle it from here on out.
A feature can be simple, like distribution of internally-recognized SSL certificate files or adding a note to the MOTD seen upon login. A feature can also be complex, like the deployment of one of our application installations, requiring installation of system packages, files deployed from our codebase, users and groups created, etc.
The important part is that as you build the features you want to manage into the new configuration management solution, you can gradually move from your existing configuration management tool to your new tool. Building your rules this way also ensures that you know they’ll work when you deploy them on machines in different stages of convergence.
Myth 3: Constant updating is necessary to keep all your systems in sync.
Out of the box, most modern configuration management systems will aim to constantly update your system to reach a state of eventual convergence with your configuration. CFEngine isn’t any different in this regard, but that’s a troubling scenario when you’re overlaying configuration management onto existing boxes.
When you ask about dry-run or reporting-only modes, you’ll likely find some friction with best practices for modern configuration management. You’re supposed to just trust the configuration management tool to accomplish what you’ve set it to do. Inevitably, though, you have to ask what happens when you need to make a change to a system that is having trouble communicating with the central configuration authority for one reason or another.
We’ve found that with CFEngine, we can assuage our discomfort with this concept using its dry-run modes. We can make local changes to a system if the need arises, such as during a DDoS mitigation when every second of response time counts and going through central configuration may not be feasible.
If we do, our configuration management system will start screaming for attention by alerting us to unkept promises, but it won’t automatically undo our work to mitigate an attack. Once the incident requiring those manual changes is over, we can worry about making sure systems are in sync with configuration management again.
Most importantly, we won’t forget to do that because the systems will make it clear that they need attention as soon as we can focus on them again.
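The reporting-only behavior described above can be sketched as follows. This is a hypothetical Python illustration, not CFEngine’s implementation or syntax: the `audit` function, its `dry_run` flag, and the keys and values in the example dictionaries are assumptions made for the example.

```python
from typing import Dict, List


def audit(desired: Dict[str, str], actual: Dict[str, str],
          dry_run: bool = True) -> List[str]:
    """Return the keys whose promises are not kept.

    With dry_run=True the actual state is only inspected, so manual
    changes survive. With dry_run=False each unkept promise is repaired.
    """
    unkept = sorted(k for k, v in desired.items() if actual.get(k) != v)
    if not dry_run:
        for key in unkept:
            actual[key] = desired[key]
    return unkept


desired = {"firewall_profile": "standard", "motd": "Welcome"}
# An operator switched the firewall profile by hand during an incident.
actual = {"firewall_profile": "ddos-mitigation", "motd": "Welcome"}

during_incident = audit(desired, actual)   # reports the drift, changes nothing
state_during_incident = dict(actual)       # snapshot: manual change is intact

after_incident = audit(desired, actual, dry_run=False)  # repair once it is safe
remaining = audit(desired, actual)                      # nothing left to report
```

The design choice is that detection and repair are separate steps: the report keeps nagging until someone decides the incident is over and lets the repair run.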
This may not apply to other infrastructures, but in our case, it’s important to recognize that configuration of our product is adjusting itself all the time. System configurations outside of our application don’t actually change that much. This kind of reporting-only approach to configuration management works very well for our operations.
Myth 4: Pick a tool and just start using it.
This is great advice for getting started and getting familiar with configuration management, but lousy advice for putting it to real use. The principle is that the strengths and weaknesses of each tool will balance out and everything will work out once you get comfortable with the tool you’ve chosen. The better message is that using any configuration management tool is better than using none. I’d be hard-pressed to disagree with that.
I think it’s better to pick the largest pain point in your existing configuration methodology and then try to solve it in each of the configuration management solutions you’re looking to take for a test drive.
Try it in each of the operating systems and versions you’re trying to manage. The aim here: be sure that things that work great for you on a Linux box don’t run into quirky side-effects on an OpenBSD box. Don’t focus on building a whole system; build a whole feature instead.
In our case, that first feature was user and group management, because Dyn’s staff has grown much faster than our ability to manage it in our old home-grown configuration management solution.
Make sure you can manage that feature in each of the systems you’re considering migrating to, or else you’re going to quickly see an abandoned project. Why would you spend so much effort building a new configuration management solution that will run into the same problems you’re already coping with now?
The best part of this approach: as you begin to deploy your new configuration management solution, it’s already solving pain points in the old system!
Myth 5: This isn’t scary. Just do it.
Actually, when you tell your new configuration management system to “Go” in your production environment for the first time, it is kind of scary. It doesn’t matter how long you’ve been testing, how many failure scenarios you’ve planned for or how good your rollback strategy is.
Ignoring that bit, just do it anyway. It is worth it.