f00bar.com

dev += ops

Thoughts on the State of the Distro and Config Management


With the proliferation of virtualization and cloud infrastructure, our traditional ideas and practices around “The System” are not keeping pace. We pay for every IOP, cycle, and byte our workloads consume. It used to be that a system sat idle in at least one of its dimensions a fair amount of the time. That idle capacity allowed us to build “fat” management tools to ensure things worked.

What follows is a rant that has been percolating in me for a couple of years now. I have tried to organize it so that it is comprehensible. Mind you, this post was started in November of 2012, and I have been slowly adding to it.

Where I rant about things

Waste

All the management tools seem to waste heaps of resources, especially the most popular (and my favorites), Chef and Puppet. Things like Ansible and CFEngine 3 have a much more desirable overhead for config management. Other monitoring tools in common use carry crazy overhead. Most Nagios plugins are not resource minded. Things like collectd are.

In the same vein, most of what makes up a “distro” is waste in the modern datacenter.

I/O

Most of the tools do some file caching and state caching, but they still tend to tear the shit outta file I/O when run. On some boxes with Chef (Puppet before that) that have heaps of files to stat, the runs go nuts with tiny I/O. On stuff like EC2 this is money. Why can’t we plug these things into the OS’s file event notifications instead? See my hacked chef-inotify stuff for a non-functional experiment.
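To make the stat cost concrete, here is a minimal Python sketch comparing a stat-everything run with an event-driven one. It is not how Chef or Puppet work internally. The `ManagedFiles` class and its `mark_dirty` hook are hypothetical names, and the hook stands in for whatever the OS notification layer (inotify on Linux, for instance) would call.

```python
import os
import tempfile


class ManagedFiles:
    """Tracks managed paths and counts how many stat() calls a run costs."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.dirty = set()      # paths the OS told us have changed
        self.stat_calls = 0

    def _check(self, path):
        self.stat_calls += 1
        return os.stat(path).st_size

    def poll_run(self):
        """Classic converge: stat every managed file, every run."""
        for path in self.paths:
            self._check(path)

    def mark_dirty(self, path):
        """Hypothetical hook: called by a file event watcher."""
        self.dirty.add(path)

    def event_run(self):
        """Event-driven converge: only stat what was reported changed."""
        for path in sorted(self.dirty):
            self._check(path)
        self.dirty.clear()


def demo(n_files=500):
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for i in range(n_files):
            path = os.path.join(root, f"conf-{i}.cfg")
            with open(path, "w") as fh:
                fh.write("key = value\n")
            paths.append(path)

        polled = ManagedFiles(paths)
        polled.poll_run()

        evented = ManagedFiles(paths)
        evented.mark_dirty(paths[3])   # pretend one file changed
        evented.event_run()

        return polled.stat_calls, evented.stat_calls


if __name__ == "__main__":
    print(demo())
```

The polling run costs one stat per managed file no matter what changed, while the event-driven run costs one stat per reported change. The catch, which the sketch ignores, is that events missed while the agent is down still need an occasional full scan.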

Disk

Most distros put heaps of shit on disk that we don’t need in the modern data center. Most documentation on every VM you build is pretty useless. Man pages are great. I love them and use them almost every day, but I don’t need them on every one of a thousand VMs, eating up hundreds of MB (oh woop, a couple gigs). Yea, but that’s a couple gigs we pay for!

Generally it doesn’t stop with docs. The mentality today is that disk is cheap. In many ways it is, but disk has other costs, especially when it comes to IaaS images: bigger images take longer to provision, and time to provision is something most people care about.

CPU

All the monitoring and management things tend to have a heap of crazy overhead. Individually none of them is an issue, but collectively they add up to a nuisance.

I dunno how to solve this really. It’s a necessary design trade-off between being able to automate/monitor easily with whatever tools you may partially know, and thrashing your system :D

Memory

Again, in our IaaS infra these resources matter to both the consumer and the provider. As a provider, that is capacity I can’t sell. As a consumer, that’s money out of my pocket. Yet it is treated as tho it’s no thing.

  • Chef/Puppet trade off heaps of memory for ease of use
  • 90% of the time the daemon isn’t doing anything except chewing on memory

Granted, I run Chef on a cron schedule so it only eats what it needs for a run, and the fork mode solves some of the leaking issues it used to have. Fundamentally tho, these issues are a side effect of using Ruby or any other high-level lang that wants to treat memory like shit.

I got no constructive ideas here other than: write your shit in a systems language when it’s meant to be a systems service. (Not really constructive, more curmudgeon, but yea.)

Supervising the Supervisors

So yea, Chef, Puppet, etc. like to watch services for state. Most distros today implement service supervision in one way or another (upstart/systemd/launchd). Yet our frameworks aren’t plugging into these systems for service management in a direct way. Instead they execute shell scripts to interrogate state. Never mind that they are polling said state when some of those supervisors support subscriptions for notifications.

Init scripts are imperative. Supervisors and CM want to be declarative systems. I think some issues arise because of the conflict in these approaches.

  • Our service infra should not poll.
  • Our Config Management should subscribe to services.
  • Init scripts should die
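The subscribe-instead-of-poll idea can be sketched roughly like this. It is a toy in-process bus, not the systemd, upstart, or s6 API: `ServiceBus`, `subscribe`, `publish`, and the service names are all made up for illustration and stand in for whatever notification channel the supervisor actually exposes.

```python
from collections import defaultdict


class ServiceBus:
    """Toy stand-in for a supervisor's notification channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, service, callback):
        self._subscribers[service].append(callback)

    def publish(self, service, state):
        # The supervisor pushes state changes; nobody has to ask.
        for callback in self._subscribers[service]:
            callback(service, state)


class ConfigManager:
    """Holds desired state and reacts only when notified."""

    def __init__(self, bus, desired):
        self.desired = dict(desired)
        self.actions = []
        for service in self.desired:
            bus.subscribe(service, self.on_state_change)

    def on_state_change(self, service, state):
        if state != self.desired[service]:
            # A real CM would ask the supervisor to act here.
            self.actions.append((service, self.desired[service]))


def demo():
    bus = ServiceBus()
    cm = ConfigManager(bus, {"nginx": "running", "telnetd": "stopped"})
    bus.publish("nginx", "running")    # matches desired state, no action
    bus.publish("nginx", "stopped")    # drifted, CM reacts
    bus.publish("telnetd", "running")  # drifted, CM reacts
    return cm.actions


if __name__ == "__main__":
    print(demo())
```

Between events the config manager does no work at all, which is the point: no shell-outs to an init script just to learn that nothing changed.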

Maybe the two systems really could be merged in some way; they are both doing so much of the same thing.

Notifications > Poll

srsly

Package Managers are not Config management

This one was hot on the Chef list recently. A package manager is something we did before we had awesome ways to manage all our configuration state. Packages are also a way to deliver a binary with its configuration set up so that the binary should run. Unfortunately they have their own opinions on this, and frequently those clash with the goal of the config management framework.

  • Packages are not CM
  • CM can be a packager tho.

The network is the computer

Sun was right. Look at where we are today with infra and platform services. I want my systems to be aware of one another in a deep way, not by way of some overarching orchestration framework. I don’t need a distro for my laptop; I need one for my datacenter that isn’t a pile of shit.

I want my distro to only have what it needs to do the thing it’s doing, and nothing more. When those needs change it should be able to get to that state without issue, possibly by talking to its neighbors.

  • Would be nice to see promise theory baked in
  • Torrent as a basic service

Dep Hell

Everyone who has used any packager or any programming language knows this. Config management can help in some ways by building abstractions over the dependencies and hiding the pain to a point. Even the management tools fall victim to dep hell, though. A good example from Chef land: I need to pull an apt cookbook to manage a yum-based system. Now, I understand why, and the root cause within Chef and the metadata that leads to this. I also understand that this is not really an issue when it comes to how the recipes are evaluated, but it still makes me upset/uneasy.

Source-based distros get around this in a heavy-handed way, but in my experience they tend to have fewer issues than binary distros. Tho this is anecdotal, as is this entire rant I am calling a blog post.
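A rough sketch of why the apt cookbook shows up on a yum box: if dependencies are declared unconditionally, a resolver that simply walks them pulls everything reachable, whatever the platform. This is an illustration of the shape of the problem, not Chef's actual resolver. The cookbook graph in `METADATA` and the `needed_on` helper are hypothetical.

```python
def resolve(cookbook, metadata, seen=None):
    """Return every cookbook reachable through declared dependencies."""
    seen = set() if seen is None else seen
    for dep in metadata.get(cookbook, []):
        if dep not in seen:
            seen.add(dep)
            resolve(dep, metadata, seen)
    return seen


# Hypothetical cookbooks: "webapp" supports both platform families, so its
# metadata names both package-manager cookbooks unconditionally.
METADATA = {
    "webapp": ["apt", "yum", "nginx"],
    "nginx": ["apt", "yum"],
    "apt": [],
    "yum": [],
}


def needed_on(platform_family):
    """What a platform-aware resolver could skip (illustrative only)."""
    unused = {"rhel": {"apt"}, "debian": {"yum"}}[platform_family]
    return resolve("webapp", METADATA) - unused


if __name__ == "__main__":
    print(sorted(resolve("webapp", METADATA)))
    print(sorted(needed_on("rhel")))
```

The plain walk drags in both `apt` and `yum`. The second function shows the smaller set a resolver could get away with if the dependency data carried platform information.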

  • there has to be a better way

Too Much central Bullshit

Everyone likes to build central management of things. Central monitoring. Central aggregation of logs. This just doesn’t play very well in the scale game. It leads to management and scaling pains.

  • there has to be a better way

Everything is shoe-string

Scripts on Scripts on Scripts all glued together with python/ruby does not a system make. :\

  • there has to be a better way

My Ideas

Aside from the ideas in the rant, I have these other rough ideas rolling around in my noggin, and I feel it’s good to get them out. Again, mostly ranting and notes to self.

Hackers are Smart

Maybe they can teach us stuff

When you look at many of the botnets out there, they have some really admirable qualities. I mean, here are complex networks of machines working together with pretty minimal C&C, across very hostile and turbulent environments. Why aren’t we using more of this shit in datacenters, for good, not evil? Look at these qualities:

  • Distributed Without control nodes
  • C&C is propagated
  • Neighbor discovery, confidence, and possible “infection” of other machines
  • Fallback to non-distributed C&C in fail modes (using Twitter, IRC, etc. to send C&C to your infra, err, botnet)
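The “C&C is propagated” quality is what the distributed systems crowd calls gossip. Below is a minimal push-gossip simulation, assuming every node can reach every other node and picks one random peer per round. The function name, the fixed seed, and the node count are illustrative choices, not anything taken from a real botnet or tool.

```python
import random


def gossip(n_nodes, seed=0):
    """Push gossip: each informed node tells one random peer per round.

    Returns the number of rounds until every node has the command.
    """
    rng = random.Random(seed)
    informed = {0}               # any node can be the origin
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        newly = set()
        for node in informed:
            peer = rng.randrange(n_nodes)
            if peer != node:
                newly.add(peer)
        informed |= newly
    return rounds


if __name__ == "__main__":
    print(gossip(64))
```

There is no control node anywhere in that loop. The informed set can at most double each round, so 64 nodes need at least 6 rounds, and in practice random gossip usually lands not far above that lower bound.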

One Config FrameWork to Rule them All

  • Systems languages are not hip, but they exist for a reason.
  • LFS/DSL based distro
  • Merge CM and Init
  • Leverage the services’ notification system: s6 or systemd (both have service bus arch)
  • Notifications on state change
  • Registration of new services
  • Init -> Calls CM for service actions
  • Canonical file changes should notify the CM.
  • CM doesn’t even stat files unless FS event has been triggered

This makes our basic OS so much smarter than stupid init scripts.

Merge the Packager and The CM ?

Everyone is CI pimpin’, so why can’t we just build the whole fucking OS (especially if it’s tiny) and describe it in CM?

Chef already has the basic resources (ark) to be a packager, and produce binaries that could be disted out. Whole thing could sit on github.

In this scenario I imagine chef (or something like it) describing exactly what a system should be. I.e. packages and package deps. The entire enchilada, but this enchilada is pretty small to be fair. The entire distro is then assembled from a stage2 bootstrap up to “base”. This process is built publicly on Travis or something akin to this. No releases. Just the CI distro.
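Assembling a distro from a description of packages and their deps boils down to a topological sort. Here is a small sketch using Python's standard `graphlib` module (Python 3.9+). The package names in `PACKAGES` are invented for illustration and do not describe any real bootstrap.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical package set; each key lists what must be built before it.
PACKAGES = {
    "linux-headers": [],
    "libc": ["linux-headers"],
    "compiler": ["libc"],
    "shell": ["libc"],
    "init": ["libc", "shell"],
    "cm-agent": ["libc", "init"],
    "base": ["compiler", "shell", "init", "cm-agent"],
}


def build_order(packages):
    """Return a build order in which every dependency precedes its user."""
    return list(TopologicalSorter(packages).static_order())


if __name__ == "__main__":
    for step, name in enumerate(build_order(PACKAGES), start=1):
        print(step, name)
```

A CI job could walk that list and build each package in turn. `TopologicalSorter` raises `CycleError` on circular deps, which is about the earliest place you could hope to catch dep hell.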

Brew Rocks

Can’t we just do that for everything? Or kinda. I mean, just make it super simple and easy to write recipes. Something like a brew + chef, or some wacky way to define a package’s build flags and deps, but integrated into the config management that sits at the center of this new Un-Distro.

Why?

  • Isolation of Versions (kegs)
  • Multiple lib versions
  • Just Fuckin Works (most of the time)
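The version isolation in the list above comes from giving every version its own install prefix and pointing a symlink at the active one. The sketch below is loosely modeled on Homebrew's Cellar layout and runs in a temp directory. The `install` and `link` helpers and the `libfoo` package are hypothetical, and it assumes a filesystem that supports symlinks.

```python
import os
import tempfile


def install(cellar, name, version):
    """Give each version its own prefix: <cellar>/<name>/<version>."""
    keg = os.path.join(cellar, name, version)
    os.makedirs(os.path.join(keg, "lib"), exist_ok=True)
    with open(os.path.join(keg, "lib", "VERSION"), "w") as fh:
        fh.write(version)
    return keg


def link(opt, name, keg):
    """Point <opt>/<name> at one keg; relinking swaps the active version."""
    os.makedirs(opt, exist_ok=True)
    target = os.path.join(opt, name)
    if os.path.islink(target):
        os.remove(target)
    os.symlink(keg, target)
    return target


def demo():
    with tempfile.TemporaryDirectory() as root:
        cellar, opt = os.path.join(root, "Cellar"), os.path.join(root, "opt")
        old = install(cellar, "libfoo", "1.0")
        new = install(cellar, "libfoo", "2.0")
        active = link(opt, "libfoo", new)
        with open(os.path.join(active, "lib", "VERSION")) as fh:
            first = fh.read()
        link(opt, "libfoo", old)       # switch back without reinstalling
        with open(os.path.join(active, "lib", "VERSION")) as fh:
            second = fh.read()
        return sorted(os.listdir(os.path.join(cellar, "libfoo"))), first, second


if __name__ == "__main__":
    print(demo())
```

Both versions stay on disk side by side, and switching the active one is a single symlink swap rather than a reinstall.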

Weaknesses

  • Rollback
  • No Partial upgrades (but srsly just redeploy)

Ok Ok binaries have a place.

Let’s manage those independently of configs, then

CI + Metarepo? + CM
