> Do you just kill and relaunch periodically? I can see that being stochasti...

mnutt · on Feb 16, 2013

I still don't quite understand how you manage version control with entire system images. I guess you could just write a changelog, but that requires a large amount of self-discipline to ensure that the changelog exactly matches system state. Let's say you discover some minor instability, and discover that it was introduced 8 months ago, but the changelog for that image was completely innocuous. Can you do anything other than throw out all 8 months' worth of images and try to work your way back to present via the changelogs?

Puppet-like systems have the same issue in that they don't attempt to specify the entire system, but at least with Puppet, if there is an issue due to system drift, you can start with a clean base installation and re-run it.

contingencies · on Feb 17, 2013

> I still don't quite understand how you manage version control with entire system images.

You just uniquely name the environment, for example with a version number, and/or use a snapshot-capable datastore.

> I guess you could just write a changelog, but that requires a large amount of self-discipline to ensure that the changelog exactly matches system state.

Definitely don't do this.

> Let's say you discover some minor instability, and discover that it was introduced 8 months ago, but the changelog for that image was completely innocuous. Can you do anything other than throw out all 8 months' worth of images and try to work your way back to present via the changelogs?

If you write a test that can trigger the issue and replay the test against past images ("regression test") then you will identify exactly where the issue was introduced. The key thing is: don't have manually configured what-not in production. Keep it versioned, keep it solid, keep it known.
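As a rough sketch of that replay idea (not a description of any particular tool): assume the images are kept under unique, ordered names, and that there is some way to boot one and run the regression test against it. The image names and the `fake_test` stand-in below are hypothetical, purely for illustration.

```python
from typing import Callable, Optional, Sequence


def first_bad_image(
    images: Sequence[str], passes: Callable[[str], bool]
) -> Optional[str]:
    """Replay the regression test against each image, oldest first.

    Returns the first image where the test fails, or None if it never fails.
    """
    for image in images:
        if not passes(image):
            return image
    return None


# Hypothetical stand-in for "boot this image and run the regression test".
# Here we pretend the instability first shipped in web-v0007.
def fake_test(image: str) -> bool:
    return image < "web-v0007"


# Hypothetical, uniquely named image history (oldest first).
images = [f"web-v{n:04d}" for n in range(1, 13)]
print(first_bad_image(images, fake_test))
```

A linear replay makes no assumption about the history. If the issue is known to persist in every image after it first appears, a binary search over the same list would need far fewer boots.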

> Puppet-like systems have the same issue...

Exactly. This is my point. They don't really deliver on the promise of decent automation, because they are inherently patchwork/partial-scope in approach.

jacques_chester · on Feb 16, 2013

Thanks for your reply.

I still feel you're ultimately defining a DAG ("I want 20 web servers, 3 database servers and 2 load balancers") and relying on some compare-and-repair (health checks and replacement).

But you've moved your unit of management from packages etc. to machines. I've previously argued that this is the key thing that will change web hosting economics. I was sorta wrong, but I can see now where you were coming from.
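To make "compare-and-repair" at the machine level concrete, here is a minimal sketch. It assumes the desired state is just a count per role and that health checks yield a count of healthy machines per role. The role names and the `plan_repairs` helper are hypothetical, not taken from any real tool.

```python
from typing import Dict

# Desired state from the example: 20 web servers, 3 database servers,
# 2 load balancers.
DESIRED = {"web": 20, "db": 3, "lb": 2}


def plan_repairs(desired: Dict[str, int], healthy: Dict[str, int]) -> Dict[str, int]:
    """Compare desired counts with healthy counts per role.

    Positive values mean "launch this many replacements";
    negative values mean "retire this many extras".
    Roles already at their desired count are omitted.
    """
    roles = sorted(set(desired) | set(healthy))
    delta = {role: desired.get(role, 0) - healthy.get(role, 0) for role in roles}
    return {role: d for role, d in delta.items() if d != 0}


# Hypothetical health-check result: two web servers died, one extra lb exists.
observed = {"web": 18, "db": 3, "lb": 3}
print(plan_repairs(DESIRED, observed))
```

The point of the sketch is that nothing gets patched in place: an unhealthy machine simply drops out of the healthy count, and the plan replaces it with a fresh instance of a known image.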

contingencies · on Feb 16, 2013

Snoop-doggy move over; Nerd-DAGgy defines the season!

In an interview with MTV on his new formal collection, taglined 'S-eXpression', Nerd-DAGgy was asked what the secret of the hot new look was. In characteristic brevity, he replied "It's declarative."

-- This Season's Assembly, Fashion World, February 16, 2013.