The part about log rotation strikes me as an anachronism, which, sadly, is still applicable due to naive default configurations.
Disk space is preposterously cheap, and even high-volume, Internet-scale text logs are small and low-bandwidth[1].
The problem isn't that disks fill up, but, rather, that logs are ever written somewhere that, if full, affects anything but the logging. The solution of a separate filesystem for logs has been around perhaps longer than the OP himself but only makes sense for a few, standalone servers.
Otherwise, the solution of remote syslog has been around for almost as long. If your logs are critical, you could even use two (for twice the price). Even the days of questionable reliability and small message size[2] limits are pretty long gone with syslog-ng.
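For concreteness, a minimal syslog-ng client configuration shipping everything to two collectors might look like this (the hostnames, port, and choice of TCP transport are illustrative):

```
source s_local { unix-stream("/dev/log"); internal(); };

# Two independent collectors, for the "twice the price" redundancy.
destination d_remote1 { tcp("loghost1.example.com" port(514)); };
destination d_remote2 { tcp("loghost2.example.com" port(514)); };

log { source(s_local); destination(d_remote1); destination(d_remote2); };
```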
This kind of thing, otherwise tedious minutiae, is second nature to sysadmins. Hire one.
[1] It's the disk bandwidth or throughput which is still expensive
[2] That's on the order of 500 characters, old text pager lengths, not the newfangled 140 character sms/twitter cheapness. Hey, you kids get off my lawn!
It's not, unfortunately... I ran into this problem the other day. The log on a long-running Postgres instance brought down an entire tape archive for half the day, and I had to shut down and reconfigure the instance. Of course, part of the problem was that it was on the root partition, but this doesn't change the fact that it was a 48 GB log file with most of its information dated.
Log rotation isn't just about disk space (although why you'd want to waste gigs of disk space storing old log messages locally is beyond me); it also helps to segment logs for easier and quicker searching and reporting, either manually or with scripts.
Not to mention it's pretty much automatic these days. It's not like it's something you've got to go out of your way to configure.
I disagree, since, otherwise, there's nothing rotary. If one has effectively infinite disk space, nothing ever gets "rotated out," for example.
Certainly, log segmentation provides benefits beyond being a prerequisite to rotation, but, as you point out, it's a common default configuration. An inbuilt feature of syslog-ng is log file naming based on date, obviating any post-processing.
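A sketch of that syslog-ng feature, with an illustrative path (the ${YEAR}/${MONTH}/${DAY} macros are expanded per message):

```
destination d_dated {
    # One file per day; create_dirs saves a cron job making directories.
    file("/var/log/dated/${YEAR}.${MONTH}.${DAY}/messages" create_dirs(yes));
};
```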
Does anyone have any thoughts on using config software like Puppet or Chef? I've been having to deploy a few more servers lately and this is exactly what I may need, curious to hear of experiences.
I've recently started using chef for config management. It's Ruby, I'm Ruby, so it was always going to win over Puppet there. Off the top of my head, some observations:
* Conceptually clean and very powerful
* Quick and easy to write new cookbooks from scratch, but the public repo of cookbooks is more useful for code examples than day-to-day use
* Documentation (the wiki) is patchy but improving
* The webui could do with some love (buggy, plus some useful tasks like executing a chef run are missing) and the CLI, knife, is a bit odd and bloated with stuff that belongs elsewhere
* The chef server stack (Merb, Solr, Couchdb) needs babysitting
* Bootstrapping is still a pain (ain't it always?)
* There are some inconsistencies and a bit of churn in the API
The Ruby DSL has some interesting quirks. Recipes are compiled in one phase and executed in a later phase. So just to make a simple log entry:
ruby_block "log file creation" do
  block do
    Chef::Log.info("File created!")
  end
end
Eww.
But overall it's an effective tool for an essential task.
I use chef, but I have no experience with puppet. Overall I'm happy with chef, although there are rough edges. At this point, I can't imagine not using something like chef or puppet.
I found chef to have a bit of a learning curve, especially when running your own chef server. I actually gave up using the server after a while and manage my servers (between 5 and 10, depending on what I'm working on, plus a bunch of one/two-off servers for client work) using chef-solo. Now that they offer a hosted server service, I might start using that, but I'm pretty content with chef-solo, roles and a minimal json config file on each server.
Chef has been a bit of a moving target during its development, so it's been hard to keep up with. This goes both for chef itself (although presumably the API will be more stable after it hits 1.0) and for the cookbooks. I kind of dread pulling in the changes from the main cookbook repo, because it's tough to know whether some change to a cookbook will negatively affect your config.
It doesn't necessarily save time up front; it takes more time to write a bug-free cookbook than do an apt-get install and edit some configs. But over the long run it makes it much easier to understand your server config. Rather than having long-forgotten edits to hidden config files, you have a solid record of exactly what your server config is.
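For a sense of what that "solid record" looks like, a minimal recipe might read as follows (the package, paths, and template name are illustrative, not from any particular cookbook):

```
package "nginx"

# Config comes from a template in the cookbook, not a hand-edited file.
template "/etc/nginx/nginx.conf" do
  source "nginx.conf.erb"
  mode "0644"
  notifies :restart, "service[nginx]"
end

service "nginx" do
  action [:enable, :start]
end
```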
Do it, and most importantly: don't "cheat." Make sure your machine is 100% described by your tool. The closer to 100% you are without being at 100% the more likely you are to forget that single tweak.
Chef vs Puppet is like Emacs vs Vi. I use Puppet because it was the only game in town when I started, but I know a bunch of the Chef guys (hi Adam!) and they know what's up. The tools have different approaches but are rapidly converging on the same feature set. Pick a simple role (say a webserver) and do the entire config in both tools and you'll know almost instantly which one matches your mindset better. I like to think Chef is a bit closer to the shell: things happen in order and you can rely on that, whereas Puppet is a bit more abstract and doesn't require you to micro-manage it as much. The golden truth is somewhere in the middle; Puppet, for example, just added some features to make ordering more consistent, since people have really struggled with that in the past.
Chef is pure Ruby, so if you know Ruby you'll have a nice head start. Puppet has the Puppet DSL, which, if you are a sysadmin, jibes well with config-style syntaxes. Puppet just released a pure Ruby DSL as well, but it doesn't have 100% feature parity with the Puppet Language.
Provisioning and Bootstrapping machines is still a black art. I use a ruby script that sets some environment variables and then launches a first run of Puppet; after that, Puppet takes over and manages the machine. There is no silver bullet (yet) for getting a machine into your Puppet/Chef cluster, though there are a lot of innovative approaches that look very promising. Depending on your environment (EC2 vs, say, Slicehost) I'm sure you can find a blog post that will get you going in the right direction. Here is mine: http://blog.onehub.com/posts/coordinating-the-onehub-cloud . It's already a bit out of date; we're migrating from iClassify to the officially blessed Puppet Dashboard (http://www.puppetlabs.com/puppet/related-projects/dashboard/), we just need to put in some more time to finish.
Testing sucks. You will need to come up with some repeatable environment you can run your configuration against. I use VMware and just continuously restore to a snapshot. There has been some good development here, especially with Vagrant (http://vagrantup.com/). This lets you write a shell script (or rake task, in my case) that boots an instance, runs the config, and then pauses for you to inspect. From there you can push another button and 'retry' the whole thing, or destroy the VM and start over. It is annoying to get this going, and you'll be tempted to just start on one of your machines, but that is a bad idea. This tool is going to run as root, and after the first day it is going to run with minimal supervision. You need to be sure it's doing what you expect.
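The Vagrant side of that loop is driven by a Vagrantfile; in the Vagrant releases of that era it looked roughly like this (the box name and recipe are illustrative):

```
Vagrant::Config.run do |config|
  config.vm.box = "base"                  # a previously added base box
  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = "cookbooks"
    chef.add_recipe "apache2"             # hypothetical cookbook under test
  end
end
```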
Provisioning and Bootstrapping machines is still a black art.
At the risk of appearing to build a strawman, I'm purposefully taking this quote slightly out of context, because it strikes me as a far more general belief, one used to justify these CM systems as a solution. Herein lies the danger of circular reasoning, or of a self-imposed problem.
Since I'm an open-minded sysadmin, I always keep an eye on the likes of puppet[1], but I have continued to reject them. Much of it has to do with philosophy: use what I can that's tried and true. Very nearly all the right pieces for provisioning and configuration management already exist[2].
Make sure your machine is 100% described by your tool.
This, sadly, smacks of perfectionism, which is known as the enemy of Good Enough, yet I agree that these tools demand it.
The vast majority of this kind of description is already done by the OS via the package system and init/upstart[3]. To duplicate this kind of description with a separate tool is, to me, incomprehensible.
What's more, for at least the past 5 years, brand name server hardware[4] has had, in the BMC, without any special add-on cards or option keys, enough IPMI-over-LAN support that one can, over the same physical ethernet as one of the regular network interfaces, set the next boot to be PXE and trigger a reset. From that point, a fully functioning server can be up in 5-10 minutes[5].
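As a sketch of that reset-to-PXE step (the hostname and credentials are placeholders, and the IPMI variable exists only so the commands can be dry-run; `chassis bootdev pxe` and `power reset` are standard ipmitool subcommands):

```shell
#!/bin/sh
# Force a one-time PXE boot via IPMI-over-LAN, then power-cycle the box.
# IPMI defaults to the real ipmitool; override it (e.g. IPMI=echo) to dry-run.
IPMI="${IPMI:-ipmitool}"

pxe_reboot() {
    host="$1"; user="$2"; pass="$3"
    # Ask the BMC to make the next boot device PXE (one-shot, not persistent)...
    "$IPMI" -I lanplus -H "$host" -U "$user" -P "$pass" chassis bootdev pxe
    # ...then reset the chassis so the machine actually PXE-boots.
    "$IPMI" -I lanplus -H "$host" -U "$user" -P "$pass" power reset
}
```

From there, DHCP/TFTP and the unattended installer take over.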
With those kinds of provisioning times, why would I want to bother with something that requires the extra step of "black art" bootstrapping[6]? At the most extreme, to make a configuration change on a running system, I'd just need to trigger the installation of a new package version on the relevant systems.
The best part of such a scheme is that I don't need to make any further customization choices, like puppet vs. chef. All the infrastructure I need (DHCP, DNS, TFTP, kickstart or debian-installer, local mirror/repo) is a Good Idea to have anyway, and it's all standard. I would expect any moderately experienced sysadmin to be able to debug all those pieces, without learning a DSL or a particular system's quirks. One also benefits from years of evolution of such tools, including "free" redundancy and pre-existing plugins for monitoring.
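As one data point, dnsmasq alone can cover the DHCP and TFTP pieces for PXE (the interface, address range, and paths here are illustrative):

```
# Hypothetical /etc/dnsmasq.d/pxe.conf
interface=eth1
dhcp-range=10.0.0.100,10.0.0.200,12h
dhcp-boot=pxelinux.0      # hand PXE clients the pxelinux bootloader
enable-tftp
tftp-root=/srv/tftpboot   # where pxelinux.0 and the installer images live
```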
The only thing that's left is some kind of higher-level templating, which can be added as a wrapper around all of the standard things. So far, the only tool I've found that doesn't want to take over everything all at once, and that works fine with incremental takeover/integration of the underlying tools, is Cobbler[7].
Not all problems can be solved with (custom) software.
[1] It was this month's BayLISA topic.
[2] Growing up with parents in the semiconductor industry, my exposures to Unix (and VMS, TOPS, and VM/CMS, none of which "stuck") and Lego were around the same time, so there's a deep-seated analogy there.
[3] which are, of course, configured by files which can be contained in packages, so, really, just the package system.
that is, any rackmount server which can be ordered with a cable management arm. That there is such a differentiating factor belies the notion of "commodity" hardware; I find it to be merely a euphemism for "lowest common denominator" hardware.
[5] I've observed this scale easily, with no slowdown, to 30 clients against one sub-$1k boot/repo server.
[6] "Because we use cloud providers" is a weak answer, since, besides being a self-imposed problem with other unique issues, it gets remarkably expensive beyond a few dozen (if that) instances.
[7] When I last dove into it a couple years ago, it was clearly focused on kickstart and the Redhat/Fedora world, with Debian/Ubuntu barely an afterthought.
With those kinds of provisioning times, why would I want to bother with something that requires the extra step of "black art" bootstrapping[6]? At the most extreme, to make a configuration change on a running system, I'd just need to trigger the installation of a new package version on the relevant systems.
In my experience, "making a configuration change on a running system" is not an extreme case. It happens all the time, and 5-10 minutes of downtime for reboot just to add a comment to an apache configuration file is insanity. Especially if there's a problem and you need to rollback.
Frankly, if you are booting machines in 5-10 minutes with all the packages they need, you've almost entirely solved the bootstrapping problem anyway. There are just a few security bits left.
edit: I did a quick check of our subversion repository, it looks like our commits per month are in the 80-100 range. Systems get reconfigured, at least in minor ways, a LOT. Installing a new version of a package for every single change would be a huge amount of overhead. Far more than the one-time overhead of bootstrapping cfengine. In fact we could do it that way, if we wanted, but we don't.
It happens all the time, and 5-10 minutes of downtime for reboot just to add a comment to an apache configuration file is insanity.
Agreed, since that comment doesn't warrant any deployment at all, but that's a strawman.
Especially if there's a problem and you need to rollback
This is, of course, a philosophical difference. How can one be sure that the rollback actually results in the previous state? For me, the certainty outweighs the speed increase.
Installing a new version of a package for every single change would be a huge amount of overhead
My reaction to this is that you must be doing something vastly different to install a package. Unless you're talking about an environment with just a handful of servers (in which case, why even bother with CM?), the overhead of building a package of text files would be less than that of checking them out from version control.
A recent issue with my server caused me to look into this. I ended up writing a shell script to deploy the server: install packages, add users, set up services (copy config files). The beautiful thing is that I can run the script repeatedly without side effects, so now instead of using apt-get I just add the new package to the list and run the installer.
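The "no side effects" property mostly comes down to guarding each step. For config files, a sketch of one such guard (the paths and the "updated" message are my own, not from the script described above):

```shell
#!/bin/sh
# Copy a config file into place only when it differs from what's deployed,
# so repeated runs are side-effect free (and a service reload could later
# be triggered only when something actually changed).
install_config() {
    src="$1"; dst="$2"
    if ! cmp -s "$src" "$dst"; then
        cp "$src" "$dst"
        echo "updated $dst"
    fi
}
```

Package installs get the same property for free, since `apt-get install -y` on an already-installed package is a no-op.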
Config management tools are awesome, if you've got enough servers to warrant the learning curve, the "chicken and egg" problem with getting the config management tool installed in the first place, AND if you can get your entire sysadmin team on board with consistently using the tool and only the tool for any changes.
If you've only got a few servers, or if you've got a very heterogeneous environment.. then they might not be worth it. If you've got a lot of servers, config management tools may save your life.
The tools do keep improving though, and Puppet and Chef are miles ahead of earlier tools and even every commercial tool I've yet seen. But this is still an area with a lot of opportunity for innovation.
I have experience with not using config software for what should have been a set of identically-configured production servers. It's a particularly frustrating kind of hell to which I would never intentionally return.
Backups are all well and good, but really they aren't going to save you unless you are also doing frequent test restores. Only when you've acted out your worst-case scenario (with a set of virtual machines and a recent set of backups) and managed to restore the system without problem do you really know that your backups are adequate.
... no, the article mentions checking the backups are being made, that the processes you put in place are actually working. This says nothing about how useful and effective your backups actually are in a worst-case scenario. For this, you need to be doing test restores, which aren't mentioned in the article.