
Would be interested to get your opinion on Puppet/Ansible/Chef/CFEngine/SaltStack


CFEngine is basic text manipulation; it's not comparable to the rest.

Puppet and Chef were the first generation. I wouldn't recommend them. All the companies and people I know who used Chef migrated away from it after many disasters. Nowadays, it's only mentioned in interviews to find out whether candidates have real-world firefighting experience.

Ansible is good. Used that for managing hundreds of machines at multiple jobs (some of which migrated from Chef). It's been bought by Red Hat, it's well maintained, and I think it has the brightest long-term future.

Not sure about SaltStack. Never had the opportunity to try it. I'd be a bit worried about its long-term prospects, though, because I don't think it has much backing or user base.


> Not sure about SaltStack. Never had the opportunity to try it. I'd be a bit worried about its long-term prospects, though, because I don't think it has much backing or user base.

SaltStack is a well-thought-out solution, in my opinion. It makes more logical sense and is less of a muddled mess than either Chef or Puppet, and it has miles better performance than Ansible.

I know quite a few shops that use it. It's definitely smaller than Ansible, though.


+1 for Salt -- I wish it had better docs or examples of how to build out a larger system; it's hard to get started with, IMHO, even if you know Ansible well. The existing docs read like man pages, without even the helpful examples.

At my last gig, I wrapped Salt deploys with a small Slack bot, so users could fire deploys from Slack; you could see what was going out and who was pushing. It was a very nice, simple, fast solution that should scale to hundreds of machines easily.


SaltStack is around. Lots of big orgs take the time to understand it. Ansible is more popular because you can start with just one playbook; SaltStack requires you to think about your environment and design your configuration management properly.


I use Salt across multiple thousands of machines. I feel like I've barely scratched the surface of what I can do with it. I wrote some custom utilities for it and added some functionality to handle physical deployments of an OS with Redfish (the new iLO/iDRAC API).

Salt is not without warts, but it's definitely worth checking out.


CFEngine, at least version 3, was probably the furthest away from string manipulation (and I was given the impression that manipulating text file content directly was considered a bad idea with it). What killed it was promise theory, which is actually a great theory and works quite well, but it made writing the bundles painfully hard and the results hard to maintain. Also, during the early days of v3 it was lacking a ton of essential functions, so even if you were trying to do things the right way you would bump into feature limitations. I think this put a lot of people off adopting it widely, and it's why Chef and Puppet did so well.

Puppet and Chef are actually quite good, and I still prefer them to Ansible for a number of reasons. I've certainly run them fine in environments of many thousands of servers, though I can understand that they can implode for some people at scale, depending on how they design their deployments or structure their manifests/cookbooks. That said, I've seen Ansible fold on much smaller infrastructure, but that too comes down to factors that can be avoided or mitigated. Idempotency with Puppet is really strong, which is something you want if not every system in your environment is ephemeral. With Chef it's almost as good, though not always on the first run. With Ansible you have to specifically consider and aim for it when writing your playbooks.
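
To make that concrete, here's a minimal sketch of the difference in Ansible (the script path and cron entry are invented for illustration): the first task blindly appends on every run, while the second declares the desired state and only changes anything when it has to.

    # Not idempotent: appends a duplicate line on every single run
    - name: add a cron entry (naive)
      ansible.builtin.shell: echo '0 * * * * /usr/local/bin/sync.sh' >> /etc/crontab

    # Idempotent: Ansible compares desired vs. actual state before acting
    - name: add a cron entry (declarative)
      ansible.builtin.cron:
        name: hourly sync
        minute: "0"
        job: /usr/local/bin/sync.sh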

The fact that you get used to having Chef or Puppet run, say, every half hour is a good thing, whereas Ansible runs are more ad hoc. This leads me to another thing that bothers me: people treat Puppet and Ansible as if they're conflicting choices for the same tasks. They have a lot in common, but Puppet is more for managing and ensuring changes in an idempotent, non-conflicting way, while Ansible is better suited to ad-hoc or orchestration tasks. I think it's good to use both, but be clear about what you use each for, since each can do a bit of what the other is good at, just not as well.

For example, I would consider using Ansible to do deployments and releases, rotate SSH keys, execute failovers, or even install the Puppet agent for the first time. I would use Puppet to deploy and update monitoring agents and their configuration, manage user access, ensure directory permissions, and configure system services like rsyslog, logrotate, Postfix, ntp, etc.
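
For instance, that SSH key rotation can be a few lines of Ansible with the ansible.posix collection; a rough sketch, with the "deploy" user and key file invented for the example:

    - hosts: all
      become: true
      tasks:
        - name: replace all of the deploy user's keys with the new one
          ansible.posix.authorized_key:
            user: deploy
            key: "{{ lookup('file', 'keys/deploy_new.pub') }}"
            exclusive: true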


> This leads me to another thing that bothers me: people treat Puppet and Ansible as if they're conflicting choices for the same tasks.

That's mainly because Ansible folks advertise it as a configuration management tool, while in fact it's a deployment tool. The former needs asynchronous operation, especially because a node that is supposed to be reconfigured can be temporarily down. The latter needs to be executed synchronously, with reports read by an operator as they come in.

There are several other operation modes that are useful for a sysadmin, like running a predefined procedure with parameters supplied from the client, or running a one-off command everywhere (even on servers that are currently down, as soon as they come back up), but we don't have many tools covering those cases.


I make my living as a CFEngine consultant. CFEngine runs every 5 minutes (it's lightweight enough to do that). The evolution was: CFEngine 1 ran once a day; CFEngine 2 ran once an hour; CFEngine 3 runs every 5 minutes. Self-healing infrastructure.


The concept of self-healing is a bit weird to me. Surely you want to investigate the cause before it heals?

Funny that we have tools like Tripwire, which have the opposite idea of the world.

My dream would be to have both functionalities in a single tool.

Bidirectionality! If you solve a problem on one machine, you could pull that fix and then push the same fix out to other machines as a preventative measure.

Some mix of git/osquery/augeas could do this.


> Ansible is good. Used that for managing hundreds of machines at multiple jobs. It's been bought by Red Hat, it's well maintained, and I think it has the brightest long-term future.

A lot of folks I know have been bitten by Ansible's performance (Ansible has a central control machine that pushes playbooks to each node over SSH, rather than having nodes "pull" from a central master).


Ansible has a pull mode that can be turned on. There are some trade-offs with it compared to the normal operating model, but it's there when you get large enough to need it.

https://docs.ansible.com/ansible/2.4/ansible-pull.html
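
A common way to set it up is to have Ansible itself schedule the pull on each node. A sketch, assuming a hypothetical Git repo with a local.yml playbook at its root:

    - hosts: all
      become: true
      tasks:
        - name: run ansible-pull every 15 minutes, applying only when the repo changed
          ansible.builtin.cron:
            name: ansible-pull
            minute: "*/15"
            job: ansible-pull -o -U https://git.example.com/infra.git local.yml >> /var/log/ansible-pull.log 2>&1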


Ansible has a very, very low barrier to entry. You go from 0 to 100 in a very short time. It makes a lot of sense to use it when you just begin building your infrastructure.

Later on you can run Ansible Tower, deploy Ansible agents everywhere, and basically use Ansible under the same client/server model as all the other tools.

Salt is eerily similar to Ansible; it's just geared towards client/server. Being experienced with Ansible, it was weird at first to use Salt, because everything looked familiar yet slightly different.


You can also get Tower's functionality from the free version, called AWX, although it's not as polished. https://github.com/ansible/awx


I believe you are talking about Ansible Tower, the paid tool from Red Hat that gives you a centralized server.

Ansible is not centralized. It configures servers over SSH and can operate from any user or host that has SSH access.


Yeah, any user or host, singular. Execution of a playbook is driven by one machine, which can be a bottleneck.


Yeah, but when you run a playbook, it's running from a single machine that is calling out via SSH.


Not necessarily from a single machine. It's pretty easy to divide your network and control the divisions from git clones of your Ansible files.

Ultimately you could have a git clone for every machine and only ever run it against localhost.
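
In that setup, the playbook in each clone just targets the machine it's sitting on. A minimal sketch (the "common" role is a placeholder):

    # site.yml in the machine-local clone; applied without any SSH hop
    - hosts: localhost
      connection: local
      become: true
      roles:
        - common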


Yes. The host will run at 100% CPU to handle the hundreds of SSH connections.

I've been reconfiguring 300 to 800 hosts many times a day and never had a problem. I think it would take a few thousand hosts for the performance to become noticeably slow, and I'm really not sure that other tools or systems could take it much better.


I know our SREs once broke the sshd config, and they considered themselves very lucky that they had Puppet on the machines and could push a fixed configuration (if they had used Ansible exclusively, that would have been the end of it: no way to connect or to deploy a new configuration).

[edit] To clarify: Ansible is great, and we use it. I'm just saying that, like everything, it still has (sometimes subtle) downsides in various scenarios. If it works well for you, great, but maybe others really were bitten by it.


There's nothing stopping you from having an sshd instance dedicated just to Ansible, on a different port or a different network, on every node. Whether that's simpler or more complex, I don't know.

But "have two ways in" is a basic principle of sys admin (typically via traditional network and some out of band console access).


When I worked with physical machines, they had embedded management systems, which were on a physically separate network from the machines' main interfaces, ran a little embedded SSH server, and would (among other things) give you a console on the machine.

Simpler machines should still have serial consoles, and you can get those onto the network via a terminal concentrator or a serial-to-Ethernet adaptor.

I would love it if Ansible could control machines over an interface like that, rather than via SSH. Then you wouldn't even need to run SSH on machines which don't need it, which is most of them.


Well, teach your sysadmins to use the system's configuration tester when they edit a system configuration file.

Nothing to do with Ansible really, except that Ansible makes it easy to prevent this.
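
For example, Ansible's template module can refuse to install a config that fails the tester; sshd's own check mode does the validation here. A sketch; the "restart sshd" handler is assumed to be defined elsewhere:

    - name: install sshd config, but only if sshd can parse it
      ansible.builtin.template:
        src: sshd_config.j2
        dest: /etc/ssh/sshd_config
        validate: /usr/sbin/sshd -t -f %s
      notify: restart sshd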


> Well, teach your sysadmins to use the system's configuration tester when they edit a system configuration file.

Wrong. Teach your sysadmins not to overload a single service with different functions (debugging channel, user-facing shell service, remote command execution, file upload, and config distribution channel), especially not one that should not be used in batch mode without human supervision.

When you write an application, you don't put an HTTP server in your database connection handling code, but when it comes to server management, suddenly the very same approach is deemed brilliant, because you don't run an agent (which is false, because you do; it's just not a dedicated agent).


Are you advocating running multiple sshd instances in this case?


Good heavens, no! You'd only end up with two different instances of the same service that is difficult to work with correctly.

For serving as a debugging channel and user-facing shell access, SSH is fine (though I've never seen it managed properly in the presence of nodes being installed and reinstalled all the time). But for everything else (unattended):

* you don't want command execution, port forwarding, or VPN in your file server

* you don't want a remote shell in a daemon that runs parametrized procedures -- but you do want it not to break on quoting of arguments and call results (try passing shell wildcards through SSH)

* you don't want port forwarding or a remote shell in your config distribution channel; in fact, you want the config distribution channel itself to be reconfigured as little as possible, so it should be a totally separate thing with no other purpose whatsoever

* you don't want to maintain a human-user-like account ($HOME, shell, etc.) for any of the above, since they will likely never see a proper account on the server side; you want each of these services to have a dedicated UID in /etc/passwd, its own configuration in /etc/$service, its own data directory, and that's it

Each of the tasks above has a daemon that is much better at it than SSH. The only redeeming quality of SSH is that it's already there, but that becomes irrelevant once the server's expected lifetime gets longer than a few days.


Yes, because everybody knows that testing eliminates all bugs.

(It's not that testing is useless, far from it; but I thought the HN crowd knew better than to respond to issues with "that's because you didn't do enough testing!")


I'd venture to say you're wrong about Salt. It's being used at some large enterprises. I use it (at one of the large tech companies) on thousands of servers, with plans to increase that by an order of magnitude or more. Of all the solutions mentioned, it has been the most powerful, while also being the most scalable.

Other than that, my experiences line up with yours almost exactly.


I love SaltStack; it's more of a Python framework for managing systems over ZeroMQ than it is pure configuration management. Compared to Ansible it's more complex, but faster, reactive, and significantly more flexible. I'd highly recommend it over Ansible for larger environments. For smaller ones, it depends on whether the steeper learning curve is worth it.
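
For a taste of the configuration-management side, a minimal Salt state; the file layout is the conventional one, and nothing here is specific to any real setup:

    # /srv/salt/nginx/init.sls -- applied with: salt '*' state.apply nginx
    nginx:
      pkg.installed: []
      service.running:
        - enable: True
        - require:
          - pkg: nginx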


Ansible starts getting painful around 1500 nodes.


+1 for Ansible from me.

Of all the tools, I heard of Puppet first, so I'm assuming it was first on the scene? From my limited experience, Puppet seems to be the most widely used tool for exactly that reason: not necessarily the best of the bunch, but first on the scene. Considering the effort required to roll one of these out, I assume whatever is deployed first tends to stay as the tool of choice.

I've tried out Puppet, SaltStack, and Ansible, in that order.

What I didn't like about Puppet is that once you deploy a change, the actual change can happen on the "client servers" at any point within the next 20 minutes. I may be off on the exact duration, but I remember that changes were applied at some arbitrary point within that window. That doesn't sound like a great idea to me: what if you want to switch over your web servers at a specific moment? And Puppet requires a dedicated command/control server.

Next I tried SaltStack. I liked it well enough. Now that I think about it, and hearing someone else mention it, yeah, SaltStack is similar to Ansible. What drove me away from SaltStack was that you essentially need a dedicated command/control server from which all SaltStack commands are sent out to the SaltStack "client servers". I did not want to dedicate resources (and money) to a server that is rarely used, and the personal web/lab setups I manage can shrink or grow from 2 servers to 10.

Next I tried Ansible. I think Ansible is the perfect choice for me. I only needed to 'devop' a handful of servers, and I also wanted to learn a tool that many businesses seemed to want on a resume. So I picked Ansible, and it's been great. Some operations are not as flexible as doing them with a shell script (and I assume the same issue exists for the other tools), but I've had good luck combining Ansible with little bits of shell script to get the result I need.
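
When I do shell out, I guard the task so reruns stay safe; a sketch with invented paths (the script itself has to drop the marker file):

    - name: one-off bootstrap step, skipped once the marker file exists
      ansible.builtin.shell: /opt/bootstrap.sh
      args:
        creates: /var/lib/bootstrap.done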

The best part of Ansible is that any Mac or Linux machine can be used as the "command server", provided you have the SSH key pair on it.

Lastly, some may not like the ad-hoc way of doing things on Ansible, but I prefer it that way.


Of all the tools, I heard of Puppet first, so I'm assuming it was first on the scene?

CFEngine was first. It's based on a kind of maths called "promise theory", and it solved the problem of having many different kinds of Unix owned by many different groups: you needed a consistent way of saying "all machines belonging to group X need to have user Y and package Z", and it abstracted away the slightly differing syntax between Solaris, SunOS, IRIX, OSF/1, Ultrix, yadda yadda. This is a problem that doesn't really exist anymore.

Chef, I think, came next. It was written by people who knew Ruby but didn't know the maths, so they used CFEngine terminology like "converging", but Chef doesn't really do that; it just runs Ruby scripts. If CFEngine was a scalpel, Chef is a mallet. Chef and Puppet are related somehow; the same group of devs had a falling out and went their own ways, something like that. They are much of a muchness.

Ansible is cool because it recognises the reality of why CFEngine isn't relevant nowadays: most organisations run just one particular Linux distro, so you can do away with the abstraction and get all the benefits without the complexity.


> it's based on a kind of maths called "promise theory"

Promise theory is not math, despite its name. It doesn't predict anything and it doesn't explain any phenomena. It's an architectural approach: brilliant, and it led to some really great software (CFEngine), but it's not "maths".


Basic concepts of promise theory (10 minute video by Mark Burgess who came up with it): https://www.youtube.com/watch?v=2TPsB5WuZgk

2014 introductory article: https://www.linuxjournal.com/content/promise-theory—what-it

Basic book on the subject: https://www.amazon.com/Thinking-Promises-Designing-Systems-C...

It's not "maths" like arithmetic but it's "maths" like graph theory:

Promise Theory, in the context of information science, is a model of voluntary cooperation between individual, autonomous actors or agents who publish their intentions to one another in the form of promises. It is a form of labelled graph theory, describing discrete networks of agents joined by the unilateral promises they make.

https://en.wikipedia.org/wiki/Promise_theory


> It's not "maths" like arithmetic but it's "maths" like graph theory

It's less like graph theory and more like inversion of control: an architecture, not a set of theorems and their proofs. Even Burgess's own book, which you mentioned, is nothing like a mathematical handbook.

I'm a great fan of Mark Burgess and his promise theory, but calling it a mathematical theory or a mathematical domain is simply incorrect.


I hear you.

The book I mentioned (Thinking in Promises) is the introductory-level public book.

https://www.amazon.com/Promise-Theory-Principles-Application... is the heavy-duty scientific stuff.

"In Search of Certainty" (https://www.amazon.com/Search-Certainty-Science-Information-...) is somewhere in-between.

I would say promise theory is its own kind of logic and notation. Thanks for the correction, @dozzie.


Actually the sequence was CFEngine, Puppet, Chef.

See http://verticalsysadmin.com/blog/relative-origins-of-cfengin...


> [...] the actual change can happen on the "client servers" at any point within the next 20 minutes. [...] What if you want to switch over your web servers at a specific moment?

You don't. Configuration management is the wrong operation mode for a synchronous change. Still, you could order all your Puppet agents to run their scheduled operation immediately instead of leaving them to wait for their next interval.
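
If you do need the change now, kicking the agents is easy enough, e.g. from Ansible (the "webservers" group is hypothetical; the rc handling matters because puppet agent --test exits 2 when it applied changes):

    - hosts: webservers
      become: true
      tasks:
        - name: trigger an immediate Puppet run instead of waiting for the interval
          ansible.builtin.command: puppet agent --test
          register: puppet_run
          changed_when: puppet_run.rc == 2
          failed_when: puppet_run.rc not in [0, 2]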


Ansible all the way. Chef and Puppet have too much overhead in comparison. Ansible is agentless: you can either use a centralized server for deployments or have every instance configure itself. Also, Ansible is YAML-based, which is both a strength and a weakness.

Chef is the runner-up. I love their community, and Chef is pretty straightforward once you learn the lingo.

Puppet doesn't really work for modern Git development workflows (Hiera and r10k are duct tape), and testing Puppet is kludgy. Also, most of the docs you'll find for it stopped getting updated around 2015.


I've used Chef, Ansible, and SaltStack in small startups and large-scale enterprise environments.

Ansible is just about the easiest and most flexible thing going, but once you hit "very large scale" you're going to get bitten by its performance and start worrying about when things actually get updated. Ansible Tower starts to look good then, but it's not the well-walked path, and it brings all sorts of other issues around how you distribute secrets to bootstrap things.

Chef is kind of nice when you don't have a lot of environments to manage, and it's about as flexible as you need it to be in those situations.

SaltStack shines when you really have a lot of heavy lifting to do; the Event System, Beacons, and Reactors will honestly blow your mind with the complex things you can achieve in a way that's simple to reason about and maintain.
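
As a flavor of the Reactor system: the master subscribes to event tags and fires states in response. A sketch with hypothetical file names; this one highstates any minion that (re)starts:

    # /etc/salt/master.d/reactor.conf
    reactor:
      - 'salt/minion/*/start':
        - /srv/reactor/highstate_on_start.sls

    # /srv/reactor/highstate_on_start.sls
    highstate_restarted_minion:
      local.state.apply:
        - tgt: {{ data['id'] }}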

That said, there are really 3-4 majorly different ways you can (or would want to) use Salt, and understanding it and its documentation is a large cognitive investment. You will likely run into major pain at some point down the road if you choose to use it. I would only use it again if I had a really good reason to, pretty much only if there were no other alternative. I would not bother using it just for typical sysadmin automation tasks.

Strange side note: the best-managed Salt environments I've worked in or looked at were all masterless, whether at small or massive scale. It's my probably-wrong opinion that traditional master/minion SaltStack will eventually cause you enormous problems when you need to either scale out or pivot on something.



