Docker Security Cheat Sheet (owasp.org)
325 points by soheilpro on March 13, 2021 | 49 comments


A big :+1: for running Docker containers with `--read-only`, forcing you to use explicit writable volume/bind mounts for all writable data. It's not just the security benefits; you also avoid entire classes of problems like the following (a minimal invocation sketch follows the list):

* minimizing the difference between `docker restart` (preserves overlayfs changes) and container re-creates (resets overlayfs back to the image state)

* surprise data loss on container redeployments because data was unexpectedly being written to the overlayfs instead of a volume

* unexpectedly running out of disk space in `/var/lib/docker` because data was being written outside of a volume

* performance issues caused by excessive overlayfs writes (storage drivers and /var/lib/docker not necessarily designed for IO performance)
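
A minimal sketch of the invocation (image, volume, and path names are made up): every writable location is named explicitly, and everything else stays immutable, so stray writes fail immediately instead of silently landing in the overlayfs.

  $ docker run --read-only \
      -v app-data:/var/lib/app \
      myorg/myapp:1.2.3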


I was just thinking the other day about how best to manage third party containers that write various logs, preventing them from filling up disk space over time and crashing. This is a perfect solution, assuming the third party container is good about telling you every volume it needs write access to, etc.


If you're running into containers that are writing logs to files, IMHO that's a code smell and something is off with how they're using containers. Most of the container ecosystem has settled on stdout and stderr as the de facto log locations, leaving it up to the container orchestrator (docker-compose, k8s, etc.) to deal with the log data. There might be some good reasons to use files for logs, but in 2021 they're likely exceptions and not the norm.
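
If disk usage from those stdout/stderr logs is the concern, the default json-file driver can at least be capped per container; roughly like this (the image name is a placeholder):

  $ docker run --log-driver json-file \
      --log-opt max-size=10m --log-opt max-file=3 \
      myorg/myapp:1.2.3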


I am a big fan of a read-only rootfs, but for some third-party containers it can be challenging to apply. In most cases it is not a big deal, but sometimes you end up with a bunch of tmpfs mounts for the paths where write access is required.
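
For what it's worth, when a third-party image insists on writing to a handful of paths, something along these lines is usually enough (paths and image are illustrative):

  $ docker run --read-only \
      --tmpfs /run --tmpfs /tmp:rw,size=64m \
      some-vendor/some-image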


I never understood why read-only isn't the default option for volume mounts.


All of these isolation techniques (and many more) can be used inside systemd units. Write your .service as usual, then run this command on it:

  $ systemd-analyze security service-name
It prints a long list of hardening flags that can be applied to your service, like so:

  NAME                                          DESCRIPTION                                          EXPOSURE
  PrivateNetwork=                               Service has access to the host's network                  0.5
  User=/DynamicUser=                            Service runs as root user                                 0.4
  CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)  Service may change UID/GID identities/capabilities        0.3
  CapabilityBoundingSet=~CAP_SYS_ADMIN          Service has administrator privileges                      0.3
  ...
Here's what I typically use for a .NET 5 application:

  WorkingDirectory        = /opt/appname/app
  ReadWritePaths          = /opt/appname/data
  UMask                   = 0077
  LockPersonality         = yes
  NoNewPrivileges         = yes
  PrivateDevices          = yes
  PrivateMounts           = yes
  PrivateTmp              = yes
  PrivateUsers            = yes
  ProtectClock            = yes
  ProtectControlGroups    = yes
  ProtectHome             = yes
  ProtectHostname         = yes
  ProtectKernelLogs       = yes
  ProtectKernelModules    = yes
  ProtectKernelTunables   = yes
  ProtectSystem           = strict
  RemoveIPC               = yes
  RestrictAddressFamilies = AF_UNIX AF_INET AF_INET6
  RestrictNamespaces      = yes
  RestrictRealtime        = yes
  RestrictSUIDSGID        = yes
  SystemCallArchitectures = native
  ProtectProc             = invisible
  CapabilityBoundingSet   =
  SystemCallFilter        = ~@clock @module @mount @raw-io @reboot @swap @privileged @cpu-emulation @obsolete
ReadWritePaths should be replaced with a combination of DynamicUser + writing local persistent data to $STATE_DIRECTORY, but I'm too lazy to do that yet.
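
For anyone who does want to go that route, a rough sketch of the DynamicUser variant (the state-directory name is made up): systemd creates /var/lib/appname, hands it to the transient user, and exposes the path to the service as $STATE_DIRECTORY.

  DynamicUser    = yes
  StateDirectory = appname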

See systemd.exec(5) for more.


This is excellent advice. Together with systemd-nspawn it's all one needs to run a container in a strong sandbox.


Maybe a bit off-topic for this article, but what does systemd-nspawn add compared to the aforementioned isolation options, and how can you combine the two?


systemd-nspawn does the same things as docker, but with a much smaller attack surface, and it can be sandboxed very effectively by the unit files.

The isolation provided by unit files is orthogonal to running containers.

nspawn doesn't even run a dedicated daemon. Plus, it's no secret that docker was not designed with security in mind and its isolation is bolted on. [1]

Furthermore, systemd is already installed and running on most systems (like it or not).

[1] https://www.cvedetails.com/vendor/13534/Docker.html
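
To give a flavour, a bare-bones invocation might look like this (the rootfs path is just an example; see systemd-nspawn(1)):

  $ systemd-nspawn -D /var/lib/machines/mycontainer --boot --private-network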


What's the systemd equivalent to docker-compose? The real value-add of Docker imo is not the security, it's the easy packaging, distribution, and running of otherwise complicated apps.


Writing N systemd service files with After=, etc., to match your docker-compose.yml that has N services in it.

I have ported a few docker-compose.yml files this way, because I don't understand docker well enough to troubleshoot issues with getting my firewall rules to apply to docker traffic. Dependency hell is not an issue for these projects, so I feel happier with systemd.
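
The dependency part of such a port is mostly boilerplate; a sketch with invented service names:

  # myapp.service
  [Unit]
  Requires=postgresql.service
  After=postgresql.service
  [Service]
  ExecStart=/usr/local/bin/myapp
  Restart=on-failure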


At this point isn't it better to just run it inside a container like lxc?


Well sure, but I'm not using systemd. If I wanted a container runtime written in C, I would use crun. But I'd rather not use that either.


> Well sure, but I'm not using systemd.

I hear you but fighting systemd in 2021 is like pushing water uphill. With a fork.


I'm not fighting anything. Just don't use it. Easy as that. I don't fudge with configs, carefully read manpages like systemd.exec, or put any effort into how I keep services running on my personal devices.


I use Hadolint [1] as a CI job to check whether my Dockerfiles follow the good "rules". But the rule that annoys me the most, and which is also present in this article, is the pinned OS package version rule [2]. While I understand its purpose, I struggle to deal with it in practice.

When a new image build fails because the pinned version is no longer available, I have to dig through the Debian or Ubuntu package sites to find the new one, since they don't keep old packages online.

I know I could tell Hadolint to ignore this rule, but I don't like that, and I think it's important to stick to a certain version of a package to avoid problems. I'm just trying to find any tip that could let me use pinned versions and avoid this manual search every time. Does apt-get install allow wildcards, for example?

1: https://github.com/hadolint/hadolint

2: https://github.com/hadolint/hadolint/wiki/DL3008
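
For context, a Dockerfile install line that satisfies DL3008 looks roughly like this (package and version string are purely illustrative), and it breaks exactly as described above as soon as that version drops off the mirrors:

  RUN apt-get update && \
      apt-get install -y --no-install-recommends curl=7.64.0-4+deb10u2 && \
      rm -rf /var/lib/apt/lists/*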


You could use renovate [1] to watch your repository and open a PR for you when there is a new version. I don't know if it supports watching Debian repositories out of the box, but that's probably doable with some tweaking.

[1] https://github.com/renovatebot/renovate


I had a pretty serious issue because I didn't pin versions [1] and countless small problems.

[1] https://nicolasbouliane.com/blog/nextcloud-docker-upgrade-er...


One of hadolint authors here.

You can do `apt-get install my_package=4.1.*`, for instance, and that will pass the validation while still being close to the ideal of having reproducible builds.


Awesome, thanks for your answer and for this great tool!


snapshot.debian.org?
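
i.e. pin the whole apt source to a fixed point in time instead of pinning individual packages; a sketch (the timestamp is just an example):

  # /etc/apt/sources.list inside the image
  deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20210301T000000Z/ buster main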



The CIS Docker Benchmark is a very extensive rule set for assessing the Docker host, daemon, images, and containers from a security perspective.

It comes with a very handy tool as well: https://github.com/docker/docker-bench-security


Yet another list of potentially extremely useful concepts that I'll never actually have the time to fully verify.

I wonder where all this complexity ends. If a human can't fully grok the systems we work on, then there is no way we can hope to not be misled and taken advantage of. Does anyone else share these concerns?


A large majority of people in tech seem to have convinced themselves that you solve complexity by adding even more complexity on top.

Yes, linux security is complicated. So why not work on improving this instead? Optimise and simplify what we already have.

Docker hasn't made the complexity of linux security go away. It has just added a whole other dimension of potential security issues people now need to manage in addition to the security of the base system.

Programmers need to shift their mindset. We already have far too much complexity. Stop thinking about what new things you can create, start thinking about how you can improve and simplify the software that we already have.


It has gotten simpler, especially for things like testing safely. If you don't think about it as docker, or containers, the ability to sandbox things with namespaces and cgroups is almost magical. It's effectively instant and much more effective than just chroot.

"Run something with no -- or very specific -- network access" was a really annoying problem to solve (LD_PRELOAD?) in the before times.


The abstraction that docker provides you is simpler, but the whole system has gotten more complex.

> Run something with no, or very specific, network access

If this was such a problem in linux then why didn't people focus on improving this instead? We could have solved this problem in linux and made the whole system better for everyone.

Instead, people left the problem there and just piled more crud on top. We added an entire new layer of abstraction that everyone now has to spend weeks learning how to use instead of just fixing the original problem. The whole system is now far more complex, and the original problem is still there.


Docker solves a few other problems too, like conflicting dependencies in user space. Still, I agree with the broader point that you can't solve too much abstraction with more abstraction.


I absolutely share these concerns.

The art of software engineering is about managing complexity. The best way to manage something is to reduce the amount of it you have to worry about. For the rest, paying really close attention, accepting ownership and directly engaging is key. Reducing the number of parties you have to trust is typically a happy side-effect of reducing complexity.

I look at containerization as an attempt to hand-wave away critical responsibilities around ownership of complexity in an application. There are tons of examples that illustrate this throughout the ecosystem, but I think this one is the most apt:

One day you discover your application is a difficult mess to reconstruct from source each time. You have reached a fork in the road because management is complaining that it takes 2 weeks to configure a new QA environment from scratch. Do you:

A) Review fundamental assumptions about the problem domain, technology choices and teamwork. Potentially consider rewriting your application from scratch using fewer tools & computers, and with more focus on driving the actual business value equations.

Or,

B) Decide that Tom's computer is the new golden image of production and bless it as such. Now let's find a way to manage a whole farm of these things!


The actual answer people come up with is neither of those. The process of moving your application to run in containers generally doesn't involve snapshotting Tom's computer - it usually wouldn't even work inside Docker for myriad reasons.

And the Dockerfile format is a good way of capturing what it actually takes to reconstruct your application/environment from source in a manageable, version-control-able text file.

I do agree that the rush to containerization is generally a rush to draw expansive abstraction boundaries, but this particular story isn't what people are doing. (At least not with containers - it's certainly what people were doing with VM images ten years ago!)


As a docker user, most of this aligns with my understanding; it's mostly straightforward stuff and not inherently complex.

Try SELinux and see if you think this is still complicated ;)

ADD vs COPY was new to me, though.

...More generally, I think it's not the systems that are complex but our minds/brains that are limited in dealing with them. Just look at the complexity in nature.


As a sysadmin that worked on securing linux apps before docker was a thing, imo doing it with docker is a lot less complex than before.

No tech is perfect, but docker has dramatically reduced the surface area of tunables that my application devs have to care about in order to get our product shipped. Even if it hasn't brought it down to zero, it's a step in the right direction in terms of the UX complexity exposed to the developer.


As always, it depends?

If you're only using docker for development to bundle things in a different OS base image then there's not that much need for paranoia. You probably would trust the other OS vendor with your host system too. Running docker without security may be perfectly fine in that case.

For random scripts from github, running them in docker is mostly a precaution so they don't screw up your host system or so some malicious dependency's low-effort attempt at exfiltrating your ~/.ssh dir or whatever goes nowhere. In that case it makes sense to understand the basics, e.g. not running docker --privileged willy-nilly just because some README asks you to.

If you're running some PaaS container service where users can execute arbitrary code on shared hardware, then you'd better have a full-time security engineer or two who are paid to understand all the details involved.


I think what you should take away from this is that containers are not the abstraction you want for running mutually distrusting workloads, and you can continue to use separate VMs or even separate physical networks of bare metal servers. Some people will geek out about this quixotic undertaking; you don’t have to be one of them.

Linux is not very good at security boundaries anyway, just run one thing in each VM and don’t leave anything else there to privilege-escalate into.


I feel this way every time I read an OpenSCAP or CIS lockdown policy. They're hundreds of pages of "recommendations" that I think I can trust but don't fully understand. I don't know if I should worry about my lack of understanding or not.


My Grandad was an illiterate farmer. He didn't understand the technology he used either, but it didn't matter as long as it worked or he could call someone to fix it. Somebody who lives this stuff 100% will abstract it and solve the problem for you, just as others did back then.

But do consider the list when some junior says they can get the company billing system running on Docker on prod.


Yes. Stay away from Docker or anything so complex. systemd-nspawn does the same things and can be sandboxed very effectively by the unit files.


Rule #2 needs lots of nuance stacked on it.

Host users and guest users must be explicitly mapped by whoever starts the container, so this issue is not a security threat outside of the container. That said, if the guest OS is running as root and someone compromises it, they have at their disposal all the powers of root in that container.

OWASP has also mixed general container guidance in with "Docker", as well as some Docker-specific things; I'd like to maintain a delineation between them now and into the future, because they are different. I understand most uneducated people are looking for "how do I do X in Docker", but subtly educating the user matters for the broader discourse.
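
If the explicit mapping in question is Docker's user-namespace remapping, enabling it is a one-liner in /etc/docker/daemon.json; root inside the container then maps to an unprivileged UID range on the host:

  {
    "userns-remap": "default"
  }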


“Don’t run as root in the container” definitely gets parroted without the nuance you describe. Gaining root in a container does not generally mean anything has been compromised on the host. Unless you do something odd (like mounting the host docker socket, weird setuid stuff, or run as privileged) it should be fine to run the container processes as root. It’s kind of the main point of running a process in a docker container.

I understand the motivation of marking scripts and binaries as non-writable inside the container as an extra layer of assurance (along with a non-root user that can only execute). But it’s a disservice to developers if you don’t explain why. A lot of people walk away from this thinking they’re protecting the host OS and wind up cargo-cult-creating a container user with full write/execute permissions.


An option not given in the article is to start as root, then step down to an unprivileged user using `gosu`. The official Postgres image does this so that appropriate permissions can be set on the data dir before becoming an unprivileged user for normal operation.
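
A sketch of that pattern in an entrypoint script (the paths and user are modeled on the Postgres example, but treat it as illustrative):

  #!/bin/sh
  set -e
  # still root here: fix ownership of the (possibly bind-mounted) data dir
  chown -R postgres:postgres /var/lib/postgresql/data
  # then drop privileges for the actual workload
  exec gosu postgres "$@"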


Quick tip to find out if any of the tests in your project are actually making outbound calls: run them without networking (--network=none).
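
e.g. something like (image and test command are placeholders):

  $ docker run --rm --network=none myorg/myapp:test npm test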


You can also do this without Docker. I do this as part of my build system (which is just a bash script) for a Linux distribution. Each package has a download phase, which permits network access. After that phase completes, network access is dropped.

https://chiselapp.com/user/rkeene/repository/bash-drop-netwo...


Good article for some basic rules to keep your containers safe. That being said, I violate rule #1 all the time. If you want containers that launch other containers (we do that for Apache Airflow), you don't have much of a choice. Also, cAdvisor requires that mount for monitoring.

Of course, if you're running containers in an ephemeral VM, no biggie.


Towards the end (Rule #11) "Avoid the use of apt/apk upgrade". Why? (assuming base OS image version is pinned)


I'm surprised no one has commented that once someone is able to get to your runtime, it's already too late to do anything.

If your Docker is running in a private network and someone got in, it already means your whole system is compromised.

If the application you exposed inside Docker has a zero-day vulnerability, it really does not matter; your system is already compromised.


Sure. I still try to put in layers of security anyway. Rather than just: fuck it.


Rule #12

Know what you are doing!

Don't just run an unknown public container without checking its Dockerfile first. If you don't understand what's happening, chances are high that you don't know what you're doing.


This seems a little outdated because --link is now a legacy feature: https://docs.docker.com/network/links/

Correct me if I'm wrong, but by default, containers can't communicate with each other even if ICC isn't disabled because the daemon gives them unique "default" networks. Only if you specify the same network for different containers can they communicate...

Edit: this behavior is specific to docker-compose. If you do docker run without specifying a network, it does use the docker0 bridge.
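
If you want the communication to be explicit in compose, attaching services to a named network looks like this (names invented):

  services:
    web:
      image: myorg/web
      networks: [backend]
    db:
      image: postgres:13
      networks: [backend]
  networks:
    backend: {}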


Everyone running docker in their homelab should be reading this.



