🛠 Docker-Level Security
Let's dig deeper into Docker and take a closer look at namespaces, cgroups, chroot, capabilities, and seccomp.
Understanding what happens under the hood helps you build a sound security model with control over builds and process separation, which solves major operational problems.
So let's look at what you should pay attention to first.
• FS: chroot and mount
Inside the container, processes see a root filesystem that is the result of mount and chroot-style operations (Docker's default runtime switches the root with pivot_root). That root is only as isolated as the mounts you add to it, so do not mount sensitive host paths (/var/run/docker.sock, /etc, /var/lib/docker) into application containers. For stateful services, use explicitly designated volumes.
# Look at the mount-namespace of the container
pid=$(docker inspect -f '{{.State.Pid}}' my-container)
lsns -t mnt -p "$pid"
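To check an existing container against the rule above, here is a minimal sketch. The helper name flag_sensitive_mounts and the exact path list are my assumptions for illustration, not a Docker feature; it only filters a list of bind-mount sources, which docker inspect can print.

```shell
# Hypothetical helper: reads mount source paths (one per line) on stdin
# and prints the ones that match a sensitive host path.
flag_sensitive_mounts() {
  while IFS= read -r src; do
    case "$src" in
      /var/run/docker.sock|/run/docker.sock|/etc|/etc/*|/var/lib/docker|/var/lib/docker/*)
        printf 'SENSITIVE: %s\n' "$src" ;;
    esac
  done
}

# Usage against a real container (bind mounts only, so named volumes are skipped):
# docker inspect -f '{{range .Mounts}}{{if eq .Type "bind"}}{{println .Source}}{{end}}{{end}}' my-container \
#   | flag_sensitive_mounts
```

Anything it prints is a candidate for review rather than a verdict: a read-only bind of a single file under /etc may be perfectly acceptable.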
• Namespaces: process, network and hostname isolation
As a reminder, a container is a set of namespaces: PID, NET, MNT, UTS, IPC, USER. They determine which processes you see, which mount points are available, and so on. Note that Docker does not enable user-namespace remapping by default. Never use --pid=host or --network=host for regular services: this is an isolation anti-pattern. For debugging, use nsenter rather than attaching the service directly to the host.
# Compare the namespace of the container with the host
pid=$(docker inspect -f '{{.State.Pid}}' ns-demo)
lsns -p "$pid"
# Enter the namespace of the container for debugging
nsenter -t "$pid" -n ip addr
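To catch the anti-pattern on containers that are already running, a small check like the one below may help. It is a sketch under assumptions: check_host_modes is a hypothetical helper name, and it expects the two HostConfig fields joined with "|" so that an empty PidMode (the default) is still parsed correctly.

```shell
# Hypothetical helper: takes "<PidMode>|<NetworkMode>" and warns when
# the container shares the host PID or network namespace.
check_host_modes() {
  pid_mode=${1%%|*}   # text before the first "|"
  net_mode=${1#*|}    # text after the first "|"
  rc=0
  if [ "$pid_mode" = "host" ]; then echo "WARN: --pid=host"; rc=1; fi
  if [ "$net_mode" = "host" ]; then echo "WARN: --network=host"; rc=1; fi
  return "$rc"
}

# Usage against a real container:
# check_host_modes "$(docker inspect -f '{{.HostConfig.PidMode}}|{{.HostConfig.NetworkMode}}' ns-demo)"
```

The non-zero exit status makes it easy to plug into a CI step or a periodic audit script.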
• cgroups limits
cgroups control how much CPU and memory a group of processes can consume. Docker creates a cgroup for each container automatically, but the limits must be set manually. So set --cpus and -m for every production service; otherwise one runaway process can take down the entire node. In monitoring, watch for OOMKilled events and CPU throttling, and revisit the limits if they occur regularly.
# Limit container by resources
docker run -d --name web \
--cpus="1.0" \
-m 512m --memory-reservation=256m \
nginx:stable
# View consumption in real time
docker stats web
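To find containers that were started without limits, you can look at the values Docker stores in HostConfig, where 0 means "not set". A minimal sketch, assuming the hypothetical helper name check_limits; it only covers limits set via -m and --cpus, not the older --cpu-quota form.

```shell
# Hypothetical helper: takes HostConfig.Memory (bytes) and HostConfig.NanoCpus
# and warns when either limit is unset (0).
check_limits() {
  mem=$1
  nano_cpus=$2
  rc=0
  if [ "$mem" -eq 0 ]; then echo "WARN: no memory limit (-m)"; rc=1; fi
  if [ "$nano_cpus" -eq 0 ]; then echo "WARN: no CPU limit (--cpus)"; rc=1; fi
  return "$rc"
}

# Usage against a real container:
# check_limits $(docker inspect -f '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' web)
# Was the container killed by the OOM killer?
# docker inspect -f '{{.State.OOMKilled}}' web
```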
• Capabilities
By default, Docker trims the container's capabilities and grants only a subset of root's kernel privileges. Do not use --privileged in production under any circumstances. A good starting formula is --cap-drop=ALL plus a targeted --cap-add for only what the service cannot work without, for example NET_BIND_SERVICE to bind ports below 1024 such as 80/443.
# View the set of capabilities inside a container
docker run --rm alpine sh -c 'apk add --no-cache libcap && capsh --print | grep cap_'
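# Drop everything, then add back only what the service needs
# (the stock nginx image usually also needs CHOWN, SETGID and SETUID to start; verify for your image)
docker run --rm -it \
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
--cap-add=CHOWN --cap-add=SETGID --cap-add=SETUID \
-p 80:80 nginx:stable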
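You can also verify from the host what a running process actually ended up with: the kernel exposes the effective capability mask as a hex value in the CapEff line of /proc/<pid>/status. Below is a minimal sketch with a hypothetical helper has_cap that tests one bit of that mask. The bit numbers come from linux/capability.h (CAP_NET_BIND_SERVICE is 10, CAP_SYS_ADMIN is 21); the sample mask in the comment is the value I would expect for Docker's default set, so treat it as an assumption and check your own hosts.

```shell
# Hypothetical helper: has_cap <hex mask> <bit number>
# Succeeds if the given capability bit is set in the mask.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Usage against a real container:
# pid=$(docker inspect -f '{{.State.Pid}}' web)
# mask=$(awk '/^CapEff:/ {print $2}' "/proc/$pid/status")   # e.g. 00000000a80425fb
# has_cap "$mask" 21 && echo "WARN: CAP_SYS_ADMIN is present"
# capsh --decode="$mask"    # human-readable list, if libcap tools are installed
```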
• Seccomp and AppArmor
Even when an attacker is already inside the container, the kernel can block dangerous syscalls or file operations via seccomp and AppArmor. Docker applies default profiles, but for sensitive services they should be tailored to the specific application. I recommend enabling logging of profile violations so that you see escalation attempts before they turn into a working exploit.
# Run with custom profile
docker run --rm -it \
--security-opt seccomp=/opt/seccomp/web.json \
alpine sh
# AppArmor
docker run --rm -it \
--security-opt apparmor=docker-default \
alpine sh
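It is worth confirming that a profile is really applied to the running process, not just passed on the command line. The kernel reports the seccomp state in the Seccomp line of /proc/<pid>/status (0, 1 or 2). A minimal sketch, with seccomp_mode_name as a hypothetical helper that only translates that number:

```shell
# Hypothetical helper: translate the Seccomp value from /proc/<pid>/status.
seccomp_mode_name() {
  case "$1" in
    0) echo disabled ;;
    1) echo strict ;;
    2) echo filter ;;    # a seccomp-bpf profile is active
    *) echo unknown ;;
  esac
}

# Usage against a real container:
# pid=$(docker inspect -f '{{.State.Pid}}' web)
# seccomp_mode_name "$(awk '/^Seccomp:/ {print $2}' "/proc/$pid/status")"
# AppArmor profile of the same process (on hosts with AppArmor enabled):
# cat "/proc/$pid/attr/current"
```

For a container started with a seccomp profile I would expect "filter"; "disabled" usually points to --security-opt seccomp=unconfined or --privileged.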
• Docker daemon and docker.sock
Access to docker.sock is equivalent to root on the host: anyone who can reach the socket can launch a privileged container and take over the host. Therefore:
• do not expose the daemon on -H tcp://0.0.0.0:2375; if remote access is required, protect it with TLS and client certificate verification (mTLS)
• do not mount /var/run/docker.sock into application containers
• for CI runners, use a separate node with its own rules
• grant membership in the docker group only under Segregation-of-Duties control
# Who has access to the socket
ls -l /var/run/docker.sock
getent group docker
# Minimal systemd drop-in (for example via systemctl edit docker): Unix socket only
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H unix:///var/run/docker.sock
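The checks from the list above can be automated in a rough form. This is a sketch under assumptions: both helper names are hypothetical, and the listener check only looks at the dockerd command line, so hosts configured in /etc/docker/daemon.json would need a separate look.

```shell
# Hypothetical helper: socket_world_accessible <octal mode>
# Succeeds if "other" has any permission bit on the socket.
socket_world_accessible() {
  [ $(( 0$1 & 7 )) -ne 0 ]
}

# Hypothetical helper: insecure_tcp_listener "<dockerd command line>"
# Succeeds if the daemon listens on tcp:// without --tlsverify.
insecure_tcp_listener() {
  case "$1" in
    *tcp://*)
      case "$1" in
        *--tlsverify*) return 1 ;;
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Usage on a real host:
# socket_world_accessible "$(stat -c '%a' /var/run/docker.sock)" && echo "WARN: socket is world-accessible"
# insecure_tcp_listener "$(ps -o args= -C dockerd)" && echo "WARN: TCP listener without TLS verification"
```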
#appsec #devsecops #reco #specialty #containersecurity
