This blog post is part of a series that will deep dive into user-namespaces support in Kubernetes.
User-namespaces (userns) support reached GA in Kubernetes 1.36. This means you can have pods that run inside a user-namespace. The most common reasons people want to do that are:
- Improve isolation: adopting it significantly increases isolation from the host and reduces the room for lateral movement. UIDs/GIDs don’t overlap with any other pod or with the host, and capabilities are only valid inside the pod.
- Secure nested containers: with userns it’s possible to create a container inside a container, so you can run dockerd inside a Kubernetes pod (with a few other adjustments, all available today), you can build container images, etc.
## How to use it
One of the design goals was to make it trivial to adopt. All you need to do is set hostUsers to
false in your pod spec. If you have the right versions of the stack, all will just work:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns
spec:
  hostUsers: false # <-- add this field.
  containers:
  - name: shell
    command: ["sleep", "infinity"]
    image: debian
```
This tells Kubernetes not to use the host’s UIDs/GIDs, and instead a user namespace is created for the pod. Most applications will work just fine with this, completely unmodified.
Setting that field also takes care of using non-overlapping UIDs/GIDs for a pod’s processes and
files. You can still use runAsUser and similar fields; they just affect the user inside the container.
Inside the container nothing changes: if you use runAsUser: 0 you will still see that, but from
the host point of view (e.g. if you run ps) you will see the pod running as an unprivileged user.
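For example, assuming the pod above is running, you can compare the view from inside the container with the view from the host. This is a sketch; the exact host UID you see depends on the range the kubelet assigned to the pod:

```shell
# Inside the container the process believes it is root
# (debian's default user, since no runAsUser is set).
kubectl exec userns -- id -u    # → 0

# On the node where the pod runs, the same process shows up as an
# unprivileged, high-numbered UID from the pod's assigned range.
ps -o user,pid,comm -C sleep
```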
When using volumes, the files created there will belong to the user you choose to run the container as.
For example, if you use runAsUser: 0, it will create files owned by root in the volume. This means
you can easily share volumes with containers that are not running with user-namespaces too.
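As a sketch, a pod like the following (the names are just placeholders) writes files into an emptyDir volume as root from the container’s point of view, while on the host those files belong to the pod’s mapped unprivileged UID:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-volume
spec:
  hostUsers: false
  containers:
  - name: writer
    image: debian
    command: ["sh", "-c", "touch /data/file && sleep infinity"]
    securityContext:
      runAsUser: 0   # root inside the container; unprivileged on the host
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}
```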
Let’s get practical with some questions and answers:
1. Can I set hostUsers: false in pods with volumes and then remove it if I see some problems?
Yes, you can turn it on and off — everything, including volumes, will continue to work just fine.
2. Are there any considerations regarding the file-systems used?
Yes, the file-systems used by the pod need to support idmap mounts on the kernel you are using.
Support for idmap mounts is per file-system and kernel version. You can check which Linux version
added support for each file-system in the NOTES section of the mount_setattr manpage.
While we try to update the manpage, it is sometimes out of date. You can clone the Linux repo and
grep for FS_ALLOW_IDMAP in the fs/ folder for an authoritative list.
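That check looks roughly like this, assuming you have a checkout of the Linux sources in a local `linux/` directory (the exact file list depends on the kernel version you checked out):

```shell
# Each file that sets FS_ALLOW_IDMAP belongs to a file-system that
# supports idmap mounts in that kernel version.
grep -rl FS_ALLOW_IDMAP linux/fs/
```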
Bear in mind that the service account token each pod has by default is usually a tmpfs
file-system. Also, because files like /etc/resolv.conf and similar are bind-mounted from
/var/lib/kubelet/..., you need idmap mounts support on that fs too.
3. Is there anything else I need to take into account to use this?
If you meet the stack requirements and have fs support, you are almost there. There are a few other things to take into account, but they are probably a no-op for most apps:
- Running inside a userns makes some operations completely impossible (like loading kernel modules). But unless you need to do something very privileged on the host, you can probably enable userns and just use it without any other changes.
- The container must use UIDs/GIDs from the range 0-65535 for processes and files.
4. Are there some PSS changes when using userns?
Yes, check out the docs for this. Basically, the PSS checks for “does this pod run as root?” are relaxed. You can run as root inside the container and the PSS won’t complain if you are using userns.
Please note that while capabilities are also namespaced (valid only inside the pod’s userns, not on the host), the checks for them are not relaxed. Capabilities allow you to do more operations and potentially hit kernel CVEs; that is why we decided to keep those checks unchanged with userns.
In other words, if you want CAP_SYS_ADMIN, even if it’s much safer to grant it with userns, the
pod still needs to be privileged to get it.
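As a sketch, in a namespace that enforces the restricted Pod Security Standard, a pod like the following (names are placeholders, and the exact relaxations depend on your Kubernetes version, so check the docs) can run as root inside the container because it opts into a user namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-root
  namespace: restricted-ns   # assumes pod-security.kubernetes.io/enforce: restricted
spec:
  hostUsers: false           # with userns, the "runs as root?" checks are relaxed
  containers:
  - name: app
    image: debian
    command: ["sleep", "infinity"]
    securityContext:
      runAsUser: 0           # root only inside the pod's user namespace
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      seccompProfile:
        type: RuntimeDefault
```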
5. How can I change the kubelet configuration for userns on running nodes?
If you want to change which UID/GID range is used for pods with userns, or how many IDs are allocated per pod, you need to drain the node first.
The kubelet guarantees that no two pods with userns use the same range. When these settings change, existing pods need to be recreated so the kubelet can honor the new settings and still guarantee that no two pods overlap.
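A hedged sketch of that workflow (the node name is a placeholder, and how you edit and restart the kubelet depends on your distribution):

```shell
# 1. Drain the node so all pods (and their userns ID ranges) go away.
kubectl drain my-node --ignore-daemonsets --delete-emptydir-data

# 2. On the node: update the kubelet's userns-related configuration,
#    then restart the kubelet (systemd shown as an example).
sudo systemctl restart kubelet

# 3. Allow pods onto the node again; recreated pods get ranges
#    from the new configuration.
kubectl uncordon my-node
```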
## Stack requirements
To make a feature that significantly improves the isolation and is so simple to adopt, we needed to make changes in every layer of the stack: the Linux kernel, OCI runtimes (runc, crun), high-level container runtimes (containerd, cri-o) and Kubernetes.
| Component | Version | Notes |
|---|---|---|
| Kubernetes | 1.25 | Stateless pods support, requires enabling the alpha feature gate UserNamespacesStatelessPodsSupport |
| Kubernetes | 1.28 | Stateless and stateful pods support, requires enabling the alpha feature gate UserNamespacesSupport |
| Kubernetes | 1.30 | Beta, requires enabling the beta feature gate UserNamespacesSupport |
| Kubernetes | 1.33 | Beta, enabled by default, no need to enable feature gate |
| Kubernetes | 1.36 | GA, no need to have beta features enabled |
| containerd | 2.0 | Needed for k8s >=1.27. v1.7 has limited support, works only with Kubernetes 1.25–1.26 |
| CRI-O | 1.25 | Supports all features of the same Kubernetes version |
| runc | 1.2 | Support for idmap mounts |
| crun | 1.9 | 1.13+ recommended for better error messages |
| Linux | 6.3 | Most popular file-systems support idmap mounts. With care, you can also use 5.19 and 5.12 |
## How does this compare with what we have today?
It’s quite different from what we have today, in several dimensions — even compared to “regular unprivileged pods” (pods without user-namespaces that run as an unprivileged user, with restricted capabilities).
### Pods with seccomp/apparmor
If we compare with pods just using seccomp/apparmor to secure them, it’s quite different:
- The pod still runs as root on the host and all capabilities are valid in the host
- We use seccomp/apparmor to limit what an already very privileged pod can do
But… scenarios like container breakouts can still have a very big impact.
### Unprivileged pods (without userns)
Running as an unprivileged user is a significant improvement over running as root. However, it’s still different from userns:
- It’s harder than it seems: not even the regular “nginx” image works unprivileged. Google engineers had a Kubecon talk about all the problems they faced trying to adopt it for GKE components. They ended up choosing userns instead.
- Capabilities are still valid on the host: any capabilities you grant are usable after a container breakout. With userns, capabilities are only valid inside the pod.
- No lateral movement protection: most people pick the same UID (e.g. 65534), so all unprivileged pods share it. With userns, each pod gets a unique range on the host.
## Let’s think about it for a moment
Running processes as different UIDs/GIDs is probably one of the most basic security measures we can take. Yet in the container world we run as root on the host, granting a lot of privileges, and then try to restrict what root can do with seccomp/apparmor/others. It’s like playing whack-a-mole: it’s not that hard to find an escape as root! A LOT of CVEs rated HIGH happened because of running as root.
User-namespaces allow us to change this: instead of giving permissions to processes that shouldn’t have them and then trying to restrict them, we just don’t give those permissions in the first place, or give them only within the container. That is exactly what we want.
Linux distros learned this lesson years ago — services like bind don’t run as root. When containers started we moved fast and left some of those lessons behind. I don’t want to criticize the past, we had reasons to do that. But it’s time to revisit those decisions.
## A personal note
I’m honestly super happy that userns is finally a GA feature in Kubernetes. I’ve been working on this for the last 6 years, and I can’t believe it’s finally there and available in all major clouds!
## Conclusion
I’ve shared, I hope, everything you need to know to use user-namespaces in Kubernetes and how userns compares to what was already available.
If you got curious about how pods that run with non-overlapping UIDs can still share volumes without issues, check out the next post. I’ll explore the problems and solutions we had at all layers of the stack.