Prevent Overlayfs Privilege Escalation on Ubuntu Kernels with Yaml (bpf)!

Last week, Wiz blogged about Overlayfs bugs that can be abused on Ubuntu kernels to escalate privileges, named GameOver(lay); read more here.

These are CVE-2023-2640 and CVE-2023-32629. Datadog also has a writeup about an earlier upstream Overlayfs vulnerability, CVE-2023-0386.


All of these share largely the same root cause: Overlayfs can be mounted inside a user namespace, but the security checks and value validation performed when copying up files and their metadata from the lower layer to the upper layer inside that user namespace were insufficient. As software evolves, such things can happen.

This lets unprivileged users abuse Overlayfs: they create a user namespace, gain certain capabilities inside it, mount an Overlayfs file system, change the file attributes of binaries, trigger the copy-up, and finally execute the resulting binary from the initial user namespace to gain full privileges.

Unprivileged users are able to create binaries with file capabilities, like this:

$ getcap upper/binary
upper/binary cap_setuid,cap_sys_admin=eip

Executing such a binary results in full privilege escalation. Only some Ubuntu kernels are affected by these bugs.
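The getcap check above can also be scripted, for example to scan a directory for unexpected file capabilities. The following is a minimal sketch, assuming a Linux host; the function name has_file_capabilities is made up for illustration. It only tests whether the security.capability xattr is present and does not decode which capabilities it grants.

```python
import errno
import os


def has_file_capabilities(path: str) -> bool:
    """Return True if `path` carries a security.capability xattr (Linux only)."""
    try:
        # File capabilities are stored in the security.capability xattr.
        os.getxattr(path, "security.capability", follow_symlinks=False)
        return True
    except OSError as exc:
        # ENODATA: the attribute is not set on this file.
        # ENOTSUP: the filesystem does not support xattrs at all.
        if exc.errno in (errno.ENODATA, errno.ENOTSUP):
            return False
        raise


# Hypothetical usage, with the path from the getcap example:
#   print(has_file_capabilities("upper/binary"))
```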

Tetragon: Patch your kernel with Yaml

To prevent these privilege escalations, users can restrict the use of user namespaces, as described on the Ubuntu CVE-2023-2640 page.
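To see whether unprivileged user namespaces are currently allowed, one option is to read the relevant sysctl. This is a sketch under an assumption: that the knob in question is the Debian/Ubuntu-specific kernel.unprivileged_userns_clone sysctl, which other kernels may not expose. The helper name and default path are illustrative.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the Debian/Ubuntu-specific sysctl.
USERNS_SYSCTL = Path("/proc/sys/kernel/unprivileged_userns_clone")


def unprivileged_userns_allowed(sysctl_path: Path = USERNS_SYSCTL) -> Optional[bool]:
    """Return True/False from the sysctl, or None if the kernel does not expose it."""
    try:
        # "0" means unprivileged users cannot create user namespaces.
        return sysctl_path.read_text().strip() != "0"
    except FileNotFoundError:
        return None
```

A result of None means the sysctl is absent, so the check says nothing about that kernel.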

I have been working with Isovalent on cloud security and Tetragon for more than a year now, and I wanted to share what the team is building.

Tetragon has cool features that let you transform Yaml files into BPF programs. To demo how useful this is, let's apply it to protect our Ubuntu kernel (5.19.0-46-generic) by patching it with Yaml files!

We use the following Tracing Policy to prevent CVE-2023-2640. To read more about the tracing policy format, see here.

Test it:

  1. Copy the Tracing Policy to prevent CVE-2023-2640 into /etc/tetragon/

  2. Use the latest version of the unstable container, as some features are still new, and bind mount the host /etc/tetragon directory into the container at /etc/tetragon by adding this volume to the docker command: -v /etc/tetragon:/etc/tetragon

  3. Observe Tetragon events in another terminal with the tetra CLI:

    docker exec -it tetragon tetra getevents | tee events.json

Then, if we run the Overlayfs exploit, Tetragon prevents Overlayfs from copying up the capabilities xattr of the file. The binary is copied up successfully, but copying up the capabilities is turned into a no-op.


$ getcap upper/binary

No file capabilities on the binary.

The tetra CLI produces the following event:

{
  "process_kprobe": {
    "process": {
      "exec_id": "OjQ1ODUxODM5NTUxMzY5Ojk4MjMw",
      "pid": 98230,
      "uid": 1000,
      "cwd": "/home/tixxdz/overlayfs",
      "binary": "/usr/bin/touch",
      "arguments": "mount/binary",
      "flags": "execve clone",
      "start_time": "2023-08-02T20:05:21.429903780Z",
      "auid": 1000,
      "parent_exec_id": "OjQ1ODUxODMyMzk4NDc0Ojk4MjI1",
      "refcnt": 1,
      "tid": 98230
    },
    "parent": {
      "exec_id": "OjQ1ODUxODMyMzk4NDc0Ojk4MjI1",
      "pid": 98225,
      "uid": 1000,
      "cwd": "/home/tixxdz/overlayfs",
      "binary": "/usr/bin/sh",
      "flags": "execve",
      "start_time": "2023-08-02T20:05:21.422752839Z",
      "auid": 1000,
      "parent_exec_id": "OjQ1ODUxODMwMTExNTgzOjk4MjI1",
      "tid": 98225
    },
    "function_name": "security_inode_copy_up_xattr",
    "args": [
      {
        "string_arg": "security.capability"
      }
    ],
    "return": {
      "int_arg": 1
    },
    "action": "KPROBE_ACTION_OVERRIDE"
  },
  "time": "2023-08-02T20:05:21.446089792Z"
}
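Events like the one above can be picked out of the events.json file written by the tee command. The sketch below assumes tetra getevents writes one JSON object per line. The helper names are made up for illustration, and only fields visible in the sample event are used.

```python
import json
from typing import Iterable, Iterator


def overridden_capability_copyups(lines: Iterable[str]) -> Iterator[dict]:
    """Yield process_kprobe events where the security.capability copy-up was overridden."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, partial, or non-JSON lines
        if not isinstance(event, dict):
            continue
        kprobe = event.get("process_kprobe")
        if not isinstance(kprobe, dict):
            continue  # some other event type
        if kprobe.get("function_name") != "security_inode_copy_up_xattr":
            continue
        if kprobe.get("action") != "KPROBE_ACTION_OVERRIDE":
            continue
        xattr_names = [arg.get("string_arg") for arg in kprobe.get("args", [])]
        if "security.capability" in xattr_names:
            yield kprobe


def summarize(kprobe: dict) -> str:
    """One-line summary of the process that triggered the overridden copy-up."""
    proc = kprobe.get("process", {})
    return f'pid={proc.get("pid")} uid={proc.get("uid")} binary={proc.get("binary")}'


# Hypothetical usage:
#   with open("events.json") as fh:
#       for hit in overridden_capability_copyups(fh):
#           print(summarize(hit))
```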

Copying up the security.capability xattr is skipped inside ovl_copy_xattr because we override the return value of the security hook: a return value of 1 tells Overlayfs to discard that xattr. This requires the kernel config CONFIG_BPF_KPROBE_OVERRIDE=y.
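A quick way to confirm the kernel requirement is to look the option up in the running kernel's config. This is a minimal sketch; the two file locations, /boot/config-<release> and /proc/config.gz, are common conventions assumed here and are not guaranteed on every distribution. The helper names are illustrative.

```python
import gzip
import os
import platform


def option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is built in (=y) according to kernel config text."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())


def read_kernel_config() -> str:
    """Read the running kernel's config from assumed, commonly used locations."""
    boot_config = f"/boot/config-{platform.release()}"
    if os.path.exists(boot_config):
        with open(boot_config, encoding="utf-8") as fh:
            return fh.read()
    # Fallback: only present when the kernel was built with CONFIG_IKCONFIG_PROC.
    with gzip.open("/proc/config.gz", "rt", encoding="utf-8") as fh:
        return fh.read()


# Hypothetical usage:
#   print(option_enabled(read_kernel_config(), "CONFIG_BPF_KPROBE_OVERRIDE"))
```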

It is also possible to prevent this class of exploits by using the kill unprivileged user namespace tracing policy, and even to apply it only to specific pods and containers.


Tetragon, soon to be stable, lets you patch your kernel with Yaml files. It is K8s-aware, with cool features that provide cloud native observability and enforcement. It supports K8s namespace and Pod label filtering, so you can sandbox some pods and containers without affecting the rest of the system.

Please watch the Isovalent blog, as more Tetragon details and features will be published in the coming weeks and months.

Thanks to the team for these great features and to users for the feedback.

To request Isovalent Cilium Enterprise demos: Isovalent request demo