Modernization of Linux proc filesystem and containers security

TL;DR: The Linux kernel's procfs suffers from a historical design that prevents multiple separate procfs instances from existing inside the same PID namespace. Every procfs mount is a mirror of the internal one. This blocks the development of Linux containers, sandboxes, and other security-related features.

Patch solution: PATCH RFC v3 proc: modernize proc to support multiple private instances

Problem

Linux containers and other sandbox mechanisms want to hide processes, files, and directories in procfs; other implementations want to restrict certain procfs features. To achieve this, some solutions mount inaccessible files over important ones, but this approach is inherently limited: procfs entries are generally dynamic, and it is hard to keep track of every file as new features and their access rules are merged. The last blocker is how procfs is handled internally.

Historically, Linux procfs has always been tied to PID namespaces: when a PID namespace is created, the kernel internally creates a procfs mount for it. As a result, every new procfs mount is just a mirror of the internal one. Any change, any mount option update, or any newly introduced feature propagates to all other procfs mounts in the same PID namespace, which may disable security-related mount options.
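One way to observe this mirroring is to list every procfs mount visible to a process and compare their options. The following is only a minimal sketch: it assumes the standard /proc/self/mountinfo field layout, and the names ProcMount, parse_proc_mount, and list_proc_mounts are hypothetical helpers introduced for illustration.

```python
from typing import List, NamedTuple, Optional


class ProcMount(NamedTuple):
    mount_point: str
    mount_options: str   # per-mount options (field 6 of mountinfo)
    super_options: str   # per-superblock options (last field)


def parse_proc_mount(line: str) -> Optional[ProcMount]:
    """Parse one mountinfo line; return None unless it describes procfs."""
    # A " - " separator splits the per-mount fields from the filesystem fields.
    left, sep, right = line.rstrip("\n").partition(" - ")
    if not sep:
        return None
    mount_fields = left.split()
    fs_fields = right.split()
    if len(mount_fields) < 6 or len(fs_fields) < 2:
        return None
    if fs_fields[0] != "proc":
        return None
    super_options = fs_fields[2] if len(fs_fields) > 2 else ""
    return ProcMount(mount_fields[4], mount_fields[5], super_options)


def list_proc_mounts(path: str = "/proc/self/mountinfo") -> List[ProcMount]:
    """Return every procfs mount listed in the given mountinfo file."""
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_proc_mount(line) for line in handle)
        return [mount for mount in parsed if mount is not None]
```

With the historical design described above, procfs mounts that belong to the same PID namespace would be expected to report the same procfs-specific options (such as hidepid) in super_options, regardless of which mount was last updated.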

For a detailed description please see: PATCH RFC v3 proc: modernize proc to support multiple private instances

Solution

I have been working on a patchset to improve the internal implementation of Linux procfs so that we can have multiple private procfs instances within the same PID namespace. This will improve Linux container security in general, as it allows cleaning up the proc internals and better supporting mount options such as hidepid, as well as new ones.

Update

As of April 2020, the work has been merged, and procfs now supports per-instance mount options: hidepid and subset.
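As an illustration of per-instance options, here is a minimal sketch that builds an option string and mounts a separate procfs instance by calling mount(2) through ctypes. It is not taken from the patchset. The helper names proc_mount_options and mount_private_proc are hypothetical, and the call needs sufficient privileges (typically CAP_SYS_ADMIN) and a kernel that includes the merged work.

```python
import ctypes
import os


def proc_mount_options(hidepid: int = 2, subset_pid: bool = True) -> str:
    """Build a procfs option string such as 'hidepid=2,subset=pid'."""
    # hidepid=2 makes other users' /proc/<pid> directories invisible.
    options = [f"hidepid={hidepid}"]
    if subset_pid:
        # subset=pid hides top-level entries that are not related to tasks.
        options.append("subset=pid")
    return ",".join(options)


def mount_private_proc(target: str, options: str) -> None:
    """Mount a procfs instance on `target` with the given option string."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = (
        ctypes.c_char_p,  # source
        ctypes.c_char_p,  # target
        ctypes.c_char_p,  # filesystem type
        ctypes.c_ulong,   # mount flags
        ctypes.c_char_p,  # filesystem-specific data (the options)
    )
    libc.mount.restype = ctypes.c_int
    rc = libc.mount(b"proc", os.fsencode(target), b"proc", 0, options.encode())
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

For example, mount_private_proc("/tmp/proc", proc_mount_options()) would mount a restricted instance on the (hypothetical) directory /tmp/proc. Because the options are per instance, this mount should leave the options of the existing /proc mount unchanged.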

Thanks to Alexey Gladkov and other kernel developers for keeping up the work. It took more than 3 years to merge the patches and have a modern proc filesystem.