Landlock Multithreaded Enforcement

V2: https://lore.kernel.org/all/20251001111807.18902-1-gnoack@google.com/

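The LandlockRestrictSelf system call unfortunately only restricts the calling thread (and the threads and processes it creates afterwards). Sibling threads that already exist stay unrestricted.

This is fine for classic single-threaded UNIX programs. It is not sufficient in inherently multi-threaded environments, such as the Go runtime or multi-threaded server frameworks, where other threads are already running by the time the program wants to enforce a policy.

The Landlock TSYNC patch adds a flag to LandlockRestrictSelf that applies the given Landlock ruleset to all threads of the calling process.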

Struct Cred

A task's Landlock domain is stored in its credentials (struct cred).

A task's struct cred can only be replaced by the task itself, and only through a “transactional” API: prepare_creds(), then either commit_creds() or abort_creds(). See LinuxCredentials.

We therefore schedule a task_work item (a pseudo-signal) on each affected sibling thread, so that each thread updates its own struct cred.
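To illustrate why each thread has to do the update itself, here is a minimal userspace model of the prepare/commit/abort pattern. It is a sketch under stated assumptions, not kernel code: model_cred, model_prepare_creds(), model_commit_creds(), model_abort_creds() and model_enforce_domain() are hypothetical names, a thread-local pointer stands in for the task's credentials, and the real struct cred is reference-counted and RCU-protected, which this model ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct cred. */
struct model_cred {
	int landlock_domain;   /* stands in for the Landlock domain reference */
};

/* Per-thread credentials: other threads cannot reach them through this
 * variable, mirroring how only a task may replace its own creds. */
static _Thread_local struct model_cred *current_cred;

static int model_cred_init(int domain) {
	current_cred = malloc(sizeof(*current_cred));
	if (!current_cred)
		return -ENOMEM;
	current_cred->landlock_domain = domain;
	return 0;
}

/* "prepare": copy the current creds; the only step that may fail. */
static struct model_cred *model_prepare_creds(void) {
	assert(current_cred != NULL);
	struct model_cred *new = malloc(sizeof(*new));
	if (new)
		*new = *current_cred;
	return new;
}

/* "commit": install the prepared creds; cannot fail. */
static int model_commit_creds(struct model_cred *new) {
	struct model_cred *old = current_cred;
	current_cred = new;
	free(old);
	return 0;
}

/* "abort": discard the prepared copy; the current creds stay untouched. */
static void model_abort_creds(struct model_cred *new) {
	free(new);
}

/* Hypothetical body of a per-thread task_work: switch to a new domain. */
static int model_enforce_domain(int domain) {
	struct model_cred *new = model_prepare_creds();
	if (!new)
		return -ENOMEM;
	new->landlock_domain = domain;
	return model_commit_creds(new);
}
```

Because current_cred is thread-local, model_enforce_domain() only ever changes the calling thread's state, which is the reason the design above has every sibling run the update in its own task_work.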

Discovery of sibling threads and scheduling task work

The approach is to:

1. Under an RCU read lock, loop over the sibling threads and check whether there are any we have not seen yet.
2. Allocate enough space to schedule the task work for these newly found threads.
3. Under an RCU read lock, call task_work_add() to add a pseudo-signal handler for each of these threads, using the previously allocated space.
4. Repeat from step 1 until no more new sibling threads are found.

This is needed due to the following constraints:

- We need an RCU read lock to iterate over the sibling threads.
- Each sibling task needs its own struct callback_head (the task_work item) to pass to task_work_add().
- Sleeping allocations (such as GFP_KERNEL) must not happen under an RCU read lock, so this space has to be allocated outside of it.
- Step 1 finds all sibling threads that exist at that moment, but any of them may spawn new threads before our pseudo-signal work has run on it, and those new threads would start out with the old credentials. This is unlikely, but to rule it out, we loop until no additional sibling threads are found.
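The discovery loop can be sketched as a userspace model. This is an illustration under assumptions, not the patch's code: model_group, model_task, model_work, group_spawn(), count_unknown() and schedule_work_for_all() are hypothetical names; a pthread mutex stands in for the RCU read lock (it is much stronger than RCU and only serves to make the model's shared list safe); and assigning a preallocated model_work to a task stands in for task_work_add(). Threads never exit in this model.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TASKS 64

/* Hypothetical stand-in for the struct callback_head given to task_work_add(). */
struct model_work {
	int target_tid;
};

struct model_task {
	int tid;
	struct model_work *work;  /* NULL until work has been scheduled */
};

/* Hypothetical thread group; the mutex plays the role of the RCU read lock. */
struct model_group {
	pthread_mutex_t lock;
	struct model_task tasks[MAX_TASKS];
	size_t ntasks;
	struct model_work *chunks[MAX_TASKS]; /* one allocation per pass */
	size_t nchunks;
};

static void group_init(struct model_group *g) {
	pthread_mutex_init(&g->lock, NULL);
	g->ntasks = 0;
	g->nchunks = 0;
}

/* Models a sibling creating a thread; may race with the loop below. */
static void group_spawn(struct model_group *g, int tid) {
	pthread_mutex_lock(&g->lock);
	assert(g->ntasks < MAX_TASKS);
	g->tasks[g->ntasks].tid = tid;
	g->tasks[g->ntasks].work = NULL;
	g->ntasks++;
	pthread_mutex_unlock(&g->lock);
}

/* Step 1: count the threads that have no work scheduled yet. */
static size_t count_unknown(struct model_group *g) {
	size_t n = 0;
	pthread_mutex_lock(&g->lock);
	for (size_t i = 0; i < g->ntasks; i++)
		if (!g->tasks[i].work)
			n++;
	pthread_mutex_unlock(&g->lock);
	return n;
}

/* Repeats steps 1-3 until a pass finds no new threads (step 4).
 * Returns the number of passes, or -1 if an allocation failed. */
static int schedule_work_for_all(struct model_group *g) {
	for (int pass = 1;; pass++) {
		size_t unknown = count_unknown(g);
		if (unknown == 0)
			return pass;
		/* Step 2: allocate outside the lock, where sleeping is allowed. */
		struct model_work *chunk = calloc(unknown, sizeof(*chunk));
		if (!chunk)
			return -1;
		/* Step 3: hand out the preallocated items under the lock. Threads
		 * spawned since step 1 do not fit and are left for the next pass. */
		size_t used = 0;
		pthread_mutex_lock(&g->lock);
		assert(g->nchunks < MAX_TASKS);
		g->chunks[g->nchunks++] = chunk;
		for (size_t i = 0; i < g->ntasks && used < unknown; i++) {
			if (!g->tasks[i].work) {
				chunk[used].target_tid = g->tasks[i].tid;
				g->tasks[i].work = &chunk[used++]; /* "task_work_add()" */
			}
		}
		pthread_mutex_unlock(&g->lock);
	}
}

static void group_destroy(struct model_group *g) {
	for (size_t i = 0; i < g->nchunks; i++)
		free(g->chunks[i]);
	pthread_mutex_destroy(&g->lock);
}
```

With a static set of threads the loop needs two passes: one that schedules the work and one that confirms nothing new appeared. A thread spawned after a run is picked up by the next run.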

Coordination between threads

All sibling threads run their credential updates in lockstep, each in its own task_work pseudo-signal handler.

All threads coordinate through a shared context struct.

We make use of the fact that in the LinuxCredentials API, only the “prepare” step may fail. From the point of view of a sibling thread, the following happens:

1. Call prepare_creds().
   - If this fails, write the error to a shared location.
2. Notify the calling thread that the prepare_creds() step is done.
3. Wait for the calling thread's notification that it is time to continue. The calling thread only sends it once every sibling has finished step 2.
4. Check the shared error location, then either commit (commit_creds()) or abort (abort_creds()).
5. Notify the calling thread that we are done.

Because a thread only commits after every thread has finished its prepare step without error, and the commit step itself cannot fail, we get “all or nothing” semantics: either all threads succeed and commit, or they all abort.
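The lockstep protocol can be modelled in userspace to show where the all-or-nothing property comes from. This is a sketch under assumptions, not the patch's implementation: tsync_ctx, sibling, sibling_work() and restrict_all() are hypothetical names; pthreads started by restrict_all() play the sibling threads (in the kernel they already exist and run the handler as task_work); a mutex and condition variable stand in for whatever wait primitive the patch actually uses; an int stands in for each thread's credentials; and the calling thread's own credential update is left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical shared context; field names are illustrative only. */
struct tsync_ctx {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int prepared;    /* siblings that finished the prepare step */
	int done;        /* siblings that finished commit/abort */
	bool go;         /* set by the caller once all siblings have prepared */
	int error;       /* first error reported by any sibling, 0 if none */
	int new_domain;  /* the domain to install */
};

/* Hypothetical per-thread state; `domain` stands in for the thread's cred. */
struct sibling {
	struct tsync_ctx *ctx;
	int domain;
	bool fail_prepare;  /* simulate prepare_creds() failing */
};

/* What each sibling's task_work handler does. */
static void *sibling_work(void *arg) {
	struct sibling *s = arg;
	struct tsync_ctx *c = s->ctx;

	/* Step 1: "prepare_creds()": build the new state privately; may fail. */
	int err = s->fail_prepare ? -ENOMEM : 0;
	int prepared_domain = c->new_domain;

	pthread_mutex_lock(&c->lock);
	if (err && !c->error)
		c->error = err;                  /* write back the error */
	c->prepared++;                       /* Step 2: notify the caller */
	pthread_cond_broadcast(&c->cond);
	while (!c->go)                       /* Step 3: wait for the caller */
		pthread_cond_wait(&c->cond, &c->lock);
	bool commit = (c->error == 0);       /* Step 4: check the shared error */
	pthread_mutex_unlock(&c->lock);

	if (commit)
		s->domain = prepared_domain;     /* "commit_creds()": cannot fail */
	/* else "abort_creds()": drop the prepared state, keep the old one */

	pthread_mutex_lock(&c->lock);
	c->done++;                           /* Step 5: notify the caller */
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Caller side: returns 0 if every sibling committed, else the error. */
static int restrict_all(struct sibling *sibs, int n, int new_domain) {
	assert(n > 0 && n <= MAX_THREADS);
	struct tsync_ctx c = { .new_domain = new_domain };
	pthread_t tids[MAX_THREADS];
	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.cond, NULL);

	for (int i = 0; i < n; i++) {
		sibs[i].ctx = &c;
		if (pthread_create(&tids[i], NULL, sibling_work, &sibs[i]) != 0)
			abort();  /* keep the model simple */
	}

	pthread_mutex_lock(&c.lock);
	while (c.prepared < n)               /* wait until all siblings prepared */
		pthread_cond_wait(&c.cond, &c.lock);
	c.go = true;                         /* every possible error is known now */
	pthread_cond_broadcast(&c.cond);
	while (c.done < n)                   /* wait until all committed/aborted */
		pthread_cond_wait(&c.cond, &c.lock);
	int err = c.error;
	pthread_mutex_unlock(&c.lock);

	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	pthread_cond_destroy(&c.cond);
	pthread_mutex_destroy(&c.lock);
	return err;
}
```

If one sibling's prepare step fails, the caller still waits for every sibling to report before setting `go`, so all of them see the error and none of them commits.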