Landlock Multithreaded Enforcement
V2: https://lore.kernel.org/all/20251001111807.18902-1-gnoack@google.com/
The LandlockRestrictSelf system call unfortunately only applies to a single thread.
While this is fine for classic single-threaded UNIX programs, it is not sufficient in environments that are inherently multithreaded, such as the Go runtime or multithreaded server frameworks.
The Landlock TSYNC patch introduces a flag to LandlockRestrictSelf which applies a given Landlock ruleset to the entire process.
Struct Cred
Landlock domains are stored in a struct cred.
A task’s struct cred can only be updated by the task itself, and it needs to go through a “transactional” API to do it, see LinuxCredentials.
We therefore schedule a pseudo-signal task_work for each affected sibling thread, so that each affected thread updates the struct cred itself.
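The transactional pattern can be illustrated with a small userspace model. This is only a sketch: the method names mirror the kernel's prepare_creds(), commit_creds() and abort_creds(), but the Cred and Task classes and their fields are hypothetical stand-ins, not kernel structures.

```python
import copy
from dataclasses import dataclass


@dataclass
class Cred:
    """Toy stand-in for struct cred; both fields are hypothetical."""
    uid: int = 1000
    landlock_domain: tuple = ()  # names of stacked rulesets, newest last


class Task:
    """Toy task that owns its current credentials."""

    def __init__(self):
        self.cred = Cred()

    def prepare_creds(self):
        # Work on a private copy; the live credentials stay untouched.
        return copy.copy(self.cred)

    def commit_creds(self, new):
        # Publish the modified copy. Modelled here as a step that cannot fail.
        self.cred = new

    def abort_creds(self, new):
        # Drop the copy; nothing observable has changed.
        pass


task = Task()
new = task.prepare_creds()
new.landlock_domain += ("ruleset-A",)
# Until the commit, the task still runs with its old credentials.
assert task.cred.landlock_domain == ()
task.commit_creds(new)
```

In this model only the task object itself swaps its credentials, which is the property that forces the per-thread task work described above.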
Discovery of sibling threads and scheduling task work
The approach is to:
- (under an RCU read lock) loop over the sibling threads and check whether any of them are still unknown
- allocate sufficient space to schedule the task work for these newly found threads
- (under an RCU read lock) call task_work_add to add a pseudo-signal handler for each task, using the previously allocated space
- Repeat from the first step until no more sibling threads are found
This is needed due to the following constraints:
- We need an RCU read lock to iterate over the sibling threads
- Each sibling task needs its own struct task_work to call task_work_add() with
- Allocation may not happen under an RCU read lock
- We will find all sibling threads in step 1, but there is a possible race condition if these spawn new threads before our pseudo-signal work runs on the thread. This is unlikely, but to rule out the possibility, we loop until no additional sibling threads are found.
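The loop and its constraints can be sketched as a deterministic userspace simulation. Everything here is an assumption made for illustration: FakeProcess, its spawns table (which simulates threads creating new threads before their work runs) and schedule_on_all_siblings are hypothetical names, and the real patch also has to deal with details such as exiting threads, which this toy ignores.

```python
from contextlib import contextmanager


class FakeProcess:
    """Toy thread group. `spawns` maps a thread id to ids it creates later."""

    def __init__(self, threads, spawns=None):
        self.threads = list(threads)
        self.spawns = dict(spawns or {})
        self.in_rcu = False

    @contextmanager
    def rcu_read_lock(self):
        self.in_rcu = True
        try:
            yield
        finally:
            self.in_rcu = False

    def allocate_work(self, count):
        # Mirrors the constraint that allocation must not happen under RCU.
        assert not self.in_rcu, "allocation under RCU read lock"
        return [{"target": None} for _ in range(count)]

    def task_work_add(self, tid, work):
        assert self.in_rcu, "thread list may only be walked under RCU"
        work["target"] = tid
        # Simulated race: the thread spawns children before its work runs.
        self.threads.extend(self.spawns.pop(tid, []))


def schedule_on_all_siblings(proc, caller):
    known = {}  # thread id -> its own work item
    rounds = 0
    while True:
        # Find siblings that have no work scheduled yet.
        with proc.rcu_read_lock():
            unknown = [t for t in proc.threads
                       if t != caller and t not in known]
        if not unknown:
            return known, rounds
        rounds += 1
        # Allocate outside the lock, one work item per new thread.
        slots = proc.allocate_work(len(unknown))
        # Schedule the work using the pre-allocated items.
        with proc.rcu_read_lock():
            for tid, work in zip(unknown, slots):
                proc.task_work_add(tid, work)
                known[tid] = work


proc = FakeProcess([1, 2, 3], spawns={2: [4], 4: [5]})
known, rounds = schedule_on_all_siblings(proc, caller=1)
print(sorted(known), rounds)
```

In this run thread 2 spawns thread 4 and thread 4 spawns thread 5, so the loop needs three scheduling rounds before a final pass finds nothing new.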
Coordination between threads
All sibling threads are running their credentials updates in lockstep in a “task work” pseudo-signal handler.
The coordination between all threads is done through the shared context struct (see above).
We make use of the fact that in the LinuxCredentials API, only the “prepare” step may fail. From the view of a sibling thread, the following happens:
- Execute prepare_creds()
- If this fails, write back the error to a shared location.
- Notify the calling thread that the prepare_creds() step is done.
- Wait for the notification from the calling thread that it’s time to continue.
- Check the shared error location and then either commit or abort.
- Notify the calling thread that we are done.
Because the commit step is only done when all possible error conditions are ruled out, we get “all or nothing” semantics: Either all threads succeed and commit, or they all abort.
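The handshake can be modelled in userspace with ordinary threads. This is a sketch under stated assumptions, not the kernel implementation: SharedContext, sibling_work and restrict_all_siblings are hypothetical names, a dict stands in for each thread's credentials, a boolean flag simulates a failing prepare_creds(), and the calling thread's own credential update is left out.

```python
import threading


class SharedContext:
    """Hypothetical state shared by the calling thread and its siblings."""

    def __init__(self, n_siblings):
        self.lock = threading.Lock()
        self.error = None  # first prepare error, if any
        self.pending = {"prepare": n_siblings, "finish": n_siblings}
        self.done = {"prepare": threading.Event(),
                     "finish": threading.Event()}
        self.ready_to_commit = threading.Event()

    def notify(self, phase):
        # The last sibling to finish a phase wakes the calling thread.
        with self.lock:
            self.pending[phase] -= 1
            last = self.pending[phase] == 0
        if last:
            self.done[phase].set()


def sibling_work(ctx, thread_state, prepare_fails):
    new_cred = None
    if prepare_fails:
        # Simulated prepare failure: record it in the shared error slot.
        with ctx.lock:
            if ctx.error is None:
                ctx.error = MemoryError("prepare failed")
    else:
        new_cred = dict(thread_state, domain="restricted")
    ctx.notify("prepare")
    ctx.ready_to_commit.wait()
    if ctx.error is None:
        thread_state.update(new_cred)  # commit
    # Otherwise abort: new_cred is dropped and thread_state is unchanged.
    ctx.notify("finish")


def restrict_all_siblings(states, failing=frozenset()):
    if not states:
        return True
    ctx = SharedContext(len(states))
    workers = [
        threading.Thread(target=sibling_work, args=(ctx, s, i in failing))
        for i, s in enumerate(states)
    ]
    for w in workers:
        w.start()
    ctx.done["prepare"].wait()   # every sibling has prepared or failed
    ctx.ready_to_commit.set()    # release all siblings at once
    ctx.done["finish"].wait()    # every sibling has committed or aborted
    for w in workers:
        w.join()
    return ctx.error is None


states = [{"domain": None} for _ in range(4)]
print(restrict_all_siblings(states, failing={2}), states)
```

With one simulated failure, no thread commits and every dict keeps its original value; with no failures, all of them end up restricted.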