SandBox Mode
Introduction
The primary goal of SandBox Mode (SBM) is to reduce the impact of potential memory safety bugs in kernel code by decomposing the kernel. The SBM API allows each component to run inside an isolated execution environment. In particular, memory areas used as input and/or output are isolated from the rest of the kernel and surrounded by guard pages. Without arch hooks, this common base provides weak isolation.
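The guard-page idea can be tried out in user space. The following is only an analogy built on POSIX mmap() and mprotect(), not the kernel implementation; guarded_alloc() and guarded_free() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on strict C compilers */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: return `size` usable bytes with one inaccessible
 * guard page before and after them. Returns NULL on failure.
 */
static void *guarded_alloc(size_t size, size_t *map_len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t body = (size + page - 1) / page * page;	/* round up to pages */
	size_t total = body + 2 * page;
	unsigned char *base;

	/* Reserve everything as inaccessible first. */
	base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Open up only the middle part; both guards stay PROT_NONE. */
	if (mprotect(base + page, body, PROT_READ | PROT_WRITE) != 0) {
		munmap(base, total);
		return NULL;
	}

	*map_len = total;
	return base + page;
}

/* Release a buffer obtained from guarded_alloc(). */
static void guarded_free(void *ptr, size_t map_len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	munmap((unsigned char *)ptr - page, map_len);
}
```

An access that runs off either end of the mapped body lands in a PROT_NONE page and faults instead of silently touching neighbouring data.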
On architectures which implement the necessary arch hooks, SandBox Mode leverages hardware paging facilities and CPU privilege levels to enforce the use of only these predefined memory areas. With arch support, SBM can also recover from protection violations. This means that SBM forcibly terminates the sandbox and returns an error code (e.g. -EFAULT) to the caller, so execution can continue. Such an implementation provides strong isolation.
A target function in a sandbox communicates with the rest of the kernel through a caller-defined interface, comprising read-only buffers (input), read-write buffers (output) and the return value. The caller can explicitly share other data with the sandbox, but doing so may reduce isolation strength.
Protection of sensitive kernel data is currently out of scope. SandBox Mode is meant to run kernel code which would otherwise have full access to all system resources. SBM makes it possible to impose a scoped access control policy on which resources are available to the sandbox. That said, protection of sensitive data is foreseen as a future goal, which is why the API is designed to control not only memory writes but also memory reads.
The expected use case for SandBox Mode is parsing data from untrusted sources, especially if the parsing cannot be reasonably done by a user mode helper. Keep in mind that a sandbox doesn't guarantee that the output data is correct. The result may be corrupt (e.g. as a result of an exploited bug) and where applicable, it should be sanitized before further use.
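As an illustration of sanitizing sandbox output, here is a minimal sketch of a bounds check on a parser result. The struct parsed_item type and the parsed_item_is_sane() helper are hypothetical and not part of the SBM API; the point is only that the caller validates the values before acting on them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output of a sandboxed parser: a slice of the input buffer. */
struct parsed_item {
	uint32_t offset;	/* start of the item within the input */
	uint32_t length;	/* length of the item in bytes */
};

/*
 * Validate the result before use. The sandbox limits what the parser can
 * touch, but it cannot guarantee that the values it produced make sense.
 */
static bool parsed_item_is_sane(const struct parsed_item *item,
				size_t input_size)
{
	if (item->offset > input_size)
		return false;
	/* Written as a subtraction so the check itself cannot overflow. */
	if (item->length > input_size - item->offset)
		return false;
	return true;
}
```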
Using SandBox Mode
SandBox Mode is an optional feature, enabled with CONFIG_SANDBOX_MODE. However, the SBM API is always defined regardless of the kernel configuration. It will call a function with the best available isolation, which is:
strong isolation if both CONFIG_SANDBOX_MODE and CONFIG_ARCH_HAVE_SBM are set,
weak isolation if CONFIG_SANDBOX_MODE is set, but CONFIG_ARCH_HAVE_SBM is unset,
no isolation if CONFIG_SANDBOX_MODE is unset.
Code which cannot safely run with no isolation should depend on the relevant config option(s).
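The selection rule can be restated as a small decision function. This is only a user-space sketch of the logic described above; the enum and function names are made up for illustration, and the two booleans stand for the two config options.

```c
#include <stdbool.h>

/* Hypothetical names for the three isolation levels. */
enum sbm_isolation {
	SBM_ISOLATION_NONE,
	SBM_ISOLATION_WEAK,
	SBM_ISOLATION_STRONG,
};

/*
 * sandbox_mode: the generic SandBox Mode option is set.
 * arch_sbm:     the architecture provides the SBM arch hooks.
 */
static enum sbm_isolation sbm_isolation_level(bool sandbox_mode, bool arch_sbm)
{
	if (!sandbox_mode)
		return SBM_ISOLATION_NONE;	/* arch support alone changes nothing */
	return arch_sbm ? SBM_ISOLATION_STRONG : SBM_ISOLATION_WEAK;
}
```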
The API can be used like this:
#include <linux/sbm.h>

/* Function to be executed in a sandbox. */
static SBM_DEFINE_FUNC(my_func, const struct my_input *, in,
		       struct my_output *, out)
{
	/* Read from in, write to out. */
	return 0;
}

int caller(...)
{
	/* Declare a SBM instance. */
	struct sbm sbm;
	int err;

	/* Initialize SBM instance. */
	sbm_init(&sbm);

	/* Execute my_func() using the SBM instance. */
	err = sbm_call(&sbm, my_func,
		       SBM_COPY_IN(&sbm, input, in_size),
		       SBM_COPY_OUT(&sbm, output, out_size));

	/* Clean up. */
	sbm_destroy(&sbm);

	return err;
}
The return type of a sandbox mode function is always int. The return value is zero on success and negative on error. That's because the SBM helpers return an error code (such as -ENOMEM) if the call cannot be performed.
If sbm_call() returns an error, you can use sbm_error() to decide whether the error was returned by the target function or because sandbox mode was aborted (or failed to run entirely).
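A caller might separate the two cases roughly as follows. This is a user-space sketch, not kernel code: struct mock_sbm and mock_sbm_error() are stand-ins for struct sbm and sbm_error(), and it assumes that the SBM error code stays zero unless an SBM operation itself failed.

```c
#include <errno.h>

/* Stand-in for struct sbm; only the error field matters here. */
struct mock_sbm {
	int error;
};

/* Stand-in for sbm_error(). */
static int mock_sbm_error(const struct mock_sbm *sbm)
{
	return sbm->error;
}

/* Hypothetical classification of a call result. */
enum failure_origin {
	NO_FAILURE,	/* call succeeded */
	TARGET_FAILED,	/* target function returned an error */
	SANDBOX_FAILED,	/* sandbox was aborted or could not be set up */
};

static enum failure_origin classify(const struct mock_sbm *sbm, int ret)
{
	if (ret >= 0)
		return NO_FAILURE;
	if (mock_sbm_error(sbm) != 0)
		return SANDBOX_FAILED;
	return TARGET_FAILED;
}
```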
Public API
-
struct sbm
SandBox Mode instance.
Definition:
struct sbm {
#ifdef CONFIG_SANDBOX_MODE
	int error;
	void *private;
	struct sbm_buf *input;
	struct sbm_buf *output;
	struct sbm_buf *io;
#endif
};
Members
error
Error code. Initialized to zero by sbm_init() and updated when a SBM operation fails.
private
Arch-specific private data.
input
Input data. Copied to a temporary buffer before starting sandbox mode.
output
Output data. Copied from a temporary buffer after return from sandbox mode.
io
Input and output data. Copied to a temporary buffer before starting sandbox mode and copied back after return.
-
sbm_init()
Parameters
struct sbm *sbm
SBM instance.
Description
Initialize a SBM instance structure.
Return
Zero on success, negative on error.
-
sbm_destroy()
Parameters
struct sbm *sbm
SBM instance to be cleaned up.
-
sbm_error()
Parameters
const struct sbm *sbm
SBM instance.
Description
Get the SBM error code. This can be used to distinguish between errors returned by the target function and errors from setting up the sandbox environment.
-
sbm_exec()
Parameters
struct sbm *sbm
SBM instance.
sbm_func func
Function to be called.
void *data
Argument for func.
Description
Execute func in a fully prepared SBM instance.
Return
Return value of func on success, or a negative error code.
-
SBM_COPY_IN
SBM_COPY_IN (sbm, buf, size)
Mark an input buffer for copying into SBM.
Parameters
sbm
SBM instance.
buf
Buffer virtual address.
size
Size of the buffer.
Description
Add a buffer to the input buffer list for sbm. The content of the buffer is copied to sandbox mode before calling the target function.
It is OK to modify the input buffer after invoking this macro.
Return
Buffer address in sandbox mode.
-
SBM_COPY_OUT
SBM_COPY_OUT (sbm, buf, size)
Mark an output buffer for copying out of SBM.
Parameters
sbm
SBM instance.
buf
Buffer virtual address.
size
Size of the buffer.
Description
Add a buffer to the output buffer list for sbm. The content of the buffer is copied to kernel mode after calling the target function.
Return
Buffer address in sandbox mode.
-
SBM_COPY_INOUT
SBM_COPY_INOUT (sbm, buf, size)
Mark a buffer for copying into and out of SBM.
Parameters
sbm
SBM instance.
buf
Buffer virtual address.
size
Size of the buffer.
Description
Add a buffer to the input and output buffer list for sbm. The content of the buffer is copied to sandbox mode before calling the target function and copied back to kernel mode after the call.
Return
Buffer address in sandbox mode.
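The copy semantics of the three macros can be modelled in user space with temporary buffers. The sketch below is not the SBM implementation: run_with_copies() and sum_bytes() are hypothetical, and the model provides no isolation at all. It only shows the order of the copies and why changes to an input copy never reach the caller.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target signature: read-only input, writable output. */
typedef int (*target_fn)(const void *in, void *out);

static int run_with_copies(target_fn fn, const void *in, size_t in_size,
			   void *out, size_t out_size)
{
	void *tmp_in = malloc(in_size);
	void *tmp_out = calloc(1, out_size);
	int ret = -ENOMEM;

	if (tmp_in && tmp_out) {
		memcpy(tmp_in, in, in_size);	/* copy in before the call */
		ret = fn(tmp_in, tmp_out);	/* target sees only the copies */
		memcpy(out, tmp_out, out_size);	/* copy out after the call */
	}
	free(tmp_in);
	free(tmp_out);
	return ret;
}

/* Example target: sums four input bytes, then tampers with its input. */
static int sum_bytes(const void *in, void *out)
{
	unsigned char *bytes = (unsigned char *)in;	/* const cast on purpose */

	*(int *)out = bytes[0] + bytes[1] + bytes[2] + bytes[3];
	bytes[0] = 0xff;	/* changes the temporary copy only */
	return 0;
}
```

In this model the caller's input buffer is untouched after the call, while the output buffer holds whatever the target wrote to its copy.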
-
SBM_DEFINE_CALL
SBM_DEFINE_CALL (f, ...)
Define a call helper.
Parameters
f
Target function name.
...
Parameters as type-identifier pairs.
Description
Declare an argument-passing struct and define the corresponding call helper. The call helper stores its arguments in an automatic variable of the corresponding type and calls sbm_exec().
The call helper is an inline function, so it is OK to use this macro in header files.
Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().
-
SBM_DEFINE_THUNK
SBM_DEFINE_THUNK (f, ...)
Define a thunk function.
Parameters
f
Target function name.
...
Parameters as type-identifier pairs.
Description
The thunk function casts its parameter back to the argument-passing struct and calls the target function f with parameters stored there by the call helper.
Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().
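Written out by hand for a two-argument function, the call helper and thunk pair might look like the sketch below. It is a user-space illustration of the pattern, not the macro expansion: sbm_func_t and mock_exec() are stand-ins for the kernel's sbm_func type and sbm_exec(), and the add_* names are invented.

```c
/* Stand-in for the kernel's sbm_func type (assumed shape). */
typedef int (*sbm_func_t)(void *data);

/* Stand-in for sbm_exec(): no isolation, just an indirect call. */
static int mock_exec(sbm_func_t func, void *data)
{
	return func(data);
}

/* Target function. */
static int add(int a, int b)
{
	return a + b;
}

/* Argument-passing struct, one member per target function parameter. */
struct add_args {
	int a;
	int b;
};

/* Thunk: cast the parameter back to the struct and call the target. */
static int add_thunk(void *data)
{
	struct add_args *args = data;

	return add(args->a, args->b);
}

/* Call helper: store the arguments in an automatic variable and execute. */
static inline int add_call(int a, int b)
{
	struct add_args args = { .a = a, .b = b };

	return mock_exec(add_thunk, &args);
}
```

Because the argument struct lives on the caller's stack, the executing side needs at least read access to it, which matches the design of passing a single void pointer.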
-
SBM_DEFINE_FUNC
SBM_DEFINE_FUNC (f, ...)
Define target function, thunk and call helper.
Parameters
f
Target function name.
...
Parameters as type-identifier pairs.
Description
Declare or define a target function and also the corresponding thunk and call helper. Use this shorthand to avoid repeating the target function signature.
The target function is declared twice. The first declaration makes it possible to precede the macro with storage-class specifiers. The second declaration makes it possible to follow the macro with the function body. You can also put a semicolon after the macro to make it only a declaration.
Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().
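The double-declaration trick can be shown with a much simpler macro. DEFINE_FUNC below is a hypothetical stand-in: it takes an ordinary parameter list instead of type-identifier pairs and leaves out the thunk and call helper that the real macro would also emit.

```c
/* First declaration ends with ';', second one is left open on purpose. */
#define DEFINE_FUNC(f, ...)	\
	int f(__VA_ARGS__);	\
	int f(__VA_ARGS__)

/* "static" binds to the first declaration; the body follows the second. */
static DEFINE_FUNC(twice, int x)
{
	return 2 * x;
}

/* A trailing semicolon turns the macro into a pure declaration. */
static DEFINE_FUNC(thrice, int x);

/* The definition can come later and keeps the internal linkage. */
int thrice(int x)
{
	return 3 * x;
}
```

This works because a function declared without a storage-class specifier inherits the linkage of an earlier visible declaration, so the second declaration does not conflict with the preceding static one.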
-
sbm_call
sbm_call (sbm, func, ...)
Call a function in sandbox mode.
Parameters
sbm
SBM instance.
func
Function to be called.
...
Target function arguments.
Description
Call a function using a call helper which was previously defined with SBM_DEFINE_FUNC().
Arch Hooks
These hooks must be implemented to select HAVE_ARCH_SBM.
Parameters
struct sbm *sbm
Instance to be initialized.
Description
Perform any arch-specific initialization. This hook is called by sbm_init() immediately after zeroing out sbm.
Return
Zero on success, negative error code on failure.
Parameters
struct sbm *sbm
Instance to be cleaned up.
Description
Perform any arch-specific cleanup. This hook is called by sbm_destroy() as the very last operation on sbm.
-
int arch_sbm_map_readonly(struct sbm *sbm, const struct sbm_buf *buf)
Arch hook to map a buffer for reading.
Parameters
struct sbm *sbm
SBM instance.
const struct sbm_buf *buf
Buffer to be mapped.
Description
Make the specified buffer readable by sandbox code. See also arch_sbm_map_writable().
Return
Zero on success, negative on error.
-
int arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf)
Arch hook to map a buffer for reading and writing.
Parameters
struct sbm *sbm
SBM instance.
const struct sbm_buf *buf
Buffer to be mapped.
Description
Make the specified buffer readable and writable by sandbox code. See also arch_sbm_map_readonly().
Return
Zero on success, negative on error.
-
int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data)
Arch hook to execute code in a sandbox.
Parameters
struct sbm *sbm
SBM instance.
sbm_func func
Function to be executed in a sandbox.
void *data
Argument passed to func.
Description
Execute func in a fully prepared SBM instance. If sandbox mode cannot be set up or is aborted, set sbm->error to a negative error value. This error is then returned by sbm_exec(), overriding the return value of arch_sbm_exec().
Return
Return value of func.
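The interplay between the hook's return value and the error field can be sketched in user space as follows. All names prefixed with mock_ are stand-ins, and the abort condition (a NULL argument) is invented purely to trigger the error path; a real arch hook would set the error on a protection violation or a setup failure.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins for struct sbm and sbm_func. */
struct mock_sbm {
	int error;
};
typedef int (*sbm_func_t)(void *data);

/* Hypothetical arch hook: pretend the sandbox aborts on a NULL argument. */
static int mock_arch_exec(struct mock_sbm *sbm, sbm_func_t func, void *data)
{
	if (data == NULL) {
		sbm->error = -EFAULT;
		return 0;	/* this value is overridden by the caller */
	}
	return func(data);
}

/* Generic wrapper: a recorded error takes precedence over the return value. */
static int mock_sbm_exec(struct mock_sbm *sbm, sbm_func_t func, void *data)
{
	int ret = mock_arch_exec(sbm, func, data);

	if (sbm->error)
		ret = sbm->error;
	return ret;
}

/* Example target: return the integer that data points to. */
static int get_value(void *data)
{
	return *(int *)data;
}
```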
X86_64 Implementation
The x86_64 implementation provides strong isolation and recovery from CPU exceptions.
Sandbox mode runs in protection ring 3 (same as user mode). This means that:
sandbox code cannot execute privileged CPU instructions,
memory accesses are treated as user accesses.
The thread stack is readable in sandbox mode, because an on-stack data structure is used by call helpers and thunks to pass target function arguments. However, it is not writable, and sandbox code runs on its own stack. The thread stack is not used by interrupt handlers either. Non-IST interrupt handlers run on a separate sandbox exception stack.
The interrupt entry path modifies the saved pt_regs to make it appear as coming from kernel mode. The CR3 register is then switched to kernel mode. The interrupt exit path is modified to restore actual pt_regs and switch the CR3 register back to its sandbox mode value, overriding CR3 changes for page table isolation.
Support for paravirtualized kernels is not (yet) provided.
Current Limitations
This section lists known limitations of the current SBM implementation, which are planned to be removed in the future.
Stack
There is no generic kernel API to run a function on an alternate stack, so SBM runs on the normal kernel stack by default. The kernel already offers self-protection against stack overflows and underflows as well as against overwriting on-stack data outside the current frame, but violations are usually fatal.
This limitation can be solved for specific targets. Arch hooks can set up a separate stack and recover from stack frame overruns.
Inherent Limitations
This section lists limitations which are inherent to the concept.
Explicit Code
The main idea behind SandBox Mode is decomposition of one big program (the Linux kernel) into multiple smaller programs that can be sandboxed. There is currently no known way to automate this task for an existing code base in C.
Given the performance impact of running code in a sandbox, this limitation may be perceived as a benefit. It is expected that sandbox mode is introduced only knowingly and only where safety is more important than performance.
Complex Data
Although data structures are not serialized and deserialized between kernel mode and sandbox mode, all directly and indirectly referenced data structures must be explicitly mapped into the sandbox, which requires some manual effort.
Copying of input/output buffers also incurs some runtime overhead. This overhead can be reduced by sharing data directly with the sandbox, but the resulting isolation is weaker, so it may or may not be acceptable, depending on the overall safety requirements.
Page Granularity
Since paging is used to enforce memory safety, page size is the smallest unit. Objects mapped into the sandbox must be aligned to a page boundary, and buffer overflows may not be detected if they fit into the same page.
On the other hand, even though such writes are not detected, they do not corrupt kernel data, because only the output buffer is copied back to kernel mode, and the (corrupted) rest of the page is ignored.
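The arithmetic behind this limitation is simple. The sketch below assumes a 4096-byte page and a buffer that starts on a page boundary; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Bytes actually mapped for a buffer that starts on a page boundary. */
static size_t mapped_size(size_t buf_size)
{
	return (buf_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE
		* SKETCH_PAGE_SIZE;
}

/* Does a write at byte `offset` from the buffer start leave the mapping? */
static bool write_faults(size_t buf_size, size_t offset)
{
	return offset >= mapped_size(buf_size);
}
```

For a 100-byte buffer, a write at offset 100 overflows the buffer but stays inside the mapped page, so it goes unnoticed; only a write at offset 4096 or beyond leaves the mapping and faults.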
Transitions
Transitions between kernel mode and sandbox mode are synchronous. That is, whenever entering or leaving sandbox mode, the currently running CPU executes the instructions necessary to save/restore its kernel-mode state. The API is generic enough to allow asynchronous transitions, e.g. to pass data to another CPU which is already running in sandbox mode. However, to see the benefits, a hypothetical implementation would require far-reaching changes in the kernel scheduler. This is (currently) out of scope.