SandBox Mode¶

Introduction¶

The primary goal of SandBox Mode (SBM) is to reduce the impact of potential memory safety bugs in kernel code by decomposing the kernel. The SBM API makes it possible to run each component inside an isolated execution environment. In particular, memory areas used as input and/or output are isolated from the rest of the kernel and surrounded by guard pages. Without arch hooks, this common base provides weak isolation.

On architectures which implement the necessary arch hooks, SandBox Mode leverages hardware paging facilities and CPU privilege levels to enforce the use of only these predefined memory areas. With arch support, SBM can also recover from protection violations. This means that SBM forcibly terminates the sandbox and returns an error code (e.g. -EFAULT) to the caller, so execution can continue. Such an implementation provides strong isolation.

A target function in a sandbox communicates with the rest of the kernel through a caller-defined interface, comprising read-only buffers (input), read-write buffers (output) and the return value. The caller can explicitly share other data with the sandbox, but doing so may reduce isolation strength.

Protection of sensitive kernel data is currently out of scope. SandBox Mode is meant to run kernel code which would otherwise have full access to all system resources. SBM makes it possible to impose a scoped access control policy on which resources are available to the sandbox. That said, protection of sensitive data is foreseen as a future goal, which is why the API is designed to control not only memory writes but also memory reads.

The expected use case for SandBox Mode is parsing data from untrusted sources, especially if the parsing cannot be reasonably done by a user mode helper. Keep in mind that a sandbox doesn't guarantee that the output data is correct. The result may be corrupt (e.g. as a result of an exploited bug) and where applicable, it should be sanitized before further use.
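As a sketch of the sanitization step, assume a hypothetical parser whose output describes a payload by its offset and length within the input buffer. The layout of struct my_output and the helper my_output_is_sane() are illustrative assumptions, not part of the SBM API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parser result: a payload located inside the input buffer. */
struct my_output {
	size_t payload_off;
	size_t payload_len;
};

/*
 * Validate sandbox output before using it in kernel mode. The sandbox
 * may have been compromised, so every field is treated as untrusted.
 */
static bool my_output_is_sane(const struct my_output *out, size_t in_size)
{
	if (out->payload_off > in_size)
		return false;
	/* Compare against the remainder to avoid overflow in off + len. */
	return out->payload_len <= in_size - out->payload_off;
}
```

A caller would run such a check on the copied-out result and reject it before dereferencing anything derived from the offset or length.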

Using SandBox Mode¶

SandBox Mode is an optional feature, enabled with CONFIG_SANDBOX_MODE. However, the SBM API is always defined regardless of the kernel configuration. It will call a function with the best available isolation, which is:

strong isolation if both CONFIG_SANDBOX_MODE and CONFIG_HAVE_ARCH_SBM are set,
weak isolation if CONFIG_SANDBOX_MODE is set, but CONFIG_HAVE_ARCH_SBM is unset,
no isolation if CONFIG_SANDBOX_MODE is unset.

Code which cannot safely run with no isolation should depend on the relevant config option(s).

The API can be used like this:

#include <linux/sbm.h>

/* Function to be executed in a sandbox. */
static SBM_DEFINE_FUNC(my_func, const struct my_input *, in,
                       struct my_output *, out)
{
      /* Read from in, write to out. */
      return 0;
}

int caller(...)
{
      /* Declare a SBM instance. */
      struct sbm sbm;
      int err;

      /* Initialize SBM instance. */
      sbm_init(&sbm);

      /* Execute my_func() using the SBM instance. */
      err = sbm_call(&sbm, my_func,
                     SBM_COPY_IN(&sbm, input, in_size),
                     SBM_COPY_OUT(&sbm, output, out_size));

      /* Clean up. */
      sbm_destroy(&sbm);

      return err;
}

The return type of a sandbox mode function is always int. The return value is zero on success and negative on error. That's because the SBM helpers return an error code (such as -ENOMEM) if the call cannot be performed.

If sbm_call() returns an error, you can use sbm_error() to decide whether the error was returned by the target function or because sandbox mode was aborted (or failed to run entirely).
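The check might look like the following sketch. To keep it compilable outside the kernel, struct sbm and sbm_error() are minimal stand-ins modeled on the definitions documented below, and classify_result() is a hypothetical helper, not part of the SBM API:

```c
#include <errno.h>

/* Stand-ins so the sketch compiles outside the kernel. */
struct sbm {
	int error;
};

static int sbm_error(const struct sbm *sbm)
{
	return sbm->error;
}

enum call_result { CALL_OK, CALL_FUNC_FAILED, CALL_SANDBOX_FAILED };

/* Hypothetical helper: classify the value returned by sbm_call(). */
static enum call_result classify_result(const struct sbm *sbm, int err)
{
	if (err >= 0)
		return CALL_OK;
	/* A non-zero SBM error means the sandbox failed to run or was aborted. */
	if (sbm_error(sbm))
		return CALL_SANDBOX_FAILED;
	/* Otherwise the negative value came from the target function itself. */
	return CALL_FUNC_FAILED;
}
```

Note that sbm_error() has to be consulted before sbm_destroy() cleans up the instance.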

Public API¶

struct sbm¶: SandBox Mode instance.

Definition:

struct sbm {
#ifdef CONFIG_SANDBOX_MODE
    int error;
    void *private;
    struct sbm_buf *input;
    struct sbm_buf *output;
    struct sbm_buf *io;
#endif
};

Members

error: Error code. Initialized to zero by sbm_init() and updated when a SBM operation fails.
private: Arch-specific private data.
input: Input data. Copied to a temporary buffer before starting sandbox mode.
output: Output data. Copied from a temporary buffer after return from sandbox mode.
io: Input and output data. Copied to a temporary buffer before starting sandbox mode and copied back after return.

int sbm_init(struct sbm *sbm)¶: Initialize a SandBox Mode instance.

Parameters

struct sbm *sbm: SBM instance.

Description

Initialize a SBM instance structure.

Return

Zero on success, negative on error.

void sbm_destroy(struct sbm *sbm)¶: Clean up a SandBox Mode instance.

Parameters

struct sbm *sbm: SBM instance to be cleaned up.

int sbm_error(const struct sbm *sbm)¶: Get SBM error status.

Parameters

const struct sbm *sbm: SBM instance.

Description

Get the SBM error code. This can be used to distinguish between errors returned by the target function and errors from setting up the sandbox environment.

int sbm_exec(struct sbm *sbm, sbm_func func, void *data)¶: Execute function in a sandbox.

Parameters

struct sbm *sbm: SBM instance.
sbm_func func: Function to be called.
void *data: Argument for func.

Description

Execute func in a fully prepared SBM instance.

Return

Return value of func on success, or a negative error code.

SBM_COPY_IN¶

SBM_COPY_IN (sbm, buf, size)

Mark an input buffer for copying into SBM.

Parameters

sbm: SBM instance.
buf: Buffer virtual address.
size: Size of the buffer.

Description

Add a buffer to the input buffer list for sbm. The content of the buffer is copied to sandbox mode before calling the target function.

It is OK to modify the input buffer after invoking this macro.

Return

Buffer address in sandbox mode.

SBM_COPY_OUT¶

SBM_COPY_OUT (sbm, buf, size)

Mark an output buffer for copying out of SBM.

Parameters

sbm: SBM instance.
buf: Buffer virtual address.
size: Size of the buffer.

Description

Add a buffer to the output buffer list for sbm. The content of the buffer is copied to kernel mode after calling the target function.

Return

Buffer address in sandbox mode.

SBM_COPY_INOUT¶

SBM_COPY_INOUT (sbm, buf, size)

Mark an input/output buffer for copying into and out of SBM.

Parameters

sbm: SBM instance.
buf: Buffer virtual address.
size: Size of the buffer.

Description

Add a buffer to the input and output buffer list for sbm. The content of the buffer is copied to sandbox mode before calling the target function and copied back to kernel mode after the call.

Return

Buffer address in sandbox mode.
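The copy-in/copy-out semantics can be modeled in user space as follows. This is only a conceptual sketch: run_with_inout_copy() and to_upper() are hypothetical names, malloc() stands in for whatever temporary buffer the SBM core allocates, and copying back unconditionally (even if the target fails) is an assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef int (*buf_func)(void *buf, size_t size);

/* Run func on a temporary copy of buf, then copy the result back. */
static int run_with_inout_copy(buf_func func, void *buf, size_t size)
{
	void *tmp;
	int ret;

	tmp = malloc(size ? size : 1);
	if (!tmp)
		return -ENOMEM;
	memcpy(tmp, buf, size);		/* copy in before the call */
	ret = func(tmp, size);		/* target sees only the copy */
	memcpy(buf, tmp, size);		/* copy back exactly size bytes */
	free(tmp);
	return ret;
}

/* Example target: convert the buffer to upper case in place. */
static int to_upper(void *buf, size_t size)
{
	unsigned char *p = buf;
	size_t i;

	for (i = 0; i < size; i++)
		p[i] = (unsigned char)toupper(p[i]);
	return 0;
}
```

Because the target only ever receives the address of the copy, the original buffer is never directly reachable from it; this is also why the macros return the buffer address in sandbox mode rather than the original address.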

SBM_DEFINE_CALL¶

SBM_DEFINE_CALL (f, ...)

Define a call helper.

Parameters

f: Target function name.
...: Parameters as type-identifier pairs.

Description

Declare an argument-passing struct and define the corresponding call helper. The call helper stores its arguments in an automatic variable of the corresponding type and calls sbm_exec().

The call helper is an inline function, so it is OK to use this macro in header files.

Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().

SBM_DEFINE_THUNK¶

SBM_DEFINE_THUNK (f, ...)

Define a thunk function.

Parameters

f: Target function name.
...: Parameters as type-identifier pairs.

Description

The thunk function casts its parameter back to the argument-passing struct and calls the target function f with parameters stored there by the call helper.

Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().
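To illustrate the division of labor between call helper and thunk, the following user-space sketch writes out by hand what the two macros might generate for a two-parameter target function. The names struct my_func_args, my_func_thunk() and my_func_call(), the sbm_func signature, and the stub sbm_exec() are assumptions for illustration; the actual macro expansion is not shown in this document.

```c
/* Stand-ins so the sketch compiles outside the kernel. */
struct sbm {
	int error;
};

typedef int (*sbm_func)(void *data);	/* assumed signature */

/* Stub: calls func directly; the real sbm_exec() enters the sandbox. */
static int sbm_exec(struct sbm *sbm, sbm_func func, void *data)
{
	(void)sbm;
	return func(data);
}

/* Target function. */
static int my_func(const int *in, int *out)
{
	*out = *in * 2;
	return 0;
}

/* Argument-passing struct: one member per target function parameter. */
struct my_func_args {
	const int *in;
	int *out;
};

/* Thunk: recover the struct from the void pointer and call the target. */
static int my_func_thunk(void *data)
{
	struct my_func_args *args = data;

	return my_func(args->in, args->out);
}

/* Call helper: pack arguments into an automatic variable, call sbm_exec(). */
static inline int my_func_call(struct sbm *sbm, const int *in, int *out)
{
	struct my_func_args args = { .in = in, .out = out };

	return sbm_exec(sbm, my_func_thunk, &args);
}
```

The argument struct lives on the caller's stack, which is consistent with the call helper storing its arguments in an automatic variable.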

SBM_DEFINE_FUNC¶

SBM_DEFINE_FUNC (f, ...)

Define target function, thunk and call helper.

Parameters

f: Target function name.
...: Parameters as type-identifier pairs.

Description

Declare or define a target function and also the corresponding thunk and call helper. Use this shorthand to avoid repeating the target function signature.

The target function is declared twice. The first declaration makes it possible to precede the macro with storage-class specifiers. The second declaration makes it possible to follow the macro with the function body. You can also put a semicolon after the macro to make it only a declaration.

Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().

sbm_call¶

sbm_call (sbm, func, ...)

Call a function in sandbox mode.

Parameters

sbm: SBM instance.
func: Function to be called.
...: Target function arguments.

Description

Call a function using a call helper which was previously defined with SBM_DEFINE_FUNC().

Arch Hooks¶

These hooks must be implemented to select HAVE_ARCH_SBM.

int arch_sbm_init(struct sbm *sbm)¶: Arch hook to initialize a SBM instance.

Parameters

struct sbm *sbm: Instance to be initialized.

Description

Perform any arch-specific initialization. This hook is called by sbm_init() immediately after zeroing out sbm.

Return

Zero on success, negative error code on failure.

void arch_sbm_destroy(struct sbm *sbm)¶: Arch hook to clean up a SBM instance.

Parameters

struct sbm *sbm: Instance to be cleaned up.

Description

Perform any arch-specific cleanup. This hook is called by sbm_destroy() as the very last operation on sbm.

int arch_sbm_map_readonly(struct sbm *sbm, const struct sbm_buf *buf)¶: Arch hook to map a buffer for reading.

Parameters

struct sbm *sbm: SBM instance.
const struct sbm_buf *buf: Buffer to be mapped.

Description

Make the specified buffer readable by sandbox code. See also arch_sbm_map_writable().

Return

Zero on success, negative on error.

int arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf)¶: Arch hook to map a buffer for reading and writing.

Parameters

struct sbm *sbm: SBM instance.
const struct sbm_buf *buf: Buffer to be mapped.

Description

Make the specified buffer readable and writable by sandbox code. See also arch_sbm_map_readonly().

Return

Zero on success, negative on error.

int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data)¶: Arch hook to execute code in a sandbox.

Parameters

struct sbm *sbm: SBM instance.
sbm_func func: Function to be executed in a sandbox.
void *data: Argument passed to func.

Description

Execute func in a fully prepared SBM instance. If sandbox mode cannot be set up or is aborted, set sbm->error to a negative error value. This error is then returned by sbm_exec(), overriding the return value of arch_sbm_exec().

Return

Return value of func.
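The interplay between the hook's return value and sbm->error can be sketched in user space as below. Everything here is a stand-in: the sbm_func signature is assumed, the fake arch_sbm_exec() simulates an aborted sandbox whenever data is NULL, and the sbm_exec() body shows only the override rule described above, not the actual core implementation.

```c
#include <errno.h>
#include <stddef.h>

struct sbm {
	int error;
};

typedef int (*sbm_func)(void *data);	/* assumed signature */

/* Fake arch hook: treat a NULL argument as a sandbox abort. */
static int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data)
{
	if (data == NULL) {
		sbm->error = -EFAULT;
		return 0;	/* ignored by sbm_exec() because error is set */
	}
	return func(data);
}

/* Core wrapper: a recorded SBM error overrides the hook's return value. */
static int sbm_exec(struct sbm *sbm, sbm_func func, void *data)
{
	int ret = arch_sbm_exec(sbm, func, data);

	return sbm->error ? sbm->error : ret;
}

/* Example target: return the integer that data points to. */
static int return_value(void *data)
{
	return *(int *)data;
}
```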

X86_64 Implementation¶

The x86_64 implementation provides strong isolation and recovery from CPU exceptions.

Sandbox mode runs in protection ring 3 (same as user mode). This means that:

sandbox code cannot execute privileged CPU instructions,
memory accesses are treated as user accesses.

The thread stack is readable in sandbox mode, because an on-stack data structure is used by call helpers and thunks to pass target function arguments. However, it is not writable, and sandbox code runs on its own stack. The thread stack is not used by interrupt handlers either. Non-IST interrupt handlers run on a separate sandbox exception stack.

The interrupt entry path modifies the saved pt_regs to make it appear as coming from kernel mode. The CR3 register is then switched to kernel mode. The interrupt exit path is modified to restore actual pt_regs and switch the CR3 register back to its sandbox mode value, overriding CR3 changes for page table isolation.

Support for paravirtualized kernels is not (yet) provided.

Current Limitations¶

This section lists known limitations of the current SBM implementation, which are planned to be removed in the future.

Stack¶

There is no generic kernel API to run a function on an alternate stack, so SBM runs on the normal kernel stack by default. The kernel already offers self-protection against stack overflows and underflows as well as against overwriting on-stack data outside the current frame, but violations are usually fatal.

This limitation can be solved for specific targets. Arch hooks can set up a separate stack and recover from stack frame overruns.

Inherent Limitations¶

This section lists limitations which are inherent to the concept.

Explicit Code¶

The main idea behind SandBox Mode is decomposition of one big program (the Linux kernel) into multiple smaller programs that can be sandboxed. To the best of the author's knowledge, there is no way to automate this task for an existing code base in C.

Given the performance impact of running code in a sandbox, this limitation may be perceived as a benefit. It is expected that sandbox mode is introduced only knowingly and only where safety is more important than performance.

Complex Data¶

Although data structures are not serialized and deserialized between kernel mode and sandbox mode, all directly and indirectly referenced data structures must be explicitly mapped into the sandbox, which requires some manual effort.

Copying of input/output buffers also incurs some runtime overhead. This overhead can be reduced by sharing data directly with the sandbox, but the resulting isolation is weaker, so it may or may not be acceptable, depending on the overall safety requirements.

Page Granularity¶

Since paging is used to enforce memory safety, page size is the smallest unit. Objects mapped into the sandbox must be aligned to a page boundary, and buffer overflows may not be detected if they fit into the same page.

On the other hand, even though such writes are not detected, they do not corrupt kernel data, because only the output buffer is copied back to kernel mode, and the (corrupted) rest of the page is ignored.
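The effect of page granularity can be illustrated with a little arithmetic. The sketch assumes 4 KiB pages and a buffer that starts on a page boundary; SKETCH_PAGE_SIZE, mapped_size() and overflow_is_detected() are hypothetical names, not part of the SBM API.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Size of the mapping that covers a page-aligned buffer of the given size. */
static size_t mapped_size(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * An access at byte offset off from the buffer start faults only if it
 * falls outside the mapped pages, i.e. if it reaches the guard page.
 */
static bool overflow_is_detected(size_t size, size_t off)
{
	return off >= mapped_size(size);
}
```

For a 100-byte buffer, a write at offset 200 stays within the mapped page and goes unnoticed, while a write at offset 4096 lands on the guard page and is caught.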

Transitions¶

Transitions between kernel mode and sandbox mode are synchronous. That is, whenever entering or leaving sandbox mode, the currently running CPU executes the instructions necessary to save/restore its kernel-mode state. The API is generic enough to allow asynchronous transitions, e.g. to pass data to another CPU which is already running in sandbox mode. However, to see the benefits, a hypothetical implementation would require far-reaching changes in the kernel scheduler. This is (currently) out of scope.
