SandBox Mode

Introduction

The primary goal of SandBox Mode (SBM) is to reduce the impact of potential memory safety bugs in kernel code by decomposing the kernel. The SBM API makes it possible to run each component inside an isolated execution environment. In particular, memory areas used as input and/or output are isolated from the rest of the kernel and surrounded by guard pages. Without arch hooks, this common base provides weak isolation.

On architectures which implement the necessary arch hooks, SandBox Mode leverages hardware paging facilities and CPU privilege levels to enforce the use of only these predefined memory areas. With arch support, SBM can also recover from protection violations. This means that SBM forcibly terminates the sandbox and returns an error code (e.g. -EFAULT) to the caller, so execution can continue. Such an implementation provides strong isolation.

A target function in a sandbox communicates with the rest of the kernel through a caller-defined interface, comprising read-only buffers (input), read-write buffers (output) and the return value. The caller can explicitly share other data with the sandbox, but doing so may reduce isolation strength.

Protection of sensitive kernel data is currently out of scope. SandBox Mode is meant to run kernel code which would otherwise have full access to all system resources. SBM makes it possible to impose a scoped access control policy on the resources available to the sandbox. That said, protection of sensitive data is foreseen as a future goal, which is why the API is designed to control not only memory writes but also memory reads.

The expected use case for SandBox Mode is parsing data from untrusted sources, especially if the parsing cannot be reasonably done by a user mode helper. Keep in mind that a sandbox doesn't guarantee that the output data is correct. The result may be corrupt (e.g. as a result of an exploited bug) and where applicable, it should be sanitized before further use.

Using SandBox Mode

SandBox Mode is an optional feature, enabled with CONFIG_SANDBOX_MODE. However, the SBM API is always defined regardless of the kernel configuration. It will call a function with the best available isolation, which is:

  • strong isolation if both CONFIG_SANDBOX_MODE and CONFIG_HAVE_ARCH_SBM are set,

  • weak isolation if CONFIG_SANDBOX_MODE is set, but CONFIG_HAVE_ARCH_SBM is unset,

  • no isolation if CONFIG_SANDBOX_MODE is unset.

Code which cannot safely run with no isolation should depend on the relevant config option(s).
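The selection rules above can be summarized as a small decision function. This is only a sketch: the enum and sbm_isolation_level() are hypothetical names, and the two Kconfig options are modelled as plain bool flags so that every combination can be checked in one program.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three isolation levels listed above. */
enum sbm_isolation {
	SBM_ISOLATION_NONE,	/* CONFIG_SANDBOX_MODE unset */
	SBM_ISOLATION_WEAK,	/* SBM core only, no arch hooks */
	SBM_ISOLATION_STRONG,	/* SBM core plus arch hooks */
};

/*
 * Hypothetical helper: the isolation level the SBM API provides for a
 * given configuration. The flags stand in for CONFIG_SANDBOX_MODE and
 * for the arch support option (HAVE_ARCH_SBM, see Arch Hooks).
 */
static enum sbm_isolation sbm_isolation_level(bool sandbox_mode,
					      bool have_arch_sbm)
{
	if (!sandbox_mode)
		return SBM_ISOLATION_NONE;	/* arch support is irrelevant */
	return have_arch_sbm ? SBM_ISOLATION_STRONG : SBM_ISOLATION_WEAK;
}
```

Note that arch support alone never yields any isolation; the core option must be enabled as well.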

The API can be used like this:

#include <linux/sbm.h>

/* Function to be executed in a sandbox. */
static SBM_DEFINE_FUNC(my_func, const struct my_input *, in,
                       struct my_output *, out)
{
      /* Read from in, write to out. */
      return 0;
}

int caller(const struct my_input *input, size_t in_size,
           struct my_output *output, size_t out_size)
{
      /* Declare a SBM instance. */
      struct sbm sbm;
      int err;

      /* Initialize SBM instance. */
      sbm_init(&sbm);

      /* Execute my_func() using the SBM instance. */
      err = sbm_call(&sbm, my_func,
                     SBM_COPY_IN(&sbm, input, in_size),
                     SBM_COPY_OUT(&sbm, output, out_size));

      /* Clean up. */
      sbm_destroy(&sbm);

      return err;
}
The return type of a sandbox mode function is always int. The function should return zero on success and a negative error code on failure, because the SBM helpers use the same return value to report an error (such as -ENOMEM) when the call cannot be performed.

If sbm_call() returns an error, you can use sbm_error() to decide whether the error was returned by the target function, or whether sandbox mode was aborted (or failed to run at all).
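The decision described above can be sketched as a userspace model. The struct sbm and sbm_error() here are simplified stand-ins (only the error member is modelled), and the sbm_outcome enum and sbm_classify() helper are hypothetical; they only illustrate how a caller might branch on the two sources of failure.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-in for struct sbm: only the error member is modelled. */
struct sbm {
	int error;
};

/* Stand-in for sbm_error(): report the recorded SBM error code. */
static int sbm_error(const struct sbm *sbm)
{
	return sbm->error;
}

/* Hypothetical classification of an sbm_call() result. */
enum sbm_outcome {
	SBM_OUTCOME_OK,		/* target function returned zero */
	SBM_OUTCOME_FUNC_ERROR,	/* target function returned an error */
	SBM_OUTCOME_ABORTED,	/* sandbox failed to run or was aborted */
};

static enum sbm_outcome sbm_classify(const struct sbm *sbm, int ret)
{
	if (ret == 0)
		return SBM_OUTCOME_OK;
	/* A non-zero SBM error means the failure came from SBM itself. */
	if (sbm_error(sbm))
		return SBM_OUTCOME_ABORTED;
	return SBM_OUTCOME_FUNC_ERROR;
}
```

A caller would typically treat SBM_OUTCOME_ABORTED as a sign that the sandboxed code misbehaved, and SBM_OUTCOME_FUNC_ERROR as an ordinary parse or validation failure.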

Public API

struct sbm

SandBox Mode instance.

Definition:

struct sbm {
#ifdef CONFIG_SANDBOX_MODE
    int error;
    void *private;
    struct sbm_buf *input;
    struct sbm_buf *output;
    struct sbm_buf *io;
#endif
};

Members

error

Error code. Initialized to zero by sbm_init() and updated when a SBM operation fails.

private

Arch-specific private data.

input

Input data. Copied to a temporary buffer before starting sandbox mode.

output

Output data. Copied from a temporary buffer after return from sandbox mode.

io

Input and output data. Copied to a temporary buffer before starting sandbox mode and copied back after return.

int sbm_init(struct sbm *sbm)

Initialize a SandBox Mode instance.

Parameters

struct sbm *sbm

SBM instance.

Description

Initialize a SBM instance structure.

Return

Zero on success, negative on error.

void sbm_destroy(struct sbm *sbm)

Clean up a SandBox Mode instance.

Parameters

struct sbm *sbm

SBM instance to be cleaned up.

int sbm_error(const struct sbm *sbm)

Get SBM error status.

Parameters

const struct sbm *sbm

SBM instance.

Description

Get the SBM error code. This can be used to distinguish between errors returned by the target function and errors from setting up the sandbox environment.

int sbm_exec(struct sbm *sbm, sbm_func func, void *data)

Execute function in a sandbox.

Parameters

struct sbm *sbm

SBM instance.

sbm_func func

Function to be called.

void *data

Argument for func.

Description

Execute func in a fully prepared SBM instance.

Return

Return value of func on success, or a negative error code.

SBM_COPY_IN

SBM_COPY_IN (sbm, buf, size)

Mark an input buffer for copying into SBM.

Parameters

sbm

SBM instance.

buf

Buffer virtual address.

size

Size of the buffer.

Description

Add a buffer to the input buffer list for sbm. The content of the buffer is copied to sandbox mode before calling the target function.

It is OK to modify the input buffer after invoking this macro.

Return

Buffer address in sandbox mode.
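The guarantee that the input buffer may be modified after invoking the macro implies that its content is captured when SBM_COPY_IN() is evaluated. The following userspace model illustrates that snapshot behaviour. model_copy_in() is a hypothetical stand-in that uses malloc(); the kernel instead places the copy in memory that is mapped into the sandbox.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of SBM_COPY_IN(): duplicate the caller's buffer
 * immediately and return the address of the copy, which is what the
 * target function will see. Returns NULL if allocation fails.
 */
static void *model_copy_in(const void *buf, size_t size)
{
	void *copy = malloc(size);

	if (copy)
		memcpy(copy, buf, size);
	return copy;
}
```

The returned pointer, not the caller's original buffer, is what gets passed on to the target function.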

SBM_COPY_OUT

SBM_COPY_OUT (sbm, buf, size)

Mark an output buffer for copying out of SBM.

Parameters

sbm

SBM instance.

buf

Buffer virtual address.

size

Size of the buffer.

Description

Add a buffer to the output buffer list for sbm. The content of the buffer is copied to kernel mode after calling the target function.

Return

Buffer address in sandbox mode.

SBM_COPY_INOUT

SBM_COPY_INOUT (sbm, buf, size)

Mark a buffer for copying both into and out of SBM.

Parameters

sbm

SBM instance.

buf

Buffer virtual address.

size

Size of the buffer.

Description

Add a buffer to the input and output buffer list for sbm. The content of the buffer is copied to sandbox mode before calling the target function and copied back to kernel mode after the call.

Return

Buffer address in sandbox mode.
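The round trip of an in/out buffer can be modelled in userspace as "copy in, let the target work on the copy, copy back". model_call_inout() and increment_all() are hypothetical, malloc() stands in for sandbox-mapped memory, and the model copies back unconditionally for simplicity; whether the real implementation copies back after a failed call is not specified here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target signature for this model. */
typedef int (*model_target)(void *buf, size_t size);

/*
 * Hypothetical model of one call with an SBM_COPY_INOUT() buffer:
 * the target only ever sees a temporary copy of the caller's data.
 */
static int model_call_inout(model_target func, void *buf, size_t size)
{
	void *tmp = malloc(size);
	int ret;

	if (!tmp)
		return -ENOMEM;
	memcpy(tmp, buf, size);		/* in: before the call */
	ret = func(tmp, size);		/* target never sees buf itself */
	memcpy(buf, tmp, size);		/* out: after the call */
	free(tmp);
	return ret;
}

/* Example target: increment every byte in place. */
static int increment_all(void *p, size_t size)
{
	unsigned char *b = p;

	for (size_t i = 0; i < size; i++)
		b[i]++;
	return 0;
}
```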

SBM_DEFINE_CALL

SBM_DEFINE_CALL (f, ...)

Define a call helper.

Parameters

f

Target function name.

...

Parameters as type-identifier pairs.

Description

Declare an argument-passing struct and define the corresponding call helper. The call helper stores its arguments in an automatic variable of the corresponding type and calls sbm_exec().

The call helper is an inline function, so it is OK to use this macro in header files.

Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().

SBM_DEFINE_THUNK

SBM_DEFINE_THUNK (f, ...)

Define a thunk function.

Parameters

f

Target function name.

...

Parameters as type-identifier pairs.

Description

The thunk function casts its parameter back to the argument-passing struct and calls the target function f with parameters stored there by the call helper.

Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().
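To make the roles of the argument-passing struct, the thunk and the call helper concrete, here is a hand-written userspace equivalent of what SBM_DEFINE_CALL() and SBM_DEFINE_THUNK() are described as generating for a target with three parameters. The names my_div, my_div_args, my_div_thunk and my_div_call are hypothetical (the real macros choose their own names), and the sbm_exec() stand-in runs the thunk directly instead of entering a sandbox.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-ins (hypothetical), reduced to what this sketch needs. */
struct sbm {
	int error;
};
typedef int (*sbm_func)(void *data);

/* Stand-in for sbm_exec(): calls func directly, without any isolation. */
static int sbm_exec(struct sbm *sbm, sbm_func func, void *data)
{
	int ret = func(data);

	return sbm->error ? sbm->error : ret;
}

/* Target function: parameters (int, num), (int, den), (int *, quot). */
static int my_div(int num, int den, int *quot)
{
	if (den == 0)
		return -EINVAL;
	*quot = num / den;
	return 0;
}

/* Argument-passing struct: one member per type-identifier pair. */
struct my_div_args {
	int num;
	int den;
	int *quot;
};

/* Thunk: cast data back to the argument struct and call the target. */
static int my_div_thunk(void *data)
{
	const struct my_div_args *args = data;

	return my_div(args->num, args->den, args->quot);
}

/* Call helper: store the arguments on the stack, then call sbm_exec(). */
static inline int my_div_call(struct sbm *sbm, int num, int den, int *quot)
{
	struct my_div_args args = { .num = num, .den = den, .quot = quot };

	return sbm_exec(sbm, my_div_thunk, &args);
}
```

Because sbm_exec() only accepts a single void pointer, packing the arguments into an on-stack struct is what lets a target with an arbitrary signature be invoked through one generic entry point.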

SBM_DEFINE_FUNC

SBM_DEFINE_FUNC (f, ...)

Define target function, thunk and call helper.

Parameters

f

Target function name.

...

Parameters as type-identifier pairs.

Description

Declare or define a target function and also the corresponding thunk and call helper. Use this shorthand to avoid repeating the target function signature.

The target function is declared twice. The first declaration makes it possible to precede the macro with storage-class specifiers (such as static). The second declaration allows the macro to be followed by the function body. You can also put a semicolon after the macro to make it only a declaration.

Target function parameters are specified as type-identifier pairs, see __SBM_DECLARE_FUNC().
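The double-declaration technique can be shown with a much smaller macro. DEFINE_FUNC_WITH_HELPER() below is hypothetical and takes a single parameter only; it is not SBM_DEFINE_FUNC(), but it uses the same trick: a leading storage-class specifier binds to the first declaration, a helper in the middle can call the function, and the trailing declaration is completed by whatever follows the macro.

```c
#include <assert.h>

/*
 * Hypothetical, simplified macro (one parameter only) showing the
 * double-declaration trick used by SBM_DEFINE_FUNC().
 */
#define DEFINE_FUNC_WITH_HELPER(f, type, arg)			\
	int f(type arg);					\
	static inline int f##_twice(type arg)			\
	{							\
		return 2 * f(arg);				\
	}							\
	int f(type arg)

/* "static" binds to the first declaration; the body completes the second. */
static DEFINE_FUNC_WITH_HELPER(square, int, x)
{
	return x * x;
}
```

In C, the second declaration has no storage-class specifier, so it inherits the internal linkage established by the first one. Following the macro with a semicolon instead of a body would leave only declarations.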

sbm_call

sbm_call (sbm, func, ...)

Call a function in sandbox mode.

Parameters

sbm

SBM instance.

func

Function to be called.

...

Target function arguments.

Description

Call a function using a call helper which was previously defined with SBM_DEFINE_FUNC().
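One plausible way for sbm_call() to find the call helper is token pasting on the target function name. The sketch below assumes that approach; the naming scheme (f##_sbm_call), the SBM_CALL_HELPER() macro and the double_it example are all hypothetical, and the helper is reduced to an error check plus the target logic.

```c
#include <assert.h>

/* Userspace stand-in for struct sbm. */
struct sbm {
	int error;
};

/* Hypothetical naming scheme for call helpers: <target>_sbm_call. */
#define SBM_CALL_HELPER(f) f##_sbm_call

/* Hypothetical sbm_call(): select the helper by name, forward the args. */
#define sbm_call(sbm, func, ...) SBM_CALL_HELPER(func)((sbm), __VA_ARGS__)

/*
 * A call helper as SBM_DEFINE_FUNC() might generate it, greatly
 * simplified: no thunk and no sandbox, only the error check and the
 * target logic.
 */
static inline int double_it_sbm_call(struct sbm *sbm, int x, int *out)
{
	if (sbm->error)
		return sbm->error;
	*out = 2 * x;
	return 0;
}
```

Under this assumption, func must be the plain identifier that was passed to SBM_DEFINE_FUNC(), not an arbitrary function pointer expression, because the helper is resolved at preprocessing time.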

Arch Hooks

An architecture must implement these hooks before it can select HAVE_ARCH_SBM.

int arch_sbm_init(struct sbm *sbm)

Arch hook to initialize a SBM instance.

Parameters

struct sbm *sbm

Instance to be initialized.

Description

Perform any arch-specific initialization. This hook is called by sbm_init() immediately after zeroing out sbm.

Return

Zero on success, negative error code on failure.

void arch_sbm_destroy(struct sbm *sbm)

Arch hook to clean up a SBM instance.

Parameters

struct sbm *sbm

Instance to be cleaned up.

Description

Perform any arch-specific cleanup. This hook is called by sbm_destroy() as the very last operation on sbm.
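The ordering of the init and destroy hooks can be modelled in userspace. The arch hooks here are hypothetical (they only manage a block of private data with calloc() and free()), the generic sbm_init() and sbm_destroy() are simplified stand-ins, and recording an arch_sbm_init() failure in sbm->error is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for struct sbm (buffer lists omitted). */
struct sbm {
	int error;
	void *private;		/* arch-specific private data */
};

/* Hypothetical arch hooks: manage a block of private data. */
static int arch_sbm_init(struct sbm *sbm)
{
	sbm->private = calloc(1, 64);
	return sbm->private ? 0 : -ENOMEM;
}

static void arch_sbm_destroy(struct sbm *sbm)
{
	free(sbm->private);
	sbm->private = NULL;
}

/* Model of the generic side: zero the instance, then run the arch hook. */
static int sbm_init(struct sbm *sbm)
{
	memset(sbm, 0, sizeof(*sbm));
	sbm->error = arch_sbm_init(sbm);	/* assumption: failure is recorded */
	return sbm->error;
}

/* Model of the generic side: generic cleanup first, arch hook last. */
static void sbm_destroy(struct sbm *sbm)
{
	/* Freeing the input, output and io buffer lists would go here. */
	arch_sbm_destroy(sbm);
}
```

Because arch_sbm_init() runs on a zeroed instance, the hook can rely on private being NULL initially.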

int arch_sbm_map_readonly(struct sbm *sbm, const struct sbm_buf *buf)

Arch hook to map a buffer for reading.

Parameters

struct sbm *sbm

SBM instance.

const struct sbm_buf *buf

Buffer to be mapped.

Description

Make the specified buffer readable by sandbox code. See also arch_sbm_map_writable().

Return

Zero on success, negative on error.

int arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf)

Arch hook to map a buffer for reading and writing.

Parameters

struct sbm *sbm

SBM instance.

const struct sbm_buf *buf

Buffer to be mapped.

Description

Make the specified buffer readable and writable by sandbox code. See also arch_sbm_map_readonly().

Return

Zero on success, negative on error.
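Because mapping works on whole pages, a map hook has to cover every page that the buffer touches. The helpers below compute that range. They are hypothetical, assume 4 KiB pages (the base page size on x86_64), and ignore how the pages are actually mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096UL

/* Round an address down to the start of its page. */
static uintptr_t page_start(uintptr_t addr)
{
	return addr & ~(uintptr_t)(MODEL_PAGE_SIZE - 1);
}

/*
 * Number of pages a map hook would have to map so that the whole
 * buffer [addr, addr + size) is accessible. Assumes size > 0.
 */
static size_t pages_spanned(uintptr_t addr, size_t size)
{
	uintptr_t first = page_start(addr);
	uintptr_t last = page_start(addr + size - 1);

	return (last - first) / MODEL_PAGE_SIZE + 1;
}
```

For example, a 32-byte buffer that starts 16 bytes before a page boundary spans two pages, even though it is far smaller than one page.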

int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data)

Arch hook to execute code in a sandbox.

Parameters

struct sbm *sbm

SBM instance.

sbm_func func

Function to be executed in a sandbox.

void *data

Argument passed to func.

Description

Execute func in a fully prepared SBM instance. If sandbox mode cannot be set up or is aborted, set sbm->error to a negative error value. This error is then returned by sbm_exec(), overriding the return value of arch_sbm_exec().

Return

Return value of func.
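The rule that sbm->error overrides the return value of arch_sbm_exec() can be modelled in userspace. Everything here is a simplified stand-in: the inject_abort flag exists only to simulate an aborted sandbox, the arch hook is hypothetical, and skipping the call when an earlier step already recorded an error is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*sbm_func)(void *data);

/* Userspace stand-in for struct sbm. */
struct sbm {
	int error;
	bool inject_abort;	/* model only: pretend the sandbox is aborted */
};

/* Hypothetical arch hook: run func, or simulate an aborted sandbox. */
static int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data)
{
	if (sbm->inject_abort) {
		/* A real implementation would get here via a CPU exception. */
		sbm->error = -EFAULT;
		return 0;	/* overridden by sbm_exec() */
	}
	return func(data);
}

/* Model of sbm_exec(): sbm->error, if set, overrides the hook's result. */
static int sbm_exec(struct sbm *sbm, sbm_func func, void *data)
{
	int ret;

	if (sbm->error)		/* assumption: earlier failures skip the call */
		return sbm->error;
	ret = arch_sbm_exec(sbm, func, data);
	return sbm->error ? sbm->error : ret;
}

/* Example target that always fails with its own error code. */
static int failing_target(void *data)
{
	(void)data;
	return -EINVAL;
}
```

With this split, the arch hook never has to invent a return value that could be confused with the target's result: it only sets sbm->error.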

x86_64 Implementation

The x86_64 implementation provides strong isolation and recovery from CPU exceptions.

Sandbox mode runs in protection ring 3 (same as user mode). This means that:

  • sandbox code cannot execute privileged CPU instructions,

  • memory accesses are treated as user accesses.

The thread stack is readable in sandbox mode, because an on-stack data structure is used by call helpers and thunks to pass target function arguments. However, it is not writable, and sandbox code runs on its own stack. The thread stack is not used by interrupt handlers either. Non-IST interrupt handlers run on a separate sandbox exception stack.

The interrupt entry path modifies the saved pt_regs to make it appear as coming from kernel mode. CR3 is then switched to its kernel-mode value. The interrupt exit path is modified to restore the actual pt_regs and switch CR3 back to its sandbox mode value, overriding any CR3 change made for page table isolation.

Support for paravirtualized kernels is not (yet) provided.

Current Limitations

This section lists known limitations of the current SBM implementation, which are planned to be removed in the future.

Stack

There is no generic kernel API to run a function on an alternate stack, so SBM runs on the normal kernel stack by default. The kernel already offers self-protection against stack overflows and underflows as well as against overwriting on-stack data outside the current frame, but violations are usually fatal.

This limitation can be overcome on specific architectures: arch hooks can set up a separate stack and recover from stack frame overruns.

Inherent Limitations

This section lists limitations which are inherent to the concept.

Explicit Code

The main idea behind SandBox Mode is decomposition of one big program (the Linux kernel) into multiple smaller programs that can be sandboxed. There is no known way to automate this task for an existing C code base.

Given the performance impact of running code in a sandbox, this limitation may be perceived as a benefit. It is expected that sandbox mode is introduced only knowingly and only where safety is more important than performance.

Complex Data

Although data structures are not serialized and deserialized between kernel mode and sandbox mode, all directly and indirectly referenced data structures must be explicitly mapped into the sandbox, which requires some manual effort.

Copying of input/output buffers also incurs some runtime overhead. This overhead can be reduced by sharing data directly with the sandbox, but the resulting isolation is weaker, so it may or may not be acceptable, depending on the overall safety requirements.
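The manual effort for indirectly referenced data can be illustrated with a structure that points to a payload. In this userspace model, model_copy_in() is a hypothetical malloc()-based stand-in for SBM_COPY_IN() (both return the address of the copy); struct msg and copy_msg_in() are also hypothetical. The payload is copied first, and a temporary copy of the structure is patched to point at the sandbox payload before the structure itself is copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input structure that references another object. */
struct msg {
	size_t len;
	const char *payload;	/* indirectly referenced data */
};

/* Hypothetical stand-in for SBM_COPY_IN(): returns the copy's address. */
static void *model_copy_in(const void *buf, size_t size)
{
	void *copy = malloc(size);

	if (copy)
		memcpy(copy, buf, size);
	return copy;
}

/*
 * Copy a struct msg and its payload into the "sandbox" (assumes
 * len > 0). The copied struct points at the copied payload, never at
 * the caller's original. Returns NULL on failure.
 */
static struct msg *copy_msg_in(const struct msg *m)
{
	struct msg tmp = *m;
	struct msg *copy;

	tmp.payload = model_copy_in(m->payload, m->len);
	if (!tmp.payload)
		return NULL;
	copy = model_copy_in(&tmp, sizeof(tmp));
	if (!copy)
		free((void *)tmp.payload);
	return copy;
}
```

Forgetting the pointer fix-up would leave the sandbox holding a kernel address it cannot access, which with strong isolation would show up as a protection violation.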

Page Granularity

Since paging is used to enforce memory safety, page size is the smallest unit. Objects mapped into the sandbox must be aligned to a page boundary, and buffer overflows may not be detected if they fit into the same page.

On the other hand, even though such writes are not detected, they do not corrupt kernel data, because only the output buffer is copied back to kernel mode, and the (corrupted) rest of the page is ignored.
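Both effects can be modelled in userspace: an overflow that stays inside the last page goes unnoticed, but copying back exactly the declared output size keeps it away from kernel data. The page buffer, buggy_target(), model_copy_out() and slack_bytes() are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <string.h>

/* Assumption for this sketch: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096

/* Page-aligned sandbox copy of an output buffer; the whole page is writable. */
static unsigned char sandbox_page[MODEL_PAGE_SIZE];

/* Bytes between the end of a page-aligned buffer and the end of its last page. */
static size_t slack_bytes(size_t size)
{
	return (MODEL_PAGE_SIZE - size % MODEL_PAGE_SIZE) % MODEL_PAGE_SIZE;
}

/* Hypothetical buggy target: writes 120 bytes into a 100-byte output buffer. */
static void buggy_target(unsigned char *out)
{
	memset(out, 0xAA, 120);
}

/* Copy back exactly out_size bytes; the rest of the page is ignored. */
static void model_copy_out(unsigned char *kern_buf, size_t out_size)
{
	memcpy(kern_buf, sandbox_page, out_size);
}
```

A 100-byte output buffer leaves slack_bytes(100) == 3996 bytes in which an overflow cannot fault, yet none of those bytes reach the caller.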

Transitions

Transitions between kernel mode and sandbox mode are synchronous. That is, whenever entering or leaving sandbox mode, the currently running CPU executes the instructions necessary to save/restore its kernel-mode state. The API is generic enough to allow asynchronous transitions, e.g. to pass data to another CPU which is already running in sandbox mode. However, an implementation that actually benefits from asynchronous transitions would require far-reaching changes in the kernel scheduler. This is (currently) out of scope.