author | Andy Wingo <wingo@pobox.com> | 2017-04-18 20:39:40 +0200 |
---|---|---|
committer | Andy Wingo <wingo@pobox.com> | 2017-04-18 21:27:45 +0200 |
commit | 7c71be0c7e8c533b221dbd71e29e94ea213787cf | |
tree | 2ae279ef367e334d566233bb01cf2ee4aea63785 /doc | |
parent | 622abec1d2006af2ae0fc35b1b2c4fa99d43b090 |
Add sandboxed evaluation facility
* module/ice-9/sandbox.scm: New file.
* module/Makefile.am (SOURCES): Add new file.
* doc/ref/api-evaluation.texi (Sandboxed Evaluation): New section.
* NEWS: Update.
* test-suite/tests/sandbox.test: New file.
* test-suite/Makefile.am: Add new file.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/ref/api-evaluation.texi | 265 |
1 files changed, 265 insertions, 0 deletions
diff --git a/doc/ref/api-evaluation.texi b/doc/ref/api-evaluation.texi
index 3a3e9e632..7a4c8c975 100644
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -22,6 +22,7 @@ loading, evaluating, and compiling Scheme code at run time.
 * Delayed Evaluation:: Postponing evaluation until it is needed.
 * Local Evaluation:: Evaluation in a local lexical environment.
 * Local Inclusion:: Compile-time inclusion of one file in another.
+* Sandboxed Evaluation:: Evaluation with limited capabilities.
 * REPL Servers:: Serving a REPL over a socket.
 * Cooperative REPL Servers:: REPL server for single-threaded applications.
 @end menu
@@ -1227,6 +1228,270 @@ the source files for a package (as you should!). It makes it possible
 to evaluate an installed file from source, instead of relying on the
 @code{.go} file being up to date.
 
+@node Sandboxed Evaluation
+@subsection Sandboxed Evaluation
+
+Sometimes you would like to evaluate code that comes from an untrusted
+party. The safest way to do this is to buy a new computer, evaluate the
+code on that computer, then throw the machine away. However, if you are
+unwilling to take this simple approach, Guile does include a limited
+``sandbox'' facility that can allow untrusted code to be evaluated with
+some confidence.
+
+To use the sandboxed evaluator, load its module:
+
+@example
+(use-modules (ice-9 sandbox))
+@end example
+
+Guile's sandboxing facility starts with the ability to restrict the time
+and space used by a piece of code.
+
+@deffn {Scheme Procedure} call-with-time-limit limit thunk limit-reached
+Call @var{thunk}, but cancel it if @var{limit} seconds of wall-clock
+time have elapsed. If the computation is cancelled, call
+@var{limit-reached} in tail position. @var{thunk} must not disable
+interrupts or prevent an abort via a @code{dynamic-wind} unwind handler.
+@end deffn
+
+@deffn {Scheme Procedure} call-with-allocation-limit limit thunk limit-reached
+Call @var{thunk}, but cancel it if @var{limit} bytes have been
+allocated. If the computation is cancelled, call @var{limit-reached} in
+tail position. @var{thunk} must not disable interrupts or prevent an
+abort via a @code{dynamic-wind} unwind handler.
+
+This limit applies to both stack and heap allocation. The computation
+will not be aborted before @var{limit} bytes have been allocated, but
+for the heap allocation limit, the check may be postponed until the next garbage collection.
+
+Note that as a current shortcoming, the heap size limit applies to all
+threads; concurrent allocation by other unrelated threads counts towards
+the allocation limit.
+@end deffn
+
+@deffn {Scheme Procedure} call-with-time-and-allocation-limits time-limit allocation-limit thunk
+Invoke @var{thunk} in a dynamic extent in which its execution is limited
+to @var{time-limit} seconds of wall-clock time, and its allocation to
+@var{allocation-limit} bytes. @var{thunk} must not disable interrupts
+or prevent an abort via a @code{dynamic-wind} unwind handler.
+
+If successful, return all values produced by invoking @var{thunk}. Any
+uncaught exception thrown by the thunk will propagate out. If the time
+or allocation limit is exceeded, an exception will be thrown to the
+@code{limit-exceeded} key.
+@end deffn
+
+The time limit and stack limit are both very precise, but the heap limit
+only gets checked asynchronously, after a garbage collection. In
+particular, if the heap is already very large, the number of allocated
+bytes between garbage collections will be large, and therefore the
+precision of the check is reduced.
+
+Additionally, due to the mechanism used by the allocation limit (the
+@code{after-gc-hook}), large single allocations like @code{(make-vector
+#e1e7)} are only detected after the allocation completes, even if the
+allocation itself causes garbage collection. It's possible therefore
+for user code to not only exceed the allocation limit set, but also to
+exhaust all available memory, causing out-of-memory conditions at any
+allocation site. Failure to allocate memory in Guile itself should be
+safe and cause an exception to be thrown, but most systems are not
+designed to handle @code{malloc} failures. An allocation failure may
+therefore exercise unexpected code paths in your system, so it is a
+weakness of the sandbox (and therefore an interesting point of attack).
+
+The main sandbox interface is @code{eval-in-sandbox}.
+
+@deffn {Scheme Procedure} eval-in-sandbox exp [#:time-limit 0.1] @
+                           [#:allocation-limit #e10e6] @
+                           [#:bindings all-pure-bindings] @
+                           [#:module (make-sandbox-module bindings)] @
+                           [#:sever-module? #t]
+Evaluate the Scheme expression @var{exp} within an isolated
+``sandbox''. Limit its execution to @var{time-limit} seconds of
+wall-clock time, and limit its allocation to @var{allocation-limit}
+bytes.
+
+The evaluation will occur in @var{module}, which defaults to the result
+of calling @code{make-sandbox-module} on @var{bindings}, which itself
+defaults to @code{all-pure-bindings}. This is the core of the
+sandbox: creating a scope for the expression that is @dfn{safe}.
+
+A safe sandbox module has two characteristics. Firstly, it will not
+allow the expression being evaluated to avoid being cancelled due to
+time or allocation limits. This ensures that the expression terminates
+in a timely fashion.
+
+Secondly, a safe sandbox module will prevent the evaluation from
+receiving information from previous evaluations, or from affecting
+future evaluations. All combinations of binding sets exported by
+@code{(ice-9 sandbox)} form safe sandbox modules.
+
+The @var{bindings} should be given as a list of import sets. One import
+set is a list whose car names an interface, like @code{(ice-9 q)}, and
+whose cdr is a list of imports. An import is either a bare symbol or a
+pair of @code{(@var{out} . @var{in})}, where @var{out} and @var{in} are
+both symbols and denote the name under which a binding is exported from
+the module, and the name under which to make the binding available,
+respectively. Note that @var{bindings} is only used as an input to the
+default initializer for the @var{module} argument; if you pass
+@code{#:module}, @var{bindings} is unused. If @var{sever-module?} is
+true (the default), the module will be unlinked from the global module
+tree after the evaluation returns, to allow @var{module} to be
+garbage-collected.
+
+If successful, return all values produced by @var{exp}. Any uncaught
+exception thrown by the expression will propagate out. If the time or
+allocation limit is exceeded, an exception will be thrown to the
+@code{limit-exceeded} key.
+@end deffn
+
+Constructing a safe sandbox module is tricky in general. Guile defines
+an easy way to construct safe modules from predefined sets of bindings.
+Before getting to that interface, here are some general notes on safety.
+
+@enumerate
+@item The time and allocation limits rely on the ability to interrupt
+and cancel a computation. For this reason, no binding included in a
+sandbox module should be able to indefinitely postpone interrupt
+handling, nor should a binding be able to prevent an abort. In practice
+this second consideration means that @code{dynamic-wind} should not be
+included in any binding set.
+@item The time and allocation limits apply only to the
+@code{eval-in-sandbox} call. If the call returns a procedure which is
+later called, no limit is ``automatically'' in place. Users of
+@code{eval-in-sandbox} have to be very careful to reimpose limits when
+calling procedures that escape from sandboxes.
+@item Similarly, the dynamic environment of the @code{eval-in-sandbox}
+call is not necessarily in place when any procedure that escapes from
+the sandbox is later called.
+
+This detail prevents us from exposing @code{primitive-eval} to the
+sandbox, for two reasons. The first is that it's possible for legacy
+code to forge references to any binding, if the
+@code{allow-legacy-syntax-objects?} parameter is true. The default for
+this parameter is true; @pxref{Syntax Transformer Helpers} for the
+details. The parameter is bound to @code{#f} for the duration of the
+@code{eval-in-sandbox} call itself, but that will not be in place during
+calls to escaped procedures.
+
+The second reason we don't expose @code{primitive-eval} is that
+@code{primitive-eval} implicitly works in the current module, which for
+an escaped procedure will probably be different from the module that is
+current for the @code{eval-in-sandbox} call itself.
+
+The common denominator here is that if an interface exposed to the
+sandbox relies on dynamic environments, it is easy to mistakenly grant
+the sandboxed procedure additional capabilities in the form of bindings
+that it should not have access to. For this reason, the default sets of
+predefined bindings do not depend on any dynamically scoped value.
+@item Mutation may allow a sandboxed evaluation to break some invariant
+in users of data supplied to it. A lot of code culturally doesn't
+expect mutation, but if you hand mutable data to a sandboxed evaluation
+and you also grant mutating capabilities to that evaluation, then the
+sandboxed code may indeed mutate that data. The default set of bindings
+to the sandbox does not include any mutating primitives.
+
+Relatedly, @code{set!} may allow a sandbox to mutate a primitive,
+invalidating many system-wide invariants. Guile is currently quite
+permissive when it comes to imported bindings and mutability. Although
+@code{set!} to a module-local or lexically bound variable would be fine,
+we don't currently have an easy way to disallow @code{set!} to an
+imported binding, so currently no binding set includes @code{set!}.
+@item Mutation may allow a sandboxed evaluation to keep state, or
+to set up a communication channel with other code. On the one hand this
+sounds cool, but on the other hand maybe this is part of your threat
+model. Again, the default set of bindings doesn't include mutating
+primitives, preventing sandboxed evaluations from keeping state.
+@item The sandbox should probably not be able to open a network
+connection, or write to a file, or open a file from disk. The default
+binding set includes no interaction with the operating system.
+@end enumerate
+
+If you, dear reader, find the above discussion interesting, you will
+enjoy Jonathan Rees' dissertation, ``A Security Kernel Based on the
+Lambda Calculus''.
+
+@defvr {Scheme Variable} all-pure-bindings
+All ``pure'' bindings that together form a safe subset of those bindings
+available by default to Guile user code.
+@end defvr
+
+@defvr {Scheme Variable} all-pure-and-impure-bindings
+Like @code{all-pure-bindings}, but additionally including mutating
+primitives like @code{vector-set!}. This set is still safe in the sense
+mentioned above, with the caveats about mutation.
+@end defvr
+
+The components of these composite sets are as follows:
+@defvr {Scheme Variable} alist-bindings
+@defvrx {Scheme Variable} array-bindings
+@defvrx {Scheme Variable} bit-bindings
+@defvrx {Scheme Variable} bitvector-bindings
+@defvrx {Scheme Variable} char-bindings
+@defvrx {Scheme Variable} char-set-bindings
+@defvrx {Scheme Variable} clock-bindings
+@defvrx {Scheme Variable} core-bindings
+@defvrx {Scheme Variable} error-bindings
+@defvrx {Scheme Variable} fluid-bindings
+@defvrx {Scheme Variable} hash-bindings
+@defvrx {Scheme Variable} iteration-bindings
+@defvrx {Scheme Variable} keyword-bindings
+@defvrx {Scheme Variable} list-bindings
+@defvrx {Scheme Variable} macro-bindings
+@defvrx {Scheme Variable} nil-bindings
+@defvrx {Scheme Variable} number-bindings
+@defvrx {Scheme Variable} pair-bindings
+@defvrx {Scheme Variable} predicate-bindings
+@defvrx {Scheme Variable} procedure-bindings
+@defvrx {Scheme Variable} promise-bindings
+@defvrx {Scheme Variable} prompt-bindings
+@defvrx {Scheme Variable} regexp-bindings
+@defvrx {Scheme Variable} sort-bindings
+@defvrx {Scheme Variable} srfi-4-bindings
+@defvrx {Scheme Variable} string-bindings
+@defvrx {Scheme Variable} symbol-bindings
+@defvrx {Scheme Variable} unspecified-bindings
+@defvrx {Scheme Variable} variable-bindings
+@defvrx {Scheme Variable} vector-bindings
+@defvrx {Scheme Variable} version-bindings
+The components of @code{all-pure-bindings}.
+@end defvr
+
+@defvr {Scheme Variable} mutating-alist-bindings
+@defvrx {Scheme Variable} mutating-array-bindings
+@defvrx {Scheme Variable} mutating-bitvector-bindings
+@defvrx {Scheme Variable} mutating-fluid-bindings
+@defvrx {Scheme Variable} mutating-hash-bindings
+@defvrx {Scheme Variable} mutating-list-bindings
+@defvrx {Scheme Variable} mutating-pair-bindings
+@defvrx {Scheme Variable} mutating-sort-bindings
+@defvrx {Scheme Variable} mutating-srfi-4-bindings
+@defvrx {Scheme Variable} mutating-string-bindings
+@defvrx {Scheme Variable} mutating-variable-bindings
+@defvrx {Scheme Variable} mutating-vector-bindings
+The additional components of @code{all-pure-and-impure-bindings}.
+@end defvr
+
+Finally, what do you do with a binding set? What is a binding set
+anyway? @code{make-sandbox-module} is here for you.
+
+@deffn {Scheme Procedure} make-sandbox-module bindings
+Return a fresh module that only contains @var{bindings}.
+
+The @var{bindings} should be given as a list of import sets. One import
+set is a list whose car names an interface, like @code{(ice-9 q)}, and
+whose cdr is a list of imports. An import is either a bare symbol or a
+pair of @code{(@var{out} . @var{in})}, where @var{out} and @var{in} are
+both symbols and denote the name under which a binding is exported from
+the module, and the name under which to make the binding available,
+respectively.
+@end deffn
+
+So you see that binding sets are just lists, and
+@code{all-pure-and-impure-bindings} is really just the result of
+appending all of the component binding sets.
+
+
 @node REPL Servers
 @subsection REPL Servers
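A minimal sketch of `call-with-time-limit` as documented in the patch, assuming a Guile build that includes this commit's `(ice-9 sandbox)` module. The names `quick`, `stuck` and `spin` are illustrative only. A thunk that finishes in time has its value returned unchanged; a thunk that spins forever is cancelled, and the `limit-reached` thunk (called with no arguments) supplies the result instead.

```scheme
(use-modules (ice-9 sandbox))

;; Illustrative names; assumes (ice-9 sandbox) as added by this commit.
;; This thunk finishes well inside its 1-second budget, so its value is
;; returned as-is and the limit-reached thunk is never called.
(define quick
  (call-with-time-limit 1
                        (lambda () (* 6 7))
                        (lambda () 'timed-out)))

;; This thunk never returns.  After 0.1 seconds of wall-clock time it is
;; cancelled and the limit-reached thunk is called in tail position.
(define stuck
  (call-with-time-limit 0.1
                        (lambda () (let spin () (spin)))
                        (lambda () 'timed-out)))

(display (list quick stuck))            ; prints (42 timed-out)
(newline)
```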
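A sketch of `call-with-time-and-allocation-limits`, under the same assumption about the module being available; `within` and `exceeded` are hypothetical names and the limit values are arbitrary. Because the heap check only runs after a garbage collection, the runaway loop is stopped by whichever limit trips first, and either way the failure surfaces as a throw to the `limit-exceeded` key.

```scheme
(use-modules (ice-9 sandbox))

;; Within both limits: all values produced by the thunk are returned,
;; here collected into a list by call-with-values.
(define within
  (call-with-values
      (lambda ()
        (call-with-time-and-allocation-limits 1 #e1e6
                                              (lambda () (values 1 2))))
    list))

;; A loop that keeps every pair it allocates reachable.  It is stopped
;; either by the 0.5 s time limit or by the heap check (run only after a
;; GC); both are reported by throwing to the limit-exceeded key.
(define exceeded
  (catch 'limit-exceeded
    (lambda ()
      (call-with-time-and-allocation-limits
       0.5 #e1e6
       (lambda ()
         (let loop ((acc '()))
           (loop (cons 0 acc))))))
    (lambda (key . args) key)))
```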
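A sketch of `eval-in-sandbox` with its default arguments (0.1 seconds, `#e10e6` bytes, `all-pure-bindings`), again assuming this commit's module. It covers a normal result, a runaway expression stopped by a limit, and a reference to an operating-system procedure, which the default binding set does not provide. The file name `placeholder.txt` is hypothetical and is never opened.

```scheme
(use-modules (ice-9 sandbox))

;; Defaults: a fresh module built from all-pure-bindings, a 0.1 s time
;; limit and a #e10e6-byte allocation limit.
(define sum (eval-in-sandbox '(+ 1 2)))

;; A runaway expression trips one of the limits; the error is thrown to
;; the limit-exceeded key and can be caught like any other throw.
(define outcome
  (catch 'limit-exceeded
    (lambda () (eval-in-sandbox '(let loop () (loop))))
    (lambda (key . args) key)))

;; all-pure-bindings has no operating-system access, so open-input-file
;; is unbound inside the sandbox and the reference raises an error.
(define refused
  (catch #t
    (lambda () (eval-in-sandbox '(open-input-file "placeholder.txt")))
    (lambda (key . args) 'refused)))
```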
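The safety notes warn that limits do not follow a procedure that escapes from the sandbox. A minimal sketch of reimposing them, assuming the same module; `count-up` and `call-limited` are hypothetical names, and the 0.1 second / `#e10e6` byte values are arbitrary choices, not defaults the patch prescribes for this case.

```scheme
(use-modules (ice-9 sandbox))

;; The sandbox returns a closure.  The default limits applied only while
;; eval-in-sandbox itself was running, not to later calls of the closure.
(define count-up
  (eval-in-sandbox
   '(lambda (n)
      (let loop ((i 0))
        (if (< i n) (loop (+ i 1)) i)))))

;; Hypothetical helper: call an escaped procedure under fresh limits.
(define (call-limited proc . args)
  (call-with-time-and-allocation-limits 0.1 #e10e6
                                        (lambda () (apply proc args))))

(define finished (call-limited count-up 10))

;; (count-up +inf.0) never terminates; the wrapper turns it into a throw
;; to the limit-exceeded key instead of a hang.
(define runaway
  (catch 'limit-exceeded
    (lambda () (call-limited count-up +inf.0))
    (lambda (key . args) key)))
```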
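To contrast the two composite sets, the sketch below evaluates the same mutating expression under each, assuming this commit's module. It relies on `cons` being among the pure bindings and `set-car!` among the `mutating-pair-bindings`; both are assumptions about set membership that the text above does not list explicitly. `program`, `pure-result` and `impure-result` are illustrative names.

```scheme
(use-modules (ice-9 sandbox))

;; Builds a fresh pair inside the sandbox and mutates it with set-car!.
(define program
  '(let ((p (cons 1 2)))
     (set-car! p 'x)
     p))

;; all-pure-bindings contains no mutating primitives, so set-car! is
;; unbound and the evaluation raises an error.
(define pure-result
  (catch #t
    (lambda () (eval-in-sandbox program))
    (lambda (key . args) 'refused)))

;; all-pure-and-impure-bindings adds the mutating-* component sets.
(define impure-result
  (eval-in-sandbox program #:bindings all-pure-and-impure-bindings))
```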
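Since binding sets are plain lists, they can be written by hand, using the `(out . in)` renaming form, or assembled from the component sets with `append`. A sketch under the same assumption; `tiny-bindings`, `len`, `arith-bindings`, `answer` and `doubled` are illustrative names, and it assumes `core-bindings` covers `let` and `number-bindings` covers `*`, which the text above does not spell out.

```scheme
(use-modules (ice-9 sandbox))

;; One import set: the (guile) interface, importing + as-is and
;; string-length under the local name len.
(define tiny-bindings
  '(((guile) + (string-length . len))))

;; Build the module explicitly; with #:module given, #:bindings is unused.
(define answer
  (eval-in-sandbox '(+ (len "abc") 1)
                   #:module (make-sandbox-module tiny-bindings)))

;; Component sets are lists too, so a smaller sandbox is just an append.
(define arith-bindings (append core-bindings number-bindings))

(define doubled
  (eval-in-sandbox '(let ((x 21)) (* x 2)) #:bindings arith-bindings))
```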