V8 — Tool Sandboxing
Run every agent-invoked tool — especially LLM-generated code — inside an isolated execution environment with hard, explicit limits on filesystem, network, processes, memory, time, and cost, so a reasoning error or a successful prompt injection has nowhere to escape to.
Also Known As: Isolated Execution, Code Execution Isolation, Capability Restriction, Sandboxed Runtime. (Variants distinguished by isolation strength — container, userspace kernel, microVM, hosted — see Variants.)
Classification: Category V — Reliability · an execution-isolation pattern — sits between the agent and the host; the mandatory prerequisite for R13 CodeAct and R14 Program of Thoughts, and the third-condition mitigation in the Lethal Trifecta (V3).
Intent
Execute every tool call — particularly any LLM-generated code — in a constrained, ephemeral environment whose access to filesystem, network, processes, time, memory, and cost is enumerated and enforced from outside the agent, so that no reasoning error, hallucinated command, or successful prompt injection can damage the host, exfiltrate data, or run unbounded.
Motivation
R13 CodeAct and R14 Program of Thoughts achieve their accuracy gains by having the LLM emit executable code — usually Python — and running it. Tool use in general involves passing LLM-generated parameters to external programs. Both surfaces have the same property: the language model, not the developer, decides what the runtime sees. Without isolation, the agent's effective permission set is the host's permission set. A single prompt injection (V6 concern) or a single reasoning error becomes a remote-code-execution channel with the agent's full credentials.
The naive alternatives all fail in characteristic ways. Local execution with output capture (subprocess.run, a bare Python exec, HuggingFace's LocalPythonExecutor) gives the model arbitrary access to the developer's filesystem and network; smolagents' own documentation is blunt that this is not a security sandbox. Output filtering catches the consequences after the damage has been done — by the time a malicious command's stdout is filtered, the file has been deleted, the data has been exfiltrated, the cryptocurrency has been mined. Prompt-level restrictions ("only call these functions; do not touch the filesystem") are probabilistic; one successful injection bypasses them. Trusting the model is the position 88% of failed agent pilots end up reasoning their way into, after concluding sandboxing was "too much infrastructure for a prototype".
V8 is the application of principle of least privilege (Saltzer & Schroeder, 1975) to LLM tool execution. Each block of generated code runs in a fresh, scoped environment whose capabilities are enumerated by the developer, enforced by the kernel (or the userspace kernel, or the hypervisor), and torn down after use. The pattern's claim is not that sandboxing makes the agent safe — it is that sandboxing makes the agent's blast radius equal to what the developer chose to grant, rather than whatever the host happened to allow. R13 without V8 is not "R13 with a known risk" — it is a different pattern entirely, one whose risk profile is "remote code execution channel exposed to whoever can get text into the model's context".
This is why V8 is a hard prerequisite for R13 and R14, not a best-practice add-on (see Appendix A, Critical 5). It is also the canonical third-condition mitigation in the Lethal Trifecta: when the agent must process untrusted content, V8 strips its external-communication and host-access capability so the trifecta cannot complete (see V3).
Why sandboxing is mechanically necessary (mechanism 7). Code and shell commands generated by token sampling are stochastic outputs: the same prompt may produce functionally equivalent but subtly different code on different invocations. Without sandboxing, a stochastic output with filesystem write permissions, network access, or process-execution capability executes against production infrastructure. The failure mode in question is statistical rather than adversarial (injection risk is real too; see V6): the model generates plausible-looking code that has unintended side effects, at a rate determined by the sampling distribution rather than by any explicit check. Deterministic enforcement, meaning a sandbox that restricts what the generated code can reach, is the correct response to a stochastic generator because it substitutes a hard boundary for an unreliable probabilistic instruction.
Variants
The variants differ in isolation strength, startup cost, and operational model. Stronger isolation costs more per invocation; weaker isolation is cheaper but assumes a less hostile threat model.
- Container isolation (Docker, containerd, Podman). Linux namespaces + cgroups + seccomp; the industry default. Strong against userspace attacks; weaker against kernel-exploit container escapes. Startup ~100–500 ms; per-instance cost low. The 80% solution for most agent code execution.
- Userspace kernel (gVisor / `runsc`). Intercepts syscalls in Go and re-implements a Linux-like surface in userspace; the host kernel never sees the workload's syscalls directly. Stronger than plain containers because the kernel attack surface is dramatically reduced; ~10–20% performance overhead. Modal's gVisor backend is the canonical hosted example.
- MicroVM (Firecracker, Kata Containers, Cloud Hypervisor). Hardware virtualisation via KVM with a minimal device model; each workload gets its own kernel. Strongest isolation short of full VMs; ~125 ms startup (Firecracker); the AWS Lambda / Fargate substrate. Use when multi-tenant isolation must survive a kernel exploit.
- Hosted sandbox services (E2B, Modal, Daytona, Blaxel). Turnkey sandboxes-as-an-API with Python / Jupyter kernels ready to attach to an agent. Internally combine the above primitives; externally a single SDK call. Trade infrastructure work for vendor dependency and per-invocation cost.
- WebAssembly sandboxes (Wasmtime, Wasmer, Pyodide-in-browser). Capability-based isolation by construction — the runtime cannot do anything the host did not explicitly import. Strong for pure-computation tools; weaker fit for the typical R13 surface that wants filesystem / network / subprocess access.
These are the same pattern — enforce an enumerated capability set from outside the agent — implemented at progressively lower layers of the stack. Production R13 deployments typically settle on containers (most teams), gVisor (high-security tenants), or a hosted service (teams who do not want to operate the sandbox themselves).
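As a rough illustration of the container and gVisor variants, the sketch below builds a deny-by-default `docker run` argument list. The function name `docker_sandbox_argv`, the image name, and the default limits are illustrative assumptions, not part of the pattern; the flags are standard Docker CLI options, and `--runtime runsc` only works on a host where gVisor has been installed and registered with Docker.

```python
from typing import List, Optional


def docker_sandbox_argv(
    image: str,
    command: List[str],
    memory_mb: int = 512,
    cpus: float = 1.0,
    pids_limit: int = 64,
    runtime: Optional[str] = None,  # e.g. "runsc" where gVisor is installed (assumption)
) -> List[str]:
    """Build a deny-by-default `docker run` command line (nothing is executed here)."""
    argv = [
        "docker", "run",
        "--rm",                                 # ephemeral: remove the container on exit
        "--network", "none",                    # no network unless explicitly granted
        "--read-only",                          # root filesystem is read-only
        "--tmpfs", "/tmp:rw,size=64m",          # small scratch space only
        "--memory", f"{memory_mb}m",            # hard memory cap (cgroup)
        "--cpus", str(cpus),                    # CPU quota
        "--pids-limit", str(pids_limit),        # bounds process spawning
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege gain via setuid binaries
        "--user", "65534:65534",                # run as an unprivileged uid:gid
    ]
    if runtime is not None:
        argv += ["--runtime", runtime]          # swap runc for a stronger runtime
    return argv + [image] + command


argv = docker_sandbox_argv("python:3.12-slim", ["python", "-c", "print(2 + 2)"])
gvisor_argv = docker_sandbox_argv("python:3.12-slim", ["python", "-V"], runtime="runsc")
print(" ".join(argv))
```

Wall-time is deliberately absent: `docker run` has no wall-clock flag, so that cap has to be enforced by whatever process launches and supervises the container.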
Applicability
Use V8 when:
- the agent executes LLM-generated code (R13 CodeAct, R14 Program of Thoughts — mandatory, no exceptions);
- the agent invokes any tool that writes to filesystem, performs network I/O, or spawns processes with LLM-supplied parameters;
- the system is multi-tenant — one user's tool execution must not affect another's environment or data;
- the agent satisfies the Lethal Trifecta (V3) and V8 is being used to remove the external-communication condition from the Quarantined LLM;
- production cost or reliability would be materially affected by a runaway or malicious tool execution.
Do not use V8 alone when:
- the underlying risk is prompt hijack rather than execution side-effects — V8 stops the consequences but not the corruption; pair with V6 Prompt Injection Shield;
- the agent loop itself is unbounded — V8 caps each block but the loop needs V9 Bounded Execution;
- the threat is the Lethal Trifecta in full — V8 removes one condition but the architecture also needs V4 Dual LLM;
- the tools are read-only deterministic API calls with no LLM-generated parameters — sandboxing is over-engineering and the right control is schema validation on I2 Function Call.
Decision Criteria
V8 is right when the tool surface includes anything an attacker (or a confused model) could misuse to read, write, exfiltrate, or exhaust — which is almost any non-trivial agent.
1. Does the agent execute LLM-generated code? This is a gate, not a slider. If yes — R13 CodeAct or R14 Program of Thoughts is in play — V8 is mandatory and the only open question is which variant. R13 without V8 is the anti-pattern. If no, continue to test 2.
2. Enumerate tool capabilities. For every tool, list filesystem paths it touches, network endpoints it reaches, processes it spawns, and external resources it consumes. If any tool's enumerated capability set is "broad" (whole filesystem, arbitrary network, arbitrary subprocesses), V8 is mandatory. If every tool is a narrow, schema-validated API call with no side effects on the host, V8 is over-engineering — use I2 Function Call validation instead.
3. Pick the variant by threat model and operational appetite.
- Single-tenant prototype with semi-trusted users $\to$ container (Docker). 80% solution.
- Multi-tenant production with untrusted code $\to$ gVisor or microVM (Firecracker). Stronger kernel-exploit isolation.
- Team does not want to operate the sandbox infrastructure $\to$ hosted service (E2B, Modal, Daytona). Trade per-invocation cost for zero ops.
- Pure-computation tool, no host I/O needed $\to$ WebAssembly (Wasmtime). Capability-based by construction.
4. Set the resource caps from data, not intuition. Per-block CPU seconds, wall-time, memory, and network policy must be calibrated against measured workloads. Defaults to anchor against: 30 s wall-time, 512 MB memory, deny-by-default network with an explicit allow-list, no subprocess spawning unless required. Tighten where measured behaviour permits; never loosen without a logged justification.
5. Pair, never substitute. V8 is one layer. The agent loop must still be bounded by V9 Bounded Execution (per-block caps do not bound an infinite loop). Untrusted content must still be sanitised by V6 Prompt Injection Shield (V8 contains the consequence; V6 reduces the probability). The Lethal Trifecta still needs V4 Dual LLM for the architectural split (V8 is the third-condition mitigation, not the whole answer). And every sandbox event — execution, cap trip, error — must be logged via V14 Trajectory Logging so the sandbox becomes auditable, not merely operational.
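Tests 1 to 3 above can be read as a small decision function. The sketch below is one possible encoding, not a normative one: the `Deployment` fields, the returned labels, and the order in which the variant bullets are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    executes_llm_code: bool        # test 1: R13 / R14 in play
    broad_tool_capabilities: bool  # test 2: any tool with broad fs / net / process access
    multi_tenant_untrusted: bool   # test 3 inputs follow
    wants_zero_ops: bool
    pure_compute_only: bool


def choose_variant(d: Deployment) -> str:
    """Return a sandbox variant label for the given deployment facts."""
    # Tests 1 and 2: without generated code or broad capabilities, V8 is over-engineering.
    if not (d.executes_llm_code or d.broad_tool_capabilities):
        return "none: I2 schema validation"
    # Test 3: variant by threat model and operational appetite (ordering is an assumption).
    if d.pure_compute_only:
        return "webassembly"
    if d.wants_zero_ops:
        return "hosted"
    if d.multi_tenant_untrusted:
        return "gvisor-or-microvm"
    return "container"


prototype = Deployment(True, True, False, False, False)
print(choose_variant(prototype))
```

Resource caps (test 4) and pairing (test 5) are outside this function on purpose: they apply to whichever variant is returned.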
Quick test — V8 is the right pattern when:
- the agent executes LLM-generated code or tools touching filesystem / network / processes with LLM-supplied parameters, and
- the cost of a misused capability (data exfiltration, host compromise, resource exhaustion) materially exceeds the cost of sandbox setup and per-block latency, and
- the team can enumerate the capability set each tool actually needs (deny-by-default is feasible).
If the agent executes code and V8 cannot be provisioned, the only safe configuration is to fall back to R4 ReAct with schema-validated JSON tool calls; R13 without V8 is not deployable. If the tool surface is purely deterministic API calls with no host I/O, V8 is over-engineering; use I2 Function Call validation. If the threat is the model itself being hijacked, V8 alone is insufficient; pair it with V6 Prompt Injection Shield and, in the Lethal Trifecta case, V4 Dual LLM.
Structure
    Agent (R13 / R4 / tool-using) emits code block or tool call
            │
            ▼
    [ Sandbox Manager ]
            │
      spin up fresh environment:
        ├── filesystem: ephemeral, scoped paths only
        ├── network:    deny-by-default + allow-list
        ├── processes:  no spawn (or capped)
        ├── time:       hard wall-clock cap
        ├── memory:     hard cap
        └── cost:       external-call budget
            │
            ▼
      Execute block
            │
            ▼
    [ Resource Monitor ] ──── cap tripped ──┐
            │                               │
      ok → collect result              terminate;
            │                          return cap-trip
            ▼                          as Observation
    [ Result Sanitiser ]                    │
    (validate, scrub PII)                   │
            │                               │
            ▼                               │
      Observation ◀─────────────────────────┘
            │
            ▼
      Destroy environment
            │
            ▼
    Back to Agent loop (V9 bounds the loop)
Participants
Each participant owns exactly one boundary or enforcement responsibility; the pattern's security comes from that separation.
| Participant | Owns | Input $\to$ Output | Must not |
|---|---|---|---|
| Sandbox Manager | creation and teardown of isolated environments | (capability set, code/tool call) $\to$ (configured environment, handle) | reuse environments across distinct agent runs — leaking variables, files, or cached credentials across users is the pattern's most-cited operational failure. |
| Capability Set | the explicit, enumerated permission grant for this invocation | tool requirements $\to$ (fs paths, net endpoints, proc rules, caps) | default to permissive — every capability must be granted explicitly; the absence of a grant is denial. "Just allow everything for now" is how every sandbox escape post-mortem begins. |
| Executor | running the code or tool inside the enforced environment | (code block, environment) $\to$ (stdout, return value, traceback, resource usage) | escape isolation; if it can, the pattern has not been implemented. The executor is the V8 implementation — not a subprocess.run shortcut around it. |
| Resource Monitor | enforcing the per-block caps in real time | running execution $\to$ (terminate-on-trip signal, usage record) | wait for graceful shutdown when a cap is tripped — kill the process; report the trip as the Observation. A monitor that hopes the workload will stop on its own is not a monitor. |
| Result Sanitiser | validating and scrubbing tool output before it enters agent context | raw execution output $\to$ cleaned, schema-valid Observation | trust tool output as agent context — sanitise PII, strip injection-shaped strings, enforce schema. Tool output is untrusted content (the V6 concern) even when the tool is trusted. |
| Audit Logger (V14) | recording every execution, cap trip, error, and capability grant | sandbox events $\to$ trace span | omit failed or terminated executions; those are the security-relevant events. The logger feeds V14 Trajectory Logging and V17 Online Eval. |
The defining separations are Capability Set $\leftrightarrow$ Executor (the executor cannot grant itself capabilities; the set is decided outside) and Executor $\leftrightarrow$ Resource Monitor (the executor is observed by something it cannot turn off). When either separation collapses — the executor decides its own permissions, or the monitor is in-process and killable by the workload — V8 is V8 in name only.
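One way to picture the Capability Set participant is as an immutable value whose defaults grant nothing. The sketch below is a minimal illustration; the class name `CapabilitySet`, its fields, and the example endpoint are hypothetical. Freezing the dataclass is only an in-process convenience that catches accidental edits; the actual enforcement still has to come from the kernel, userspace kernel, or hypervisor.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CapabilitySet:
    """Enumerated grant for one invocation; every default is a denial."""
    fs_mounts: Tuple[str, ...] = ()                        # no filesystem mounts
    net_allow: FrozenSet[Tuple[str, int]] = frozenset()    # no network endpoints
    proc_spawn: bool = False                               # no subprocesses
    wall_s: float = 30.0                                   # anchor default from the text
    mem_mb: int = 512                                      # anchor default from the text

    def allows_connect(self, host: str, port: int) -> bool:
        # Absence of a grant is denial: only exact (host, port) pairs pass.
        return (host, port) in self.net_allow


default = CapabilitySet()
fetch_tool = CapabilitySet(net_allow=frozenset({("api.example.com", 443)}))

try:
    default.proc_spawn = True       # code holding the set cannot widen it by assignment
    widened = True
except FrozenInstanceError:
    widened = False
print(default.allows_connect("api.example.com", 443), widened)
```

A new grant therefore means constructing a new set outside the executor, which keeps the Capability Set and Executor separation visible in code review.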
Collaborations
A tool invocation request arrives from the agent — a Python code block (R13), a structured tool call (R4), or an arbitrary command. The Sandbox Manager reads the Capability Set for this invocation (fixed at design time for known tools; declared per-call for code execution) and constructs a fresh environment: container or microVM, scoped filesystem mounts, network policy applied, resource cgroups configured. The Executor runs the block inside the environment. In parallel, the Resource Monitor watches CPU, memory, wall-time, and network usage; if any cap is tripped, it terminates the workload and reports the trip. On normal completion, the Result Sanitiser validates output against the expected schema, scrubs sensitive content, and returns the Observation. The Sandbox Manager destroys the environment — kernel, filesystem, network namespace — so nothing from this invocation leaks into the next. Every step is recorded by the Audit Logger as a V14 span.
One level up, V8 composes tightly with other Reliability patterns. V9 Bounded Execution caps the agent loop (max steps, max cost); V8 caps each block within it; both are required. V14 Trajectory Logging captures sandbox events as part of the agent trace, making post-incident review possible. V6 Prompt Injection Shield runs upstream — V8's job is to make a successful injection blastless, but V6's job is to make injection less likely in the first place. In the Lethal Trifecta case (V3), V8 strips external-communication capability from the Quarantined LLM in a V4 Dual LLM architecture, completing the trifecta-prevention design.
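The kill-on-trip behaviour of the Resource Monitor can be illustrated with the standard library alone. The sketch below is not a sandbox: as the Motivation section notes, a bare subprocess gives no filesystem or network isolation. It only shows an external parent enforcing a wall-time cap and turning the trip into an Observation. The function name `run_with_wall_cap` and the result dictionary shape are assumptions.

```python
import subprocess
import sys


def run_with_wall_cap(code: str, wall_s: float) -> dict:
    """Run `code` in a child interpreter; the parent kills it when wall_s elapses."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],   # -I: isolated mode, ignores user site and env vars
            capture_output=True,
            text=True,
            timeout=wall_s,                       # on expiry the child is killed, not asked to stop
        )
    except subprocess.TimeoutExpired:
        # The cap trip is reported to the agent as an Observation, not raised as a crash.
        return {"status": "cap_tripped",
                "observation": f"Sandbox cap tripped: wall-time exceeded {wall_s}s"}
    if proc.returncode != 0:
        return {"status": "error", "observation": proc.stderr}
    return {"status": "ok", "observation": proc.stdout}


ok = run_with_wall_cap("print(6 * 7)", wall_s=10.0)
runaway = run_with_wall_cap("while True: pass", wall_s=0.5)
print(ok["status"], runaway["status"])
```

In a real deployment the same role is played by cgroup limits, the container runtime, or the hypervisor, so that memory and CPU caps are enforced as well as wall-time.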
Consequences
Benefits
- Reduces the blast radius of a tool execution from "whatever the host allows" to "whatever the developer explicitly granted".
- Makes R13 / R14, and code execution generally, deployable in production and shared environments with a bounded blast radius.
- Bounds resource consumption per block: no infinite loop or memory bomb in a single emitted block can take down the host.
- Provides a clean audit boundary: every execution is logged with its capability set, resource usage, and outcome.
- Composes with V4, V6, V9, V14 into the full code-execution-agent security posture.
Costs
- Infrastructure: containers / microVMs / hosted services are infrastructure dependencies, not flags.
- Latency per invocation: sandbox spin-up and teardown add 50–500 ms typically (microVMs faster; cold container starts slower).
- Operational complexity: kernel lifetime, network policy, credential isolation, cleanup between users — all must be designed and operated.
- Tool compatibility friction: tools assuming filesystem or network access that the sandbox denies will fail until either the tool or the capability set is revised.
- Per-invocation cost in hosted models (E2B / Modal / Daytona); per-host cost in self-hosted models.
Risks and failure modes
- Permissive defaults. The sandbox is configured "permissively to avoid breaking tools" — and is no longer a sandbox. Deny-by-default is non-negotiable.
- Capability creep. A new tool is added; the developer grants whatever it needs; the granted set accumulates over months until the agent has effectively unconstrained access. Audit capability sets quarterly.
- Sandbox escape. Container escape (kernel exploit) or hypervisor escape (rare). Mitigation: use gVisor or microVMs for high-security tenants; keep host kernel patched.
- Kernel leakage across users. A sandbox kernel reused between agent runs leaks variables, files, credentials. Each agent run gets a fresh environment.
- Monitor in-process. The resource monitor runs inside the workload it monitors and is killed by the workload it monitors. The monitor must be external (kernel-level cgroup, separate process, hypervisor).
- Sandbox-trusted output. Output from the sandbox is treated as trusted because "we ran it ourselves" — but the input to the sandbox came from an untrusted LLM. Always sanitise output (V6).
- Network allow-list as deny-list. "Allow `*.openai.com`" sounds restrictive; an attacker exfiltrates via a subdomain of an allowed domain. Tight allow-lists, or no network at all by default.
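The wildcard allow-list failure in the last item is easy to reproduce. The sketch below contrasts a suffix match with an exact host-and-port match; the function names and host names are hypothetical, and a production policy would more likely be enforced in the network layer than in Python.

```python
from typing import FrozenSet, Tuple

Endpoint = Tuple[str, int]


def wildcard_allowed(host: str, suffix: str) -> bool:
    """Loose policy: anything under the suffix passes."""
    return host.lower().rstrip(".").endswith(suffix)


def egress_allowed(host: str, port: int, allow: FrozenSet[Endpoint]) -> bool:
    """Tight policy: only exact (host, port) pairs that were explicitly granted."""
    normalised = host.lower().rstrip(".")   # treat a trailing-dot FQDN as the same host
    return (normalised, port) in allow


ALLOW: FrozenSet[Endpoint] = frozenset({("api.openai.com", 443)})

# A host under the allowed domain that the developer never intended to grant.
unintended = "uploads.tenant-123.openai.com"
print(wildcard_allowed(unintended, ".openai.com"))   # loose policy lets it through
print(egress_allowed(unintended, 443, ALLOW))        # tight policy denies it
```

Even the exact-match version still depends on DNS resolution, which is why the Implementation Notes treat IP-and-port allow-lists as the stronger form.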
Implementation Notes
- Deny by default; grant explicitly. Start with no filesystem mounts, no network, no subprocess spawning, no environment variables. Add only what the specific tool or code block requires for this invocation. The capability set is part of the agent's contract, not an afterthought.
- Pick the variant deliberately. Containers (Docker) for the general case; gVisor when kernel attack surface is a real concern; Firecracker / microVMs for hard multi-tenant isolation; hosted services (E2B / Modal / Daytona) when ops cost is the deciding factor; WebAssembly for pure-compute tools.
- Fresh environment per run; persistent kernel only within a run. For R13's persistent-kernel pattern, the kernel persists across blocks within one agent run, but the environment is fresh per run. Cross-user leakage is the single most-cited V8 failure mode.
- Resource caps per block, not just per loop. Per-block CPU seconds, memory, wall-time, network. The agent-loop V9 says "stop after N steps"; the sandbox cap says "stop this block after T seconds / M megabytes". Both are required.
- Network policy is the load-bearing axis. Most sensitive sandbox decisions are network. The default should be `deny`; allow-lists should be tight and reviewed; outbound DNS should be considered an egress path; allow-lists by domain are weaker than allow-lists by IP and port.
- Treat output as untrusted. Tool / code output is content the LLM will read next. Sanitise it (V6's spotlighting transforms apply here too), validate against a schema, and never inject raw multi-megabyte tool output into agent context.
- Log every execution, every cap trip, every grant. V14 trace spans for sandbox lifecycle events. A V8 deployment with no telemetry is operational, not auditable.
- Test sandbox restoration. Periodically run known-malicious code blocks (a red-team test set) and verify that termination, cleanup, and logging behave as designed. Sandboxes that are never adversarially tested are sandboxes that have never been tested.
- The 30-second default is a default, not a law. Calibrate wall-time, memory, and CPU from measured p99 of legitimate workloads; tighten where data permits. Defaults catch the gross failures; tuned caps catch the long tail.
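The calibration advice in the last note can be made concrete with a few lines of arithmetic. The sketch below derives a wall-time cap from measured durations using a nearest-rank p99, adds headroom, and never exceeds the default. The helper names, the 1.5× headroom factor, and the sample data are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one measurement")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


def calibrate_wall_cap(samples_s: Sequence[float],
                       headroom: float = 1.5,
                       default_s: float = 30.0) -> float:
    """Tighten the cap where data permits; never loosen it past the default."""
    p99 = nearest_rank_percentile(samples_s, 99)
    return min(default_s, p99 * headroom)


fast_workload = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # measured seconds per block
slow_workload = list(range(1, 101))                # p99 is 99 s, well above the default
print(calibrate_wall_cap(fast_workload), calibrate_wall_cap(slow_workload))
```

Capping at the default encodes the "never loosen without a logged justification" rule: raising the ceiling requires changing `default_s` deliberately rather than letting slow measurements do it.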
Implementation Sketch
`LLM` = configured session (model + setup + per-call prompt); `code` = wiring. V8 is overwhelmingly code; the LLM sessions named below are upstream callers of the sandbox, not parts of it.
Composition: V8 wraps tool / code execution for any agent loop that emits actions touching the host. It is the mandatory inner pattern for R13 CodeAct and R14 Program of Thoughts; it is the third-condition mitigation in the V3 Lethal Trifecta decision and a constituent of the V4 Dual LLM architecture. Per-block resource caps and capability declarations compose with V9 Bounded Execution at the loop level. Every sandbox event is a span in V14 Trajectory Logging. Output sanitisation reuses V6 Prompt Injection Shield transforms.
The chain:
| # | Step | Kind | Draws on |
|---|---|---|---|
| 1 | Agent emits code block / tool call | LLM | R13 / R4 Agent session |
| 2 | Resolve capability set for this invocation | code | tool registry |
| 3 | Spin up fresh sandbox (container / gVisor / microVM / hosted) | code | V8 backend |
| 4 | Execute block in sandbox; monitor caps externally | code | V8 |
| 5 | On cap trip: terminate workload; build cap-trip Observation | code | V8 monitor |
| 6 | Collect stdout, return value, traceback, resource usage | code | V8 |
| 7 | Sanitise output (schema validate, PII scrub, V6 transforms) | code (or small LLM) | V6 |
| 8 | Tear down sandbox environment | code | V8 |
| 9 | Log invocation, capabilities, usage, outcome to V14 trace | code | V14 |
| 10 | Return Observation to agent loop | code | R13 / R4 |
Skeleton — the wiring; the LLM call is upstream of the sandbox, not inside it:
    def execute_in_sandbox(action, agent_id, run_id):
        """Skeleton only: capability_set, SandboxBackend, ResourceMonitor,
        sanitise, and V14 stand for the deployment's own components."""
        caps = capability_set(action.tool_or_code)      # code: deny by default
        env = SandboxBackend.spawn(                     # code: V8 variant (Docker / gVisor / Firecracker / E2B)
            capabilities=caps,
            cpu_s=caps.cpu_s,                           # e.g. 5
            mem_mb=caps.mem_mb,                         # e.g. 512
            wall_s=caps.wall_s,                         # e.g. 30
            network=caps.network,                       # "deny" or allow-list
            fs_mounts=caps.fs_mounts,                   # scoped, ephemeral
            proc_spawn=caps.proc_spawn,                 # usually False
            run_id=run_id,                              # fresh kernel per agent run
        )
        monitor = ResourceMonitor(env, caps)            # code: external to workload
        outcome = "error"                               # what gets logged if env.run raises
        try:
            result = env.run(action)                    # code: Executor; returns early if the monitor kills it
            if monitor.cap_tripped():
                env.kill()
                obs = f"Sandbox cap tripped: {monitor.reason}"   # cap trips become Observations
                outcome = "cap_tripped"
            else:
                obs = sanitise(result, schema=action.schema)     # code (+ V6 transforms)
                outcome = "ok"
        finally:
            env.destroy()                               # code: no kernel reuse across runs
            V14.log_span("sandbox.execute",             # code: V14; failed executions are logged too
                         agent_id=agent_id, run_id=run_id,
                         caps=caps, usage=monitor.usage,
                         outcome=outcome)
        return obs                                      # code: back to R13/R4 loop
The LLM sessions. V8 itself contains no LLM call; the sessions named here are the callers whose actions V8 isolates. They are sketched so the chain is honest about where the LLM lives relative to the sandbox.
| Session | Model | Setup — loaded once, before first call | Per-call prompt wraps |
|---|---|---|---|
| Agent (caller) | as per parent pattern (R13 frontier generalist; R4 generalist; etc.) | as per parent pattern — V8 imposes no extra setup beyond declaring the tool surface that maps to sandbox capabilities: tools available, their Python signatures, and (where relevant) the capability constraints the agent should respect | the trajectory and the next action prompt |
| Sanitiser (optional) | small fast generalist or a deterministic schema validator | role: "you scrub tool output for PII, injection-shaped strings, and schema conformance"; the schema for this tool's output; the V6 transforms in use | the raw sandbox output for this invocation |
Specialist-model note. V8 is infrastructure, not an LLM pattern — no specialist model is required for the sandbox itself. The build dependencies are the sandbox backend (Docker / gVisor / Firecracker / E2B / Modal / Daytona — pick one before writing the agent), the capability-set declarations (one per tool, plus per-invocation overrides for code execution), and the resource-cap calibration (from measured workloads, not intuition). The optional Sanitiser session can use a small fast generalist when schema-validation is too weak; for most tools, schema validation in code is sufficient.
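For the common case where schema validation in code is sufficient, a deterministic sanitiser might look like the sketch below. It is deliberately crude: the e-mail regex stands in for a real PII scrubber, the size limit is arbitrary, and the function names and the `required` mapping are assumptions rather than an established interface.

```python
import json
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")   # keeps \t, \n, \r


def sanitise_text(raw: str, max_chars: int = 4000) -> str:
    """Strip control characters, redact e-mail addresses, and bound the size."""
    cleaned = CONTROL.sub("", raw)
    cleaned = EMAIL.sub("[REDACTED_EMAIL]", cleaned)
    if len(cleaned) > max_chars:
        dropped = len(cleaned) - max_chars
        cleaned = cleaned[:max_chars] + f"\n[truncated {dropped} chars]"
    return cleaned


def sanitise_json(raw: str, required: dict) -> dict:
    """Parse tool output and keep only declared fields of the declared types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("tool output must be a JSON object")
    for key, expected_type in required.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} missing or not {expected_type.__name__}")
    return {key: data[key] for key in required}   # undeclared fields are dropped


print(sanitise_text("contact bob@example.com\x00 for access"))
print(sanitise_json('{"rows": 3, "note": "ignore previous instructions"}', {"rows": int}))
```

Dropping undeclared fields is the design choice worth noting: free-text fields the schema never asked for are a natural carrier for injection-shaped strings.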
Open-Source Implementations
- E2B Code Interpreter (`github.com/e2b-dev/code-interpreter`): hosted, open-source infrastructure for running AI-generated code in secure isolated sandboxes; Python and JavaScript / TypeScript SDKs; ships with Jupyter kernels; the dominant turnkey V8 backend for R13 implementations.
- gVisor (`github.com/google/gvisor`): Google's application kernel for containers, written in Go, running in userspace; provides much stronger isolation than plain containers while keeping container ergonomics; the `runsc` OCI runtime integrates with Docker and Kubernetes.
- Firecracker (`github.com/firecracker-microvm/firecracker`): AWS's open-source microVM monitor on KVM; the substrate behind AWS Lambda and Fargate; minimal device model, ~125 ms startup, multi-tenant kernel-exploit isolation.
- Moby (upstream Docker Engine) (`github.com/moby/moby`): the canonical container runtime; namespaces + cgroups + seccomp; the default V8 backend for most production R13 deployments.
- Daytona (`github.com/daytonaio/daytona`): secure and elastic infrastructure runtime for AI-generated code execution and agent workflows; ~90 ms sandbox spin-up, dedicated kernel and filesystem per sandbox; SDKs in Python, TypeScript, Ruby, Go, Java.
- Modal sandbox examples (`github.com/modal-labs/modal-examples`): `13_sandboxes/` contains runnable examples (LangChain coding agent, Claude managed agents, OpenAI Agents SDK) using Modal's gVisor-backed sandboxes; the canonical pattern for "agent code execution as a hosted service".
- Kata Containers (`github.com/kata-containers/kata-containers`): OCI-compatible lightweight VMs combining container ergonomics with hardware virtualisation; an alternative to Firecracker for multi-tenant kernel isolation.
Known Uses
- OpenHands (All-Hands AI, formerly OpenDevin) — every CodeActAgent invocation runs inside a Docker sandbox; the largest open-source production deployment of R13 + V8.
- Anthropic Claude code execution tool / OpenAI Code Interpreter — vendor-hosted Python sandboxes that back the code-execution channels in Claude and ChatGPT; vendor-managed V8 implementations.
- HuggingFace smolagents: `CodeAgent` ships with sandbox backends for E2B, Modal, Blaxel, Docker, and WebAssembly; the docs explicitly warn that the built-in `LocalPythonExecutor` is not a security sandbox.
- AWS Lambda / Fargate: Firecracker microVMs as the substrate; not agent-specific but the canonical proof that microVM isolation works at hyperscale.
- Modal-hosted agent products — Modal's gVisor sandbox is the execution substrate for a generation of coding-agent and research-agent products; "agent emits code $\to$ Modal runs it $\to$ output returns".
- E2B-hosted agents — data-analysis, research, and dataframe-manipulation agents on E2B Code Interpreter, including Jupyter-kernel sessions per agent run.
Related Patterns
- Required by R13 CodeAct — hard prerequisite; R13 without V8 is a remote-code-execution channel exposed to whoever can get text into the model's context. See Appendix A, Critical 5.
- Required by R14 Program of Thoughts — same logic; R14 emits and executes Python for numerical computation.
- Composes with V9 Bounded Execution — V9 caps the agent loop; V8 caps each block within it. Both are required.
- Composes with V14 Trajectory Logging — every sandbox lifecycle event is a span; the trace is part of the security artefact.
- Composes with V6 Prompt Injection Shield — V6 reduces the probability of a hijack; V8 reduces the blast radius when one succeeds. Defence in depth.
- Composes with V4 Dual LLM — V8 strips external-communication capability from the Quarantined LLM, completing the architectural split for Lethal Trifecta (V3) cases.
- Mitigates condition 3 of V3 Rule of Two — the external-communication condition of the Lethal Trifecta; V8 removes the agent's ability to reach outside the sandbox.
- Pairs with V5 Guardrail Layering — sandbox boundaries are guardrail enforcement points; pre-call grants the capability set, post-call sanitises the output.
- Distinct from V9 Bounded Execution — V8 isolates what a tool can do; V9 limits how many times the loop runs. Both are needed; neither substitutes for the other.
- Distinct from I2 Function Call — I2 validates parameters against a schema; V8 isolates execution. A schema-validated tool call can still exhaust resources or touch the wrong filesystem path; V8 is the second layer.
Sources
- Wang, X., Chen, Y., Yuan, L., Zhang, Y., Li, Y., Peng, H., & Ji, H. (2024). "Executable Code Actions Elicit Better LLM Agents." arXiv 2402.01030. ICML 2024. The canonical R13 reference; notes sandboxing as the mandatory co-pattern.
- Saltzer, J. H., & Schroeder, M. D. (1975). "The Protection of Information in Computer Systems." Proceedings of the IEEE 63(9). — origin of the principle of least privilege applied to V8's capability sets.
- Agache, A., Brooker, M., Iordache, A., Liguori, A., Neugebauer, R., Piwonka, P., & Popa, D.-M. (2020). "Firecracker: Lightweight Virtualization for Serverless Applications." NSDI 2020. — Firecracker's design paper; the canonical microVM-for-sandbox reference.
- gVisor design documentation (Google) — userspace-kernel architecture and threat model.
- OWASP LLM Top 10 (2025): LLM05 Improper Output Handling and LLM06 Excessive Agency (numbered LLM02 Insecure Output Handling and LLM08 Excessive Agency in the 2023 edition); sandboxing as a primary mitigation.
- Simon Willison — Lethal Trifecta series (simonwillison.net, 2023–25); V8 as the canonical mitigation for the external-communication condition.
- HuggingFace smolagents documentation: explicit guidance that `LocalPythonExecutor` is not a security sandbox; production deployments must select a real V8 backend.
- 12-Factor Agents (Dex Horthy, HumanLayer): Factor 3 (Own Your Context Window) and Factor 11 (Trigger from Anywhere, Meet Users Where They Are); resource and trust bounds applied to tool execution.