
What You’ll Learn

  • The fundamental pattern: LLM on host, code execution in sandbox
  • How to strip markdown code fences from LLM output before executing
  • How to create and destroy a fresh sandbox per task for strong isolation
  • Demo mode: run the full workflow without an OpenAI API key

Prerequisites

  • Declaw running locally or in the cloud (see Deployment)
  • DECLAW_API_KEY and DECLAW_DOMAIN set in your environment
  • OPENAI_API_KEY set in your environment (optional — demo mode runs without it)

This example is available in Python. TypeScript support coming soon.
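
For reference, a typical shell setup looks like this (every value below is a placeholder, not a real credential or domain):

```shell
# Placeholder values: substitute your own key and domain
export DECLAW_API_KEY="your-declaw-api-key"
export DECLAW_DOMAIN="your-declaw-domain"

# Optional: omit this to run the example in demo mode
export OPENAI_API_KEY="your-openai-api-key"
```

Note that the example treats the literal placeholder `your-openai-api-key` the same as an unset key, so leaving it as-is also runs demo mode.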

Code Walkthrough

Architecture

Host process                          Sandbox (microVM)
───────────────────────────────       ──────────────────────────────────
1. Call OpenAI chat API          →    (isolated)
2. Receive generated Python code
3. sbx.files.write(code)         →    /tmp/generated.py written to VM fs
4. sbx.commands.run(python3 ...) →    Code executes inside the VM
5. Read result.stdout            ←    VM returns stdout/stderr/exit_code
6. sbx.kill()                         VM destroyed
The LLM never runs inside the sandbox. Only the generated code does. This ensures that even if the LLM produces malicious code, it executes in an isolated Firecracker microVM with no access to host resources.

Live mode (requires OPENAI_API_KEY)

import openai
from declaw import Sandbox

client = openai.OpenAI()

task = "Write a Python script that finds all prime numbers under 100 and prints them."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a code generation agent. Given a task, respond "
                "ONLY with Python code that accomplishes the task. "
                "No markdown, no explanation."
            ),
        },
        {"role": "user", "content": task},
    ],
    temperature=0,
)

code = strip_code_fences(response.choices[0].message.content or "")

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/generated.py", code)
    result = sbx.commands.run("python3 /tmp/generated.py", timeout=30)
    print(result.stdout)
finally:
    sbx.kill()

Stripping code fences

LLMs often wrap code in markdown fences even when instructed not to. Always strip them before executing:
def strip_code_fences(code: str) -> str:
    """Remove markdown code fences from LLM output."""
    code = code.strip()
    if code.startswith("```"):
        code = "\n".join(code.split("\n")[1:])
    if code.endswith("```"):
        code = "\n".join(code.split("\n")[:-1])
    return code.strip()
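
A few quick checks illustrate the helper's behavior (the function is restated here so the snippet is self-contained):

```python
def strip_code_fences(code: str) -> str:
    """Remove markdown code fences from LLM output."""
    code = code.strip()
    if code.startswith("```"):
        code = "\n".join(code.split("\n")[1:])
    if code.endswith("```"):
        code = "\n".join(code.split("\n")[:-1])
    return code.strip()

# Plain code passes through untouched
assert strip_code_fences("print('hi')") == "print('hi')"

# A fenced block loses both fences; the language tag goes with the opening fence
fenced = "```python\nprint('hi')\n```"
assert strip_code_fences(fenced) == "print('hi')"

# Fences without a language tag are handled the same way
assert strip_code_fences("```\nx = 1\n```") == "x = 1"
```

Dropping the entire first line (rather than just the backticks) is what removes language tags like ```` ```python ````.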

Demo mode (no API key required)

The example ships with a demo mode that uses hardcoded “LLM output” so you can verify the sandbox execution path without an API key:
import textwrap

from declaw import Sandbox

# Simulated LLM output for a data analysis task
simulated_code = textwrap.dedent("""\
    import statistics

    sales_data = [
        {"month": "Jan", "revenue": 12500},
        {"month": "Feb", "revenue": 15300},
        # ...
    ]
    revenues = [d["revenue"] for d in sales_data]
    print(f"Average: {statistics.mean(revenues)}")
    print(f"Median:  {statistics.median(revenues)}")
""")

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/generated.py", simulated_code)
    result = sbx.commands.run("python3 /tmp/generated.py", timeout=30)
    print(result.stdout)
finally:
    sbx.kill()

Mode selection

The example auto-detects whether to run live or demo:
import os

api_key = os.environ.get("OPENAI_API_KEY", "")
if not api_key or api_key == "your-openai-api-key":
    demo_mode()
else:
    live_mode()

Expected Output (demo mode)

Agent in Sandbox (OpenAI) Example
============================================================
OPENAI_API_KEY not set — running in demo mode.

--- Demo: Simulating OpenAI agent workflow ---

  Simulated task: Analyze a dataset and compute summary statistics
  Creating sandbox and executing...
  Sandbox created: sbx-abc123

  Output:
    === Sales Data Analysis ===
      total_revenue: 95500
      average_revenue: 15916.67
      median_revenue: 15300
      std_deviation: 3152.32
      min_month: Mar
      max_month: Jun
      growth_pct: 68.0

  Sandbox sbx-abc123 killed.

Security Note

A fresh sandbox is created for each task in this example. This is intentional: it ensures that code from one task cannot read files or environment variables left over from a previous task. For long-running sessions where state should persist across tasks, reuse the same sandbox — but understand that state accumulates.