
What You’ll Learn

  • The fundamental pattern: LLM on host, code execution in sandbox
  • How to strip markdown code fences from LLM output before executing
  • How to create and destroy a fresh sandbox per task for strong isolation
  • Demo mode: run the full workflow without an OpenAI API key

Prerequisites

  • Declaw running locally or in the cloud (see Deployment)
  • DECLAW_API_KEY and DECLAW_DOMAIN set in your environment
  • OPENAI_API_KEY set in your environment (optional — demo mode runs without it)

This example is available in Python. TypeScript support coming soon.
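
For reference, a typical shell setup looks like this (every value below is a placeholder, not a real credential or domain):

```shell
# Placeholder values: substitute your own key and domain
export DECLAW_API_KEY="your-declaw-api-key"
export DECLAW_DOMAIN="your-declaw-domain"

# Optional: omit this to run the example in demo mode
export OPENAI_API_KEY="your-openai-api-key"
```

Note that the example treats the literal placeholder `your-openai-api-key` the same as an unset key, so leaving it as-is also runs demo mode.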

Code Walkthrough

Architecture

Host process                          Sandbox (microVM)
───────────────────────────────       ──────────────────────────────────
1. Call OpenAI chat API          →    (isolated)
2. Receive generated Python code
3. sbx.files.write(code)         →    /tmp/generated.py written to VM fs
4. sbx.commands.run(python3 ...) →    Code executes inside the VM
5. Read result.stdout            ←    VM returns stdout/stderr/exit_code
6. sbx.kill()                         VM destroyed
The LLM never runs inside the sandbox. Only the generated code does. This ensures that even if the LLM produces malicious code, it executes in an isolated Firecracker microVM with no access to host resources.

Live mode (requires OPENAI_API_KEY)

import openai
from declaw import Sandbox

client = openai.OpenAI()

task = "Write a Python script that finds all prime numbers under 100 and prints them."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a code generation agent. Given a task, respond "
                "ONLY with Python code that accomplishes the task. "
                "No markdown, no explanation."
            ),
        },
        {"role": "user", "content": task},
    ],
    temperature=0,
)

code = strip_code_fences(response.choices[0].message.content or "")

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/generated.py", code)
    result = sbx.commands.run("python3 /tmp/generated.py", timeout=30)
    print(result.stdout)
finally:
    sbx.kill()

Stripping code fences

LLMs often wrap code in markdown fences even when instructed not to. Always strip them before executing:
def strip_code_fences(code: str) -> str:
    """Remove markdown code fences from LLM output."""
    code = code.strip()
    if code.startswith("```"):
        code = "\n".join(code.split("\n")[1:])
    if code.endswith("```"):
        code = "\n".join(code.split("\n")[:-1])
    return code.strip()
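
A few quick checks illustrate the helper's behavior (the function is restated here so the snippet is self-contained):

```python
def strip_code_fences(code: str) -> str:
    """Remove markdown code fences from LLM output."""
    code = code.strip()
    if code.startswith("```"):
        code = "\n".join(code.split("\n")[1:])
    if code.endswith("```"):
        code = "\n".join(code.split("\n")[:-1])
    return code.strip()

# Plain code passes through untouched
assert strip_code_fences("print('hi')") == "print('hi')"

# A fenced block loses both fences; the language tag goes with the opening fence
fenced = "```python\nprint('hi')\n```"
assert strip_code_fences(fenced) == "print('hi')"

# Fences without a language tag are handled the same way
assert strip_code_fences("```\nx = 1\n```") == "x = 1"
```

Dropping the entire first line (rather than just the backticks) is what removes language tags like ```` ```python ````.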

Demo mode (no API key required)

The example ships with a demo mode that uses hardcoded “LLM output” so you can verify the sandbox execution path without an API key:
import textwrap

from declaw import Sandbox

# Simulated LLM output for a data analysis task
simulated_code = textwrap.dedent("""\
    import statistics

    sales_data = [
        {"month": "Jan", "revenue": 12500},
        {"month": "Feb", "revenue": 15300},
        # ...
    ]
    revenues = [d["revenue"] for d in sales_data]
    print(f"Average: {statistics.mean(revenues)}")
    print(f"Median:  {statistics.median(revenues)}")
""")

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/generated.py", simulated_code)
    result = sbx.commands.run("python3 /tmp/generated.py", timeout=30)
    print(result.stdout)
finally:
    sbx.kill()

Mode selection

The example auto-detects whether to run live or demo:
import os

api_key = os.environ.get("OPENAI_API_KEY", "")
if not api_key or api_key == "your-openai-api-key":
    demo_mode()
else:
    live_mode()

Expected Output (demo mode)

Agent in Sandbox (OpenAI) Example
============================================================
OPENAI_API_KEY not set — running in demo mode.

--- Demo: Simulating OpenAI agent workflow ---

  Simulated task: Analyze a dataset and compute summary statistics
  Creating sandbox and executing...
  Sandbox created: sbx-abc123

  Output:
    === Sales Data Analysis ===
      total_revenue: 95500
      average_revenue: 15916.67
      median_revenue: 15300
      std_deviation: 3152.32
      min_month: Mar
      max_month: Jun
      growth_pct: 68.0

  Sandbox sbx-abc123 killed.

Security Note

A fresh sandbox is created for each task in this example. This is intentional: it ensures that code from one task cannot read files or environment variables left over from a previous task. For long-running sessions where state should persist across tasks, reuse the same sandbox — but understand that state accumulates.