Agent in Sandbox (CrewAI)

What You’ll Learn

How to wire CrewAI’s crew output directly into a Declaw sandbox for safe execution
The two-agent pattern: researcher defines requirements, coder writes executable code
Why code produced by a multi-agent crew must be sandboxed before running
Demo mode: simulate the full crew workflow without an API key

Prerequisites

Declaw running locally or in the cloud (see Deployment)
DECLAW_API_KEY and DECLAW_DOMAIN set in your environment
OPENAI_API_KEY set in your environment (optional — demo mode runs without it)
pip install crewai (required for live mode only)

This example is available in Python. TypeScript support coming soon.

Code Walkthrough

The crew architecture

CrewAI Crew (runs on host)
  ├── Researcher Agent
  │     Produces: analysis specification
  └── Coder Agent
        Produces: Python script based on the specification

                      ↓
             crew.kickoff() result = Python code string

                      ↓
             Declaw Sandbox (sandbox)
               sbx.files.write("/tmp/crew_output.py", code)
               sbx.commands.run("python3 /tmp/crew_output.py")

Both agents run on the host (talking to the LLM API). Only the final code output is sent into the sandbox for execution.

Live mode — defining the crew

from crewai import Agent, Crew, Task
from declaw import Sandbox

researcher = Agent(
    role="Data Researcher",
    goal="Identify data analysis tasks and specify what code should compute",
    backstory=(
        "You are an experienced data analyst who breaks down analysis "
        "requirements into clear, executable Python tasks."
    ),
    verbose=True,
)

coder = Agent(
    role="Python Coder",
    goal="Write Python code that performs the requested analysis and prints results",
    backstory=(
        "You are an expert Python developer. You write clean, "
        "self-contained scripts that print results to stdout. "
        "No markdown, no explanations, just working Python code."
    ),
    verbose=True,
)

research_task = Task(
    description=(
        "Analyze the following dataset concept: monthly website traffic "
        "data for 12 months. Specify exactly what statistical analysis "
        "and insights the code should compute."
    ),
    expected_output="A clear specification of computations to perform",
    agent=researcher,
)

coding_task = Task(
    description=(
        "Based on the research specification, write a complete Python "
        "script that generates sample data and performs all the specified "
        "analyses. Output ONLY Python code, no markdown fences."
    ),
    expected_output="Complete Python script",
    agent=coder,
)

crew = Crew(agents=[researcher, coder], tasks=[research_task, coding_task], verbose=True)
result = crew.kickoff()
code = str(result).strip()

Executing the crew output in a sandbox

# Strip markdown fences if the coder agent added them anyway
if code.startswith("```"):
    code = "\n".join(code.split("\n")[1:])
if code.endswith("```"):
    code = "\n".join(code.split("\n")[:-1])

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/crew_output.py", code)
    run_result = sbx.commands.run("python3 /tmp/crew_output.py", timeout=30)
    print(run_result.stdout)
finally:
    sbx.kill()

Demo mode — simulated crew workflow

The demo mode simulates the researcher and coder agents with hardcoded outputs, then executes the generated code in a real sandbox:

# Step 1: Simulated researcher output
researcher_output = """
Analysis Specification for Monthly Website Traffic:
1. Generate 12 months of sample traffic data (visits, unique users, bounce rate)
2. Compute monthly averages for each metric
3. Identify best and worst performing months
4. Calculate month-over-month growth rates
"""

# Step 2: Simulated coder output (the actual executable code)
coder_output = textwrap.dedent("""\
    import statistics
    import random
    random.seed(42)
    months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
    visits = [int(10000 * (1 + 0.3 * (1 if i in [4,5,6,10,11] else -0.2))
                  * random.uniform(0.85, 1.15) * (1 + i * 0.02))
              for i in range(12)]
    print(f"Average visits: {statistics.mean(visits):,.0f}")
    print(f"Best month: {months[visits.index(max(visits))]}")
""")

# Step 3: Execute in a real sandbox
sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/crew_output.py", coder_output)
    result = sbx.commands.run("python3 /tmp/crew_output.py", timeout=30)
    print(result.stdout)
finally:
    sbx.kill()

Expected Output (demo mode)

Agent in Sandbox (CrewAI) Example
============================================================
OPENAI_API_KEY not set — running in demo mode.

--- Demo: Simulating CrewAI multi-agent workflow ---

  [Researcher Agent] Analyzing requirements...
  [Coder Agent] Generating Python code...
  [Sandbox] Executing crew output in isolated sandbox...
  Sandbox created: sbx-abc123

  Sandbox output:
    === Monthly Website Traffic Data ===
    Month   Visits   Unique  Bounce%
    ----------------------------------
    Jan      9,856    7,023    42.1%
    ...

  === Summary Statistics ===
    Avg visits:       12,841
    Best:  Jun (19,023 visits)
    Worst: Jan (9,856 visits)

  Overall trend: increasing

  Sandbox sbx-abc123 killed.

Why Sandbox the Crew Output

CrewAI agents can produce arbitrary Python code. Without sandboxing:

The generated code runs on your host machine with your credentials and filesystem access
A compromised or hallucinating agent could produce code that reads secrets, makes outbound calls, or modifies files
There is no way to audit or restrict what the generated code does

With Declaw, the crew produces code; Declaw executes it in an isolated sandbox. The host process remains safe regardless of what the agents generate.

​What You’ll Learn

​Prerequisites

​Code Walkthrough

​The crew architecture

​Live mode — defining the crew

​Executing the crew output in a sandbox

​Demo mode — simulated crew workflow

​Expected Output (demo mode)

​Why Sandbox the Crew Output

What You’ll Learn

Prerequisites

Code Walkthrough

The crew architecture

Live mode — defining the crew

Executing the crew output in a sandbox

Demo mode — simulated crew workflow

Expected Output (demo mode)

Why Sandbox the Crew Output