Skip to main content

What You’ll Learn

  • How to wire CrewAI’s crew output directly into a Declaw sandbox for safe execution
  • The two-agent pattern: researcher defines requirements, coder writes executable code
  • Why code produced by a multi-agent crew must be sandboxed before running
  • Demo mode: simulate the full crew workflow without an API key

Prerequisites

  • Declaw running locally or in the cloud (see Deployment)
  • DECLAW_API_KEY and DECLAW_DOMAIN set in your environment
  • OPENAI_API_KEY set in your environment (optional — demo mode runs without it)
  • pip install crewai (required for live mode only)
This example is available in Python. TypeScript support coming soon.

Code Walkthrough

The crew architecture

CrewAI Crew (runs on host)
  ├── Researcher Agent
  │     Produces: analysis specification
  └── Coder Agent
        Produces: Python script based on the specification


             crew.kickoff() result = Python code string


             Declaw Sandbox (microVM)
               sbx.files.write("/tmp/crew_output.py", code)
               sbx.commands.run("python3 /tmp/crew_output.py")
Both agents run on the host (talking to the LLM API). Only the final code output is sent into the sandbox for execution.

Live mode — defining the crew

from crewai import Agent, Crew, Task
from declaw import Sandbox

researcher = Agent(
    role="Data Researcher",
    goal="Identify data analysis tasks and specify what code should compute",
    backstory=(
        "You are an experienced data analyst who breaks down analysis "
        "requirements into clear, executable Python tasks."
    ),
    verbose=True,
)

coder = Agent(
    role="Python Coder",
    goal="Write Python code that performs the requested analysis and prints results",
    backstory=(
        "You are an expert Python developer. You write clean, "
        "self-contained scripts that print results to stdout. "
        "No markdown, no explanations, just working Python code."
    ),
    verbose=True,
)

research_task = Task(
    description=(
        "Analyze the following dataset concept: monthly website traffic "
        "data for 12 months. Specify exactly what statistical analysis "
        "and insights the code should compute."
    ),
    expected_output="A clear specification of computations to perform",
    agent=researcher,
)

coding_task = Task(
    description=(
        "Based on the research specification, write a complete Python "
        "script that generates sample data and performs all the specified "
        "analyses. Output ONLY Python code, no markdown fences."
    ),
    expected_output="Complete Python script",
    agent=coder,
)

crew = Crew(agents=[researcher, coder], tasks=[research_task, coding_task], verbose=True)
result = crew.kickoff()
code = str(result).strip()

Executing the crew output in a sandbox

# Strip markdown fences if the coder agent added them anyway
if code.startswith("```"):
    code = "\n".join(code.split("\n")[1:])
if code.endswith("```"):
    code = "\n".join(code.split("\n")[:-1])

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/crew_output.py", code)
    run_result = sbx.commands.run("python3 /tmp/crew_output.py", timeout=30)
    print(run_result.stdout)
finally:
    sbx.kill()

Demo mode — simulated crew workflow

The demo mode simulates the researcher and coder agents with hardcoded outputs, then executes the generated code in a real sandbox:
# Step 1: Simulated researcher output
researcher_output = """
Analysis Specification for Monthly Website Traffic:
1. Generate 12 months of sample traffic data (visits, unique users, bounce rate)
2. Compute monthly averages for each metric
3. Identify best and worst performing months
4. Calculate month-over-month growth rates
"""

# Step 2: Simulated coder output (the actual executable code)
coder_output = textwrap.dedent("""\
    import statistics
    import random
    random.seed(42)
    months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
    visits = [int(10000 * (1 + 0.3 * (1 if i in [4,5,6,10,11] else -0.2))
                  * random.uniform(0.85, 1.15) * (1 + i * 0.02))
              for i in range(12)]
    print(f"Average visits: {statistics.mean(visits):,.0f}")
    print(f"Best month: {months[visits.index(max(visits))]}")
""")

# Step 3: Execute in a real sandbox
sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/crew_output.py", coder_output)
    result = sbx.commands.run("python3 /tmp/crew_output.py", timeout=30)
    print(result.stdout)
finally:
    sbx.kill()

Expected Output (demo mode)

Agent in Sandbox (CrewAI) Example
============================================================
OPENAI_API_KEY not set — running in demo mode.

--- Demo: Simulating CrewAI multi-agent workflow ---

  [Researcher Agent] Analyzing requirements...
  [Coder Agent] Generating Python code...
  [Sandbox] Executing crew output in isolated sandbox...
  Sandbox created: sbx-abc123

  Sandbox output:
    === Monthly Website Traffic Data ===
    Month   Visits   Unique  Bounce%
    ----------------------------------
    Jan      9,856    7,023    42.1%
    ...

  === Summary Statistics ===
    Avg visits:       12,841
    Best:  Jun (19,023 visits)
    Worst: Jan (9,856 visits)

  Overall trend: increasing

  Sandbox sbx-abc123 killed.

Why Sandbox the Crew Output

CrewAI agents can produce arbitrary Python code. Without sandboxing:
  • The generated code runs on your host machine with your credentials and filesystem access
  • A compromised or hallucinating agent could produce code that reads secrets, makes outbound calls, or modifies files
  • There is no way to audit or restrict what the generated code does
With Declaw, the crew produces code; Declaw executes it in an isolated microVM. The host process remains safe regardless of what the agents generate.