What You’ll Learn
- How to wire CrewAI’s crew output directly into a Declaw sandbox for safe execution
- The two-agent pattern: researcher defines requirements, coder writes executable code
- Why code produced by a multi-agent crew must be sandboxed before running
- Demo mode: simulate the full crew workflow without an API key
Prerequisites
- Declaw running locally or in the cloud (see Deployment)
- `DECLAW_API_KEY` and `DECLAW_DOMAIN` set in your environment
- `OPENAI_API_KEY` set in your environment (optional — demo mode runs without it)
- `pip install crewai` (required for live mode only)
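A minimal environment setup might look like the following (the values shown are placeholders — substitute your own credentials):

```shell
# Required for sandbox access (placeholder values)
export DECLAW_API_KEY="your-declaw-api-key"
export DECLAW_DOMAIN="your-declaw-domain"

# Optional: enables live mode; without it the example runs in demo mode
export OPENAI_API_KEY="sk-..."

# Live mode only
pip install crewai
```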
This example is available in Python. TypeScript support is coming soon.
Code Walkthrough
The crew architecture
```
CrewAI Crew (runs on host)
├── Researcher Agent
│     Produces: analysis specification
└── Coder Agent
      Produces: Python script based on the specification
        ↓
crew.kickoff() result = Python code string
        ↓
Declaw Sandbox (microVM)
  sbx.files.write("/tmp/crew_output.py", code)
  sbx.commands.run("python3 /tmp/crew_output.py")
```
Both agents run on the host (talking to the LLM API). Only the final code output is sent into the sandbox for execution.
Live mode — defining the crew
```python
from crewai import Agent, Crew, Task
from declaw import Sandbox

researcher = Agent(
    role="Data Researcher",
    goal="Identify data analysis tasks and specify what code should compute",
    backstory=(
        "You are an experienced data analyst who breaks down analysis "
        "requirements into clear, executable Python tasks."
    ),
    verbose=True,
)

coder = Agent(
    role="Python Coder",
    goal="Write Python code that performs the requested analysis and prints results",
    backstory=(
        "You are an expert Python developer. You write clean, "
        "self-contained scripts that print results to stdout. "
        "No markdown, no explanations, just working Python code."
    ),
    verbose=True,
)

research_task = Task(
    description=(
        "Analyze the following dataset concept: monthly website traffic "
        "data for 12 months. Specify exactly what statistical analysis "
        "and insights the code should compute."
    ),
    expected_output="A clear specification of computations to perform",
    agent=researcher,
)

coding_task = Task(
    description=(
        "Based on the research specification, write a complete Python "
        "script that generates sample data and performs all the specified "
        "analyses. Output ONLY Python code, no markdown fences."
    ),
    expected_output="Complete Python script",
    agent=coder,
)

crew = Crew(agents=[researcher, coder], tasks=[research_task, coding_task], verbose=True)
result = crew.kickoff()
code = str(result).strip()
```
Executing the crew output in a sandbox
```python
# Strip markdown fences if the coder agent added them anyway
if code.startswith("```"):
    code = "\n".join(code.split("\n")[1:])
if code.endswith("```"):
    code = "\n".join(code.split("\n")[:-1])

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/crew_output.py", code)
    run_result = sbx.commands.run("python3 /tmp/crew_output.py", timeout=30)
    print(run_result.stdout)
finally:
    sbx.kill()
```
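The prefix/suffix check is intentionally simple; it misses closing fences followed by trailing whitespace or an opening fence with an unusual language tag. If you want something sturdier, a small regex-based helper works (this is our own sketch, not part of the Declaw or CrewAI APIs):

```python
import re

def strip_code_fences(text: str) -> str:
    """Remove a leading ``` fence (with optional language tag) and a trailing ``` fence."""
    text = text.strip()
    # Opening fence, optionally tagged, e.g. ```python
    text = re.sub(r"^```[a-zA-Z0-9_-]*\n", "", text)
    # Closing fence at the very end
    text = re.sub(r"\n```$", "", text)
    return text.strip()
```

Plain code without fences passes through unchanged, so it is safe to apply unconditionally to the crew output.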
Demo mode — simulated crew workflow
The demo mode simulates the researcher and coder agents with hardcoded outputs, then executes the generated code in a real sandbox (the coder output shown here is abridged):
```python
import textwrap

from declaw import Sandbox

# Step 1: Simulated researcher output
researcher_output = """
Analysis Specification for Monthly Website Traffic:
1. Generate 12 months of sample traffic data (visits, unique users, bounce rate)
2. Compute monthly averages for each metric
3. Identify best and worst performing months
4. Calculate month-over-month growth rates
"""

# Step 2: Simulated coder output (the actual executable code)
coder_output = textwrap.dedent("""\
    import statistics
    import random

    random.seed(42)
    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    visits = [int(10000 * (1 + 0.3 * (1 if i in [4, 5, 6, 10, 11] else -0.2))
              * random.uniform(0.85, 1.15) * (1 + i * 0.02))
              for i in range(12)]
    print(f"Average visits: {statistics.mean(visits):,.0f}")
    print(f"Best month: {months[visits.index(max(visits))]}")
""")

# Step 3: Execute in a real sandbox
sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/tmp/crew_output.py", coder_output)
    result = sbx.commands.run("python3 /tmp/crew_output.py", timeout=30)
    print(result.stdout)
finally:
    sbx.kill()
```
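Item 4 of the specification (month-over-month growth rates) is not computed in the short script above. In plain Python it might look like this (the helper name is our own, not part of the example code):

```python
def mom_growth(values):
    """Month-over-month growth rates as fractions, e.g. 0.1 == +10%."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]

visits = [10000, 11000, 9900]
rates = mom_growth(visits)
print([f"{r:+.1%}" for r in rates])  # ['+10.0%', '-10.0%']
```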
Expected Output (demo mode)
```
Agent in Sandbox (CrewAI) Example
============================================================
OPENAI_API_KEY not set — running in demo mode.

--- Demo: Simulating CrewAI multi-agent workflow ---
[Researcher Agent] Analyzing requirements...
[Coder Agent] Generating Python code...
[Sandbox] Executing crew output in isolated sandbox...
Sandbox created: sbx-abc123

Sandbox output:
=== Monthly Website Traffic Data ===
Month   Visits   Unique   Bounce%
----------------------------------
Jan      9,856    7,023    42.1%
...

=== Summary Statistics ===
Avg visits: 12,841
Best: Jun (19,023 visits)
Worst: Jan (9,856 visits)
Overall trend: increasing

Sandbox sbx-abc123 killed.
```
Why Sandbox the Crew Output
CrewAI agents can produce arbitrary Python code. Without sandboxing:
- The generated code runs on your host machine with your credentials and filesystem access
- A compromised or hallucinating agent could produce code that reads secrets, makes outbound calls, or modifies files
- There is no way to audit or restrict what the generated code does
With Declaw, the crew produces code; Declaw executes it in an isolated microVM. The host process remains safe regardless of what the agents generate.