
What You’ll Learn

  • How to upload a multi-file Python project into a sandbox
  • How to run unittest inside the sandbox and capture output
  • How to parse test results (pass/fail, test count, failure count) from stdout
  • How to demonstrate a regression by uploading a buggy version and re-running tests
  • The sandbox-as-CI-runner pattern for safe, isolated test execution

Prerequisites

  • Declaw running locally or in the cloud (see Deployment)
  • DECLAW_API_KEY and DECLAW_DOMAIN set in your environment
This example is available in Python. TypeScript support coming soon.

Code Walkthrough

1. Define the project files

Both the module under test and the test file are Python strings defined in the outer script and uploaded to the sandbox:
CALCULATOR_MODULE = """\
class Calculator:
    def add(self, a: float, b: float) -> float:
        return a + b

    def subtract(self, a: float, b: float) -> float:
        return a - b

    def multiply(self, a: float, b: float) -> float:
        return a * b

    def divide(self, a: float, b: float) -> float:
        if b == 0:
            raise ValueError("Cannot divide by zero")
        return a / b
"""

TEST_CALCULATOR = """\
import unittest
from calculator import Calculator

class TestCalculator(unittest.TestCase):
    def setUp(self):
        self.calc = Calculator()

    def test_add(self):
        self.assertEqual(self.calc.add(2, 3), 5)

    def test_divide_by_zero(self):
        with self.assertRaises(ValueError):
            self.calc.divide(1, 0)
"""

2. Upload files and run the test suite

from declaw import Sandbox

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/home/user/project/calculator.py", CALCULATOR_MODULE)
    sbx.files.write("/home/user/project/test_calculator.py", TEST_CALCULATOR)

    result = sbx.commands.run(
        "cd /home/user/project && python3 -m unittest test_calculator -v 2>&1"
    )
    print(result.stdout)
    print(f"Exit code: {result.exit_code}")
finally:
    sbx.kill()
unittest writes its results to stderr by default; the 2>&1 redirect merges stderr into stdout so the full test output is captured in result.stdout.
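You can verify this behavior outside the sandbox: run a tiny unittest module via subprocess without any redirect and observe that stdout stays empty while the results land on stderr. A local sketch, independent of Declaw:

```python
import os
import subprocess
import sys
import tempfile

# A one-test module, mirroring the sandbox project layout.
TEST_SRC = """\
import unittest

class SmokeTest(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)
"""

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "test_demo.py"), "w") as f:
        f.write(TEST_SRC)

    # No 2>&1 here: stdout and stderr are captured separately.
    proc = subprocess.run(
        [sys.executable, "-m", "unittest", "test_demo", "-v"],
        cwd=tmp, capture_output=True, text=True,
    )

print("stdout:", repr(proc.stdout))           # '' -- nothing on stdout
print("stderr has OK:", "OK" in proc.stderr)  # True -- results go to stderr
```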

3. Parse test results

import re

def parse_test_output(stdout: str) -> dict:
    """Parse unittest output and return a summary dict."""
    summary = {"passed": False, "total_tests": 0, "failures": 0, "errors": 0}

    for line in stdout.strip().splitlines():
        if line.startswith("Ran "):
            summary["total_tests"] = int(line.split()[1])
        elif line.startswith("OK"):
            # Matches both "OK" and "OK (skipped=1)".
            summary["passed"] = True
        elif line.startswith("FAILED"):
            # e.g. "FAILED (failures=1, errors=2)"
            summary["passed"] = False
            for key in ("failures", "errors"):
                match = re.search(rf"{key}=(\d+)", line)
                if match:
                    summary[key] = int(match.group(1))
    return summary
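unittest packs both counts into the single summary line it prints on failure. For reference, they can be pulled out in one pass with a regex over that line:

```python
import re

line = "FAILED (failures=1, errors=2)"
counts = {key: int(n) for key, n in re.findall(r"(failures|errors)=(\d+)", line)}
print(counts)  # {'failures': 1, 'errors': 2}
```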

4. Inject a bug and re-run

The example demonstrates a failing build by uploading a buggy version of the calculator:
buggy_module = CALCULATOR_MODULE.replace(
    "return a + b",
    "return a - b  # BUG: subtraction instead of addition"
)
sbx.files.write("/home/user/project/calculator.py", buggy_module)

result2 = sbx.commands.run(
    "cd /home/user/project && python3 -m unittest test_calculator -v 2>&1"
)
report2 = parse_test_output(result2.stdout)
print(f"Status: {'PASS' if report2['passed'] else 'FAIL'}")
print(f"Failures: {report2['failures']}")
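Once parsed, the report can gate a CI step by exiting nonzero on failure. A minimal sketch; the gate_on_report helper is illustrative, not part of the Declaw SDK:

```python
import sys

def gate_on_report(report: dict) -> None:
    """Exit nonzero when the sandboxed test run failed, failing the CI step."""
    if not report["passed"]:
        print(f"FAIL: {report['failures']} failure(s), {report['errors']} error(s)")
        sys.exit(1)
    print(f"PASS: {report['total_tests']} tests")

# A passing report falls through; a failing one raises SystemExit(1).
gate_on_report({"passed": True, "total_tests": 5, "failures": 0, "errors": 0})
```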

Expected Output

--- Creating Sandbox ---
Sandbox created: sbx-abc123

--- Uploading Project Files ---
  Uploaded: calculator.py
  Uploaded: test_calculator.py

--- Running Test Suite ---

Test output:
test_add ... ok
test_divide ... ok
test_divide_by_zero ... ok
test_multiply ... ok
test_subtract ... ok

Ran 5 tests in 0.001s

OK
Exit code: 0

--- Test Report ---
  Status:      PASS
  Total tests: 5
  Failures:    0
  Errors:      0

--- Injecting a Bug and Re-running ---
  Uploaded buggy calculator.py

Test output:
test_add ... FAIL
...
FAILED (failures=1)
Exit code: 1

--- Buggy Test Report ---
  Status:      FAIL
  Total tests: 5
  Failures:    1
  Errors:      0

Why Use Declaw for CI

Running tests directly on a CI runner (GitHub Actions, CircleCI, Jenkins) means:
  • Untrusted test code can access the runner’s environment variables, credentials, and filesystem
  • A compromised dependency in the test suite can exfiltrate CI secrets
  • A runaway test process can consume all runner resources and block other jobs
With Declaw:
  • Each test run gets its own isolated Firecracker microVM with no host access
  • Add allow_internet_access=False to prevent network access during tests
  • Add SecurityPolicy with PII scanning to prevent credential exfiltration even if a test makes outbound calls
  • Sandboxes are destroyed after each run — no state leaks between runs