
What You’ll Learn

  • How to upload a multi-file Python project into a sandbox
  • How to run unittest inside the sandbox and capture output
  • How to parse test results (pass/fail, test count, failure count) from stdout
  • How to demonstrate a regression by uploading a buggy version and re-running tests
  • The sandbox-as-CI-runner pattern for safe, isolated test execution

Prerequisites

  • Declaw running locally or in the cloud (see Deployment)
  • DECLAW_API_KEY and DECLAW_DOMAIN set in your environment
This example is available in Python. TypeScript support coming soon.

Code Walkthrough

1. Define the project files

Both the module under test and the test file are Python strings defined in the outer script and uploaded to the sandbox:
CALCULATOR_MODULE = """\
class Calculator:
    def add(self, a: float, b: float) -> float:
        return a + b

    def subtract(self, a: float, b: float) -> float:
        return a - b

    def multiply(self, a: float, b: float) -> float:
        return a * b

    def divide(self, a: float, b: float) -> float:
        if b == 0:
            raise ValueError("Cannot divide by zero")
        return a / b
"""

TEST_CALCULATOR = """\
import unittest
from calculator import Calculator

class TestCalculator(unittest.TestCase):
    def setUp(self):
        self.calc = Calculator()

    def test_add(self):
        self.assertEqual(self.calc.add(2, 3), 5)

    def test_divide_by_zero(self):
        with self.assertRaises(ValueError):
            self.calc.divide(1, 0)
"""

2. Upload files and run the test suite

from declaw import Sandbox

sbx = Sandbox.create(template="python", timeout=300)
try:
    sbx.files.write("/home/user/project/calculator.py", CALCULATOR_MODULE)
    sbx.files.write("/home/user/project/test_calculator.py", TEST_CALCULATOR)

    result = sbx.commands.run(
        "cd /home/user/project && python3 -m unittest test_calculator -v 2>&1"
    )
    print(result.stdout)
    print(f"Exit code: {result.exit_code}")
finally:
    sbx.kill()
unittest writes its results to stderr by default; the 2>&1 redirect merges stderr into stdout so the full test output is captured in result.stdout.
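You can verify this behavior outside the sandbox: run a tiny unittest module via subprocess without any redirect and observe that stdout stays empty while the results land on stderr. A local sketch, independent of Declaw:

```python
import os
import subprocess
import sys
import tempfile

# A one-test module, mirroring the sandbox project layout.
TEST_SRC = """\
import unittest

class SmokeTest(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)
"""

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "test_demo.py"), "w") as f:
        f.write(TEST_SRC)

    # No 2>&1 here: stdout and stderr are captured separately.
    proc = subprocess.run(
        [sys.executable, "-m", "unittest", "test_demo", "-v"],
        cwd=tmp, capture_output=True, text=True,
    )

print("stdout:", repr(proc.stdout))           # '' -- nothing on stdout
print("stderr has OK:", "OK" in proc.stderr)  # True -- results go to stderr
```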

3. Parse test results

import re

def parse_test_output(stdout: str) -> dict:
    """Parse unittest output and return a summary dict."""
    summary = {"passed": False, "total_tests": 0, "failures": 0, "errors": 0}

    for line in stdout.strip().splitlines():
        if line.startswith("Ran "):
            summary["total_tests"] = int(line.split()[1])
        elif line.startswith("OK"):
            # Matches both "OK" and "OK (skipped=1)".
            summary["passed"] = True
        elif line.startswith("FAILED"):
            # e.g. "FAILED (failures=1, errors=2)"
            summary["passed"] = False
            for key in ("failures", "errors"):
                match = re.search(rf"{key}=(\d+)", line)
                if match:
                    summary[key] = int(match.group(1))
    return summary
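unittest packs both counts into the single summary line it prints on failure. For reference, they can be pulled out in one pass with a regex over that line:

```python
import re

line = "FAILED (failures=1, errors=2)"
counts = {key: int(n) for key, n in re.findall(r"(failures|errors)=(\d+)", line)}
print(counts)  # {'failures': 1, 'errors': 2}
```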

4. Inject a bug and re-run

The example demonstrates a failing build by uploading a buggy version of the calculator:
buggy_module = CALCULATOR_MODULE.replace(
    "return a + b",
    "return a - b  # BUG: subtraction instead of addition"
)
sbx.files.write("/home/user/project/calculator.py", buggy_module)

result2 = sbx.commands.run(
    "cd /home/user/project && python3 -m unittest test_calculator -v 2>&1"
)
report2 = parse_test_output(result2.stdout)
print(f"Status: {'PASS' if report2['passed'] else 'FAIL'}")
print(f"Failures: {report2['failures']}")
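Once parsed, the report can gate a CI step by exiting nonzero on failure. A minimal sketch; the gate_on_report helper is illustrative, not part of the Declaw SDK:

```python
import sys

def gate_on_report(report: dict) -> None:
    """Exit nonzero when the sandboxed test run failed, failing the CI step."""
    if not report["passed"]:
        print(f"FAIL: {report['failures']} failure(s), {report['errors']} error(s)")
        sys.exit(1)
    print(f"PASS: {report['total_tests']} tests")

# A passing report falls through; a failing one raises SystemExit(1).
gate_on_report({"passed": True, "total_tests": 5, "failures": 0, "errors": 0})
```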

Expected Output

--- Creating Sandbox ---
Sandbox created: sbx-abc123

--- Uploading Project Files ---
  Uploaded: calculator.py
  Uploaded: test_calculator.py

--- Running Test Suite ---

Test output:
test_add ... ok
test_divide ... ok
test_divide_by_zero ... ok
test_multiply ... ok
test_subtract ... ok

Ran 5 tests in 0.001s

OK
Exit code: 0

--- Test Report ---
  Status:      PASS
  Total tests: 5
  Failures:    0
  Errors:      0

--- Injecting a Bug and Re-running ---
  Uploaded buggy calculator.py

Test output:
test_add ... FAIL
...
FAILED (failures=1)
Exit code: 1

--- Buggy Test Report ---
  Status:      FAIL
  Total tests: 5
  Failures:    1
  Errors:      0

Why Use Declaw for CI

Running tests directly on a CI runner (GitHub Actions, CircleCI, Jenkins) means:
  • Untrusted test code can access the runner’s environment variables, credentials, and filesystem
  • A compromised dependency in the test suite can exfiltrate CI secrets
  • A runaway test process can consume all runner resources and block other jobs
With Declaw:
  • Each test run gets its own isolated Firecracker microVM with no host access
  • Add allow_internet_access=False to prevent network access during tests
  • Add SecurityPolicy with PII scanning to prevent credential exfiltration even if a test makes outbound calls
  • Sandboxes are destroyed after each run — no state leaks between runs