Use case
Let an agent scrape a single site and synthesize a summary, with an airtight guarantee that it cannot reach any other host. A compromised page or prompt injection cannot make the agent call home, because the edge proxy simply won't connect it there.
Template
python — Python + pip; everything else is installed inside the sandbox at runtime.
Run it
Security policy — the star of this recipe
allow_out is the only way out of the sandbox. Any request to
any other host returns a connection failure inside the VM. Try
this: change the agent’s instructions to curl evil.example and
watch the tool call fail — there’s nothing the agent can do from
inside the VM to reach it.
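The allowlist semantics can be modeled in a few lines. This is a sketch of the decision the edge proxy makes, not its actual implementation, and the allow_out value here is a hypothetical single-host policy:

```python
from urllib.parse import urlparse

ALLOW_OUT = ["example.com"]  # hypothetical allow_out policy for this recipe

def egress_allowed(url, allow_out=ALLOW_OUT):
    """Model of the edge proxy's decision: connect only if the request's
    hostname is on the allowlist. Simplified -- no subdomain or port logic."""
    host = urlparse(url).hostname
    return host in allow_out

print(egress_allowed("https://example.com/page"))  # True
print(egress_allowed("https://evil.example/x"))    # False -- connection refused in the VM
```

Anything that falls into the False branch is what the agent experiences as a plain connection failure.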
Env isolation
Passing USER_AGENT via env means the agent sets the scraper UA
from a value the host controls — even if an attacker tries to
inject a different UA through the prompt, the instruction tells
the agent to read $USER_AGENT, not to compose one.
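Inside the VM that pattern is one line: read the injected variable, never prompt content. The helper name below is hypothetical:

```python
import os

def request_headers(env=None):
    """Build request headers from the sandbox-injected USER_AGENT.
    The agent reads $USER_AGENT verbatim; it never composes a UA
    from prompt content. (request_headers is a hypothetical helper.)"""
    env = os.environ if env is None else env
    return {"User-Agent": env["USER_AGENT"]}
```

A missing USER_AGENT raises KeyError here on purpose: failing loudly beats silently scraping with a default UA.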
What the agent does
- printenv TARGET_URL USER_AGENT SCRAPER_ID (logged to /workspace/run.log).
- pip install beautifulsoup4 lxml requests.
- Fetch $TARGET_URL with $USER_AGENT, extract the top 5 items, dump JSON to /workspace/results.json.
- Return the JSON.
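The extract-and-dump step might look like the sketch below. For self-containment it uses the stdlib html.parser rather than the beautifulsoup4 the recipe installs, and it assumes "items" means the page's h2 headings — a stand-in for whatever the real target exposes:

```python
import json
from html.parser import HTMLParser

class ItemExtractor(HTMLParser):
    """Collects the text of <h2> headings -- a hypothetical stand-in
    for the 'top 5 items' on the real target page."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2 and data.strip():
            self.items.append(data.strip())

def top_items(html, n=5):
    parser = ItemExtractor()
    parser.feed(html)
    return parser.items[:n]

sample = "<h2>First</h2><h2>Second</h2><p>ad</p><h2>Third</h2>"
print(json.dumps(top_items(sample)))  # → ["First", "Second", "Third"]
```

In the recipe, that JSON string is what lands in /workspace/results.json and comes back as the tool result.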
Filesystem isolation
The pip install cache, the BeautifulSoup dep tree, the raw HTML,
and results.json all live in the sandbox’s overlay. Nothing is
ever written to your host. The next run of this script gets a
fresh VM — no lingering cache or cookies.
Full source
See cookbook/openai_agents_web_scraper.py in the repo.