What You’ll Learn
- How to create a sandbox with `network={"allow_out": [...]}` for domain-level scraping control
- How to upload and run a scraping script using Python's `urllib` (stdlib, no pip installs needed)
- How to verify that allowed domains are reachable and blocked domains are not
- The pattern for containing scrapers that should only access specific sources
Prerequisites
- Declaw running locally or in the cloud (see Deployment)
- `DECLAW_API_KEY` and `DECLAW_DOMAIN` set in your environment
- Outbound network access from your Declaw instance to `httpbin.org`
This example is available in Python. TypeScript support coming soon.
Code Walkthrough
1. Create the sandbox with a network allowlist
```python
from declaw import Sandbox

sbx = Sandbox.create(
    template="python",
    timeout=300,
    network={"allow_out": ["httpbin.org"]},
)
```
Only traffic destined for httpbin.org is allowed. All other outbound connections — including DNS for other domains and direct IP connections — are blocked by the TCP proxy.
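`allow_out` takes bare domain names, not URLs. When a scraper's targets are already known as a list of URLs, a small helper can derive the allowlist from them. This is a sketch: `allow_out_from_urls` is a hypothetical helper, and the only assumption about Declaw is the one shown above, that `allow_out` accepts a list of domain strings.

```python
from urllib.parse import urlsplit


def allow_out_from_urls(urls):
    """Build an allow_out list (hypothetical helper) from the URLs a scraper will fetch.

    Scheme, port, path and query are dropped; urlsplit().hostname is lowercased.
    """
    domains = []
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            raise ValueError(f"URL has no host: {url!r}")
        if host not in domains:  # keep first-seen order, skip duplicates
            domains.append(host)
    return domains


targets = [
    "http://httpbin.org/get",
    "https://httpbin.org/headers",
    "http://HTTPBIN.org:80/ip",
]
network = {"allow_out": allow_out_from_urls(targets)}
print(network)  # {'allow_out': ['httpbin.org']}
```

The resulting dict can be passed as `network=network` to `Sandbox.create(...)` in place of the literal used above.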
2. The scraper script
The scraper uses Python’s built-in urllib — no third-party packages required. The sandbox’s base Ubuntu image already has Python 3 installed:
```python
SCRAPER_SCRIPT = """\
import urllib.request
import json

url = "http://httpbin.org/get"
print(f"Fetching: {url}")

req = urllib.request.Request(url, headers={"User-Agent": "Declaw-Sandbox/1.0"})
with urllib.request.urlopen(req, timeout=10) as resp:
    body = resp.read().decode("utf-8")
    data = json.loads(body)
    print(f"Status: {resp.status}")
    print(f"Origin IP: {data.get('origin', 'unknown')}")
    print(f"Headers sent: {json.dumps(data.get('headers', {}), indent=2)}")
    print("SUCCESS: Allowed domain is reachable")
"""
```
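If the scraper script is pointed at a domain outside the allowlist, `urlopen` raises and the script exits with a traceback. A minimal sketch of a more forgiving fetch using only stdlib `urllib`. `try_fetch` is a hypothetical helper, and the exact exception Declaw's proxy produces is not documented here (the raw-socket test in step 3 times out), so the sketch treats any `OSError` as unreachable.

```python
import urllib.error
import urllib.request


def try_fetch(url, timeout=10):
    """Fetch url and return (reachable, detail) instead of raising.

    Hypothetical helper: an HTTP error status still counts as reachable,
    while connection-level failures (refused, timed out, reset) do not.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Declaw-Sandbox/1.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read()
            return True, f"HTTP {resp.status}, {len(body)} bytes"
    except urllib.error.HTTPError as e:
        # Caught before OSError: HTTPError subclasses URLError (an OSError),
        # but it means the server did answer.
        return True, f"HTTP {e.code}"
    except OSError as e:
        # URLError, socket timeouts and connection resets are all OSError subclasses.
        return False, f"{type(e).__name__}: {e}"
```

Inside the sandbox, `try_fetch("http://httpbin.org/get")` should return `(True, ...)`, while a URL on a non-allowlisted domain should return `(False, ...)`.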
3. Prove blocked domains are unreachable
Use a raw TCP socket test rather than an HTTP request. The block applies at the TCP layer, so even a raw socket connection to a blocked IP fails (in the expected output below, it times out rather than being refused):
```python
BLOCKED_SOCKET_TEST = """\
import socket

target = "93.184.216.34"  # example.com IP
port = 80
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    s.connect((target, port))
    s.close()
    print("CONNECTED")
except Exception as e:
    print(f"BLOCKED: {e}")
"""
```
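Step 4 also writes `ALLOWED_SOCKET_TEST` to the sandbox, but this walkthrough never defines it. A minimal reconstruction, assuming it mirrors `BLOCKED_SOCKET_TEST` and prints the lines shown under Expected Output; it is a sketch, not the original script. It resolves `httpbin.org` first and connects to the resulting IP, which the allowlist section says is permitted because the allowlisted domain resolves to it.

```python
ALLOWED_SOCKET_TEST = """\
import socket

host = "httpbin.org"
port = 80
try:
    # Resolve first so the output shows which IP the allowed domain maps to.
    ip = socket.gethostbyname(host)
    print(f"Resolved {host} to {ip}")
    s = socket.create_connection((ip, port), timeout=5)
    s.close()
    print("CONNECTED: Allowed domain is reachable at TCP level")
except Exception as e:
    print(f"BLOCKED: {e}")
"""
```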
4. Upload and run all three tests
```python
try:
    # Test 1: Scrape the allowed domain
    sbx.files.write("/home/user/scraper.py", SCRAPER_SCRIPT)
    result = sbx.commands.run("python3 /home/user/scraper.py 2>&1")
    print(result.stdout)

    # Test 2: TCP socket to allowed domain (httpbin.org)
    sbx.files.write("/home/user/allowed_test.py", ALLOWED_SOCKET_TEST)
    result2 = sbx.commands.run("python3 /home/user/allowed_test.py 2>&1")
    print(result2.stdout)

    # Test 3: TCP socket to blocked domain (example.com)
    sbx.files.write("/home/user/blocked_test.py", BLOCKED_SOCKET_TEST)
    result3 = sbx.commands.run("python3 /home/user/blocked_test.py 2>&1")
    print(result3.stdout)
finally:
    sbx.kill()
```
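The three tests repeat the same write-then-run pattern. A sketch that factors it into a helper, assuming nothing about the Declaw SDK beyond the two calls used above (`sbx.files.write` and `sbx.commands.run(...).stdout`); `run_script` and `run_all` are hypothetical names.

```python
import shlex


def run_script(sbx, path, source):
    """Upload `source` to `path` in the sandbox, run it with python3, return its stdout.

    `2>&1` folds tracebacks into stdout so failures show up in the output.
    """
    sbx.files.write(path, source)
    result = sbx.commands.run(f"python3 {shlex.quote(path)} 2>&1")
    return result.stdout


def run_all(sbx, tests):
    """Run (name, path, source) tuples in order; return {name: stdout}."""
    outputs = {}
    for name, path, source in tests:
        print(f"--- {name} ---")
        outputs[name] = run_script(sbx, path, source)
        print(outputs[name])
    return outputs
```

With the sandbox from step 1, the body of the `try` block could become `run_all(sbx, [("Scraping Allowed Domain (httpbin.org)", "/home/user/scraper.py", SCRAPER_SCRIPT), ...])`, keeping `sbx.kill()` in the `finally` clause.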
Expected Output
```text
--- Creating Sandbox with Network Policy ---
Policy: allow only httpbin.org outbound
Sandbox created: sbx-abc123

--- Scraping Allowed Domain (httpbin.org) ---
Fetching: http://httpbin.org/get
Status: 200
Origin IP: 203.0.113.42
Headers sent: {
  "Host": "httpbin.org",
  "User-Agent": "Declaw-Sandbox/1.0"
}
SUCCESS: Allowed domain is reachable

--- TCP Socket Test: Allowed Domain (httpbin.org) ---
Resolved httpbin.org to 54.243.149.112
CONNECTED: Allowed domain is reachable at TCP level

--- TCP Socket Test: Blocked Domain (example.com) ---
Attempting TCP connection to 93.184.216.34:80 (example.com)...
BLOCKED: [Errno 110] Connection timed out

--- Network Policy Summary ---
httpbin.org: ALLOWED (network policy permits this domain)
example.com: BLOCKED (not in the allow list)
```
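The step 4 code prints each script's raw output, while the Expected Output above ends with a summary produced by the full example. A sketch of deriving those summary lines from the markers the test scripts print (`SUCCESS`, `CONNECTED`, `BLOCKED`). `classify` and `summarize` are hypothetical names, and output with no marker (for example, a traceback) is reported as UNKNOWN rather than guessed.

```python
def classify(output):
    """Map one test's stdout to ALLOWED, BLOCKED or UNKNOWN using printed markers."""
    if "BLOCKED" in output:
        return "BLOCKED"
    if "CONNECTED" in output or "SUCCESS" in output:
        return "ALLOWED"
    return "UNKNOWN"


def summarize(results):
    """Turn (domain, stdout) pairs into summary lines like the ones above."""
    reasons = {
        "ALLOWED": "network policy permits this domain",
        "BLOCKED": "not in the allow list",
        "UNKNOWN": "no marker found; inspect the raw output",
    }
    lines = ["--- Network Policy Summary ---"]
    for domain, output in results:
        verdict = classify(output)
        lines.append(f"{domain}: {verdict} ({reasons[verdict]})")
    return "\n".join(lines)


print(summarize([
    ("httpbin.org", "CONNECTED: Allowed domain is reachable at TCP level\n"),
    ("example.com", "BLOCKED: [Errno 110] Connection timed out\n"),
]))
```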
Use Cases
Price monitoring: Allow only the target retailer’s domain. The scraper cannot exfiltrate data to other servers or call home.
News aggregation: Allowlist a set of news site domains. Even if the scraped page contains malicious JavaScript or links, the sandbox cannot follow them to unauthorized destinations.
Competitive intelligence: Restrict the scraper to a defined list of competitor domains. Any unexpected outbound connection is blocked automatically.
Domain Allowlist vs IP Allowlist
The network policy uses domain names, not IP addresses. The proxy resolves the domain to an IP at connection time and enforces the rule at the TCP layer. This means:
- `allow_out: ["httpbin.org"]` permits connections to any IP that `httpbin.org` resolves to
- Direct IP connections (such as `93.184.216.34`) are blocked unless the IP is one that an allowlisted domain resolves to at the time of the connection
- CDNs and load balancers that share IPs across domains are handled correctly: the proxy checks the SNI (TLS) or Host header (HTTP) rather than just the IP
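To reason about what a given connection will do, the rules above can be written as a toy decision function. This is not Declaw's proxy code. The exact matching rules (subdomains, trailing dots, whether the IP is also checked when a name is visible) are assumptions: this sketch matches names exactly, decides by name when an SNI or Host value is present, and falls back to the resolved IP set only for nameless raw TCP connections.

```python
import socket
from urllib.parse import urlsplit


def resolve_allowlist(domains):
    """Return the set of IP strings the allowlisted domains resolve to right now."""
    ips = set()
    for domain in domains:
        for *_, sockaddr in socket.getaddrinfo(domain, None):
            ips.add(sockaddr[0])
    return ips


def allow_connection(dest_ip, server_name, allow_out, allowed_ips):
    """Toy model of the allowlist rules (assumption, not Declaw's implementation).

    server_name is the TLS SNI or HTTP Host value if one is visible, else None.
    """
    if server_name is not None:
        # Strip any port ("httpbin.org:80") and lowercase via urlsplit.
        name = urlsplit("//" + server_name).hostname
        # Name-based check, so a shared CDN IP does not admit other domains.
        return name in allow_out
    # Raw TCP with no name: allow only IPs an allowlisted domain resolves to.
    return dest_ip in allowed_ips
```

Under this model, a bare connection to `93.184.216.34` is rejected because that IP is not in the set `resolve_allowlist(["httpbin.org"])` returns, while an HTTP request carrying `Host: httpbin.org` is accepted.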
For CIDR-based rules or more fine-grained control, see Network Policies.