The Guardrails Service is an optional Python microservice that provides ML-powered security scanning. When deployed alongside Declaw, it replaces the built-in regex scanners with production-grade models:
- PII detection: Microsoft Presidio with Named Entity Recognition (NER) for unstructured PII (person names, organizations, addresses, passport numbers, etc.)
- Prompt injection detection: the qualifire/prompt-injection-sentinel HuggingFace model, which scores content for injection likelihood
If the Guardrails Service is unreachable, the security proxy automatically falls back to the built-in regex scanners. No configuration change is required for this fallback.
Architecture
The security proxy sends scan requests to the Guardrails Service HTTP API at GUARDRAILS_URL/analyze. The service runs scanners in parallel and returns results within the 10-second proxy timeout.
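To make the parallel step concrete, here is a minimal sketch of how a service could fan a request out to several scanners at once. It is not the Guardrails Service's actual code: `scan_pii`, `scan_injection`, `SCANNERS`, and `analyze` are hypothetical names, and the scanner bodies are placeholders that return empty results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real scanners; they return empty results.
def scan_pii(text: str) -> dict:
    return {"scanner_type": "pii", "entities": []}

def scan_injection(text: str) -> dict:
    return {"scanner_type": "injection", "score": 0.0, "is_injection": False}

# Assumed registry mapping a scanner_type string to its implementation.
SCANNERS = {"pii": scan_pii, "injection": scan_injection}

def analyze(text: str, scanner_types: list[str]) -> dict:
    """Run the requested scanners concurrently; results keep request order."""
    with ThreadPoolExecutor(max_workers=max(1, len(scanner_types))) as pool:
        futures = [pool.submit(SCANNERS[name], text) for name in scanner_types]
        return {"scanner_responses": [f.result() for f in futures]}
```

Collecting results from the futures in submission order keeps the response aligned with the order of the requested scanners, regardless of which one finishes first.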
Deploy on GCP
cd guardrails-service/iac/gcp
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars: set project_id, region
../scripts/deploy-gcp.sh
This provisions a GCP VM with the Guardrails Service installed and started via systemd. The service listens on port 8000.
Connect to Declaw
Set the GUARDRAILS_URL environment variable before running the Declaw deploy script; the script detects it automatically:
export GUARDRAILS_URL=http://<guardrails-vm-ip>:8000
./scripts/deploy-gcp.sh
Or set it manually on the Declaw VM after deployment:
# On the Declaw VM
echo "GUARDRAILS_URL=http://<guardrails-vm-ip>:8000" >> /etc/declaw/env
systemctl restart declaw-orchestrator
Manual connection
In the SDK, set GUARDRAILS_URL in the environment before creating sandboxes:
export GUARDRAILS_URL=http://<guardrails-vm-ip>:8000
export DECLAW_API_KEY=your-key
export DECLAW_DOMAIN=your-domain:8080
The orchestrator reads GUARDRAILS_URL at startup and passes it to each sandbox’s security proxy.
Guardrails Service API
The service exposes a single endpoint:
POST /analyze
Content-Type: application/json
{
  "text": "User input to scan...",
  "scanners": [
    { "scanner_type": "pii" },
    { "scanner_type": "injection" }
  ]
}
Response:
{
  "scanner_responses": [
    {
      "scanner_type": "pii",
      "entities": [
        { "type": "PERSON", "start": 0, "end": 10, "score": 0.95 },
        { "type": "EMAIL_ADDRESS", "start": 12, "end": 30, "score": 0.99 }
      ]
    },
    {
      "scanner_type": "injection",
      "score": 0.03,
      "is_injection": false
    }
  ]
}
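For readers calling the endpoint directly, the following is a minimal client sketch using only the Python standard library. The request and response shapes come from the examples above. The helper names (`build_request`, `summarize`, `analyze`) are illustrative and not part of any Declaw SDK, and the 10-second default simply mirrors the proxy timeout.

```python
import json
import urllib.request

def build_request(base_url: str, text: str, scanner_types: list[str]) -> urllib.request.Request:
    """Build a POST request for <base_url>/analyze with the documented JSON body."""
    payload = {"text": text, "scanners": [{"scanner_type": s} for s in scanner_types]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def summarize(response: dict) -> dict:
    """Collapse scanner_responses into a small summary (illustrative helper)."""
    summary = {}
    for item in response.get("scanner_responses", []):
        if item["scanner_type"] == "pii":
            summary["pii_types"] = sorted({e["type"] for e in item.get("entities", [])})
        elif item["scanner_type"] == "injection":
            summary["is_injection"] = bool(item.get("is_injection", False))
    return summary

def analyze(base_url: str, text: str, scanner_types: list[str], timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response; raises OSError subclasses on failure."""
    req = build_request(base_url, text, scanner_types)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With a service running locally, `summarize(analyze("http://localhost:8000", "some text", ["pii", "injection"]))` would return the detected entity types and the injection verdict.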
Supported scanners
| Scanner type | Model/Library | Detects |
|---|---|---|
| pii | Microsoft Presidio | 50+ entity types including PERSON, ORG, LOCATION, PHONE_NUMBER, EMAIL, SSN, CREDIT_CARD, PASSPORT_NUMBER, US_DRIVER_LICENSE, IBAN_CODE, MEDICAL_LICENSE |
| injection | qualifire/prompt-injection-sentinel (HuggingFace) | Prompt injection likelihood score 0.0–1.0 |
| invisible_text | Custom scanner | Unicode invisible character injection |
Invisible text scanner
The Guardrails Service includes a scanner that detects invisible Unicode characters used in prompt injection attacks:
\u200b Zero-width space
\u200c Zero-width non-joiner
\u200d Zero-width joiner
\u2060 Word joiner
\ufeff Byte order mark
\u00ad Soft hyphen
These characters can be embedded in text that appears clean to the human eye but contains hidden instructions to the LLM.
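The check itself can be sketched in a few lines. This is an illustration of the idea rather than the service's actual scanner: it reports the position and name of each character from the list above, and `INVISIBLE_CHARS` and `find_invisible` are hypothetical names.

```python
# Code points taken from the list above, mapped to human-readable names.
INVISIBLE_CHARS = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "BYTE ORDER MARK",
    "\u00ad": "SOFT HYPHEN",
}

def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for every listed invisible character found in text."""
    return [(i, INVISIBLE_CHARS[ch]) for i, ch in enumerate(text) if ch in INVISIBLE_CHARS]
```

For example, `find_invisible("ab\u200bcd")` returns `[(2, "ZERO WIDTH SPACE")]`, while ordinary text yields an empty list.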
Model loading and caching
Models are downloaded once at service startup and cached on disk. The service is designed to load models in a non-blocking background thread, so it can accept requests before all models are ready. Until loading finishes, it runs in a degraded mode that skips unavailable scanners.
Model files are stored at /opt/guardrails/models/ on the service VM.
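The background-loading pattern described above might look roughly like the sketch below. It assumes each model can be produced by a zero-argument loader function. `ModelRegistry` and its methods are hypothetical names, not the service's real classes.

```python
import threading

class ModelRegistry:
    """Loads models on a background thread; callers see None until a model is ready."""

    def __init__(self, loaders):
        self._loaders = loaders  # scanner name -> zero-argument callable returning a model
        self._models = {}
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._load_all, daemon=True)

    def start(self):
        """Begin loading without blocking the caller."""
        self._thread.start()

    def _load_all(self):
        for name, loader in self._loaders.items():
            try:
                model = loader()
            except Exception:
                continue  # degraded mode: this scanner stays unavailable
            with self._lock:
                self._models[name] = model

    def get(self, name):
        """Return the loaded model, or None if it is not (yet) available."""
        with self._lock:
            return self._models.get(name)

    def wait(self, timeout=None):
        """Block until loading finishes or the timeout expires."""
        self._thread.join(timeout)
```

A request handler would call `get("pii")` and skip the scanner when the result is `None`, which is one way to realize the degraded mode.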
Local development
Run the Guardrails Service locally with Docker:
cd guardrails-service
docker build -t guardrails-service .
docker run -p 8000:8000 guardrails-service
Or with the provided Docker Compose configuration:
cd guardrails-service
docker compose up -d
Then point Declaw at it:
export GUARDRAILS_URL=http://localhost:8000
Fallback behavior
If GUARDRAILS_URL is set but the service is unreachable:
- The security proxy logs a warning
- PII detection falls back to the built-in regex scanner (SSN, credit card, email, phone patterns)
- Injection detection falls back to the built-in pattern library
- No error is surfaced to the agent workload
The fallback is automatic and requires no code changes. The proxy checks liveness on each scan request with a 10-second timeout.
The built-in regex fallback does not support unstructured PII types such as person names. If your security policy depends on NER-based detection, monitor Guardrails Service availability and alert on fallback events in the audit log.
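The fallback logic can be illustrated with a small sketch. It is not the security proxy's implementation: the two regex patterns are simplified examples, `remote_scan` stands in for whatever function calls the Guardrails Service, and every name here is hypothetical.

```python
import re

# Simplified example patterns; the built-in scanner's real patterns are not shown here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def regex_pii_scan(text):
    """Pattern-based scan: finds structured PII only, never free-form names."""
    entities = []
    for etype, pattern in (("EMAIL_ADDRESS", EMAIL_RE), ("SSN", SSN_RE)):
        for m in pattern.finditer(text):
            entities.append({"type": etype, "start": m.start(), "end": m.end()})
    return entities

def scan_with_fallback(text, remote_scan, log=print):
    """Try the remote service first; on a network error, warn and use the regex scan."""
    try:
        return remote_scan(text), "guardrails"
    except OSError as exc:  # covers connection errors, timeouts, and urllib's URLError
        log(f"warning: Guardrails Service unreachable ({exc}); using regex fallback")
        return regex_pii_scan(text), "regex"
```

Note that `regex_pii_scan("John Smith")` returns an empty list, which is the gap that NER-based detection closes. Returning the source alongside the entities gives the caller something to record when a fallback occurs.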