The Guardrails Service is an optional Python microservice that provides ML-powered security scanning. When deployed alongside Declaw, it replaces the built-in regex scanners with production-grade models:
  • PII detection: Microsoft Presidio with Named Entity Recognition (NER) for unstructured PII (person names, organizations, addresses, passport numbers, etc.)
  • Prompt injection detection: qualifire/prompt-injection-sentinel HuggingFace model that scores content for injection likelihood
If the Guardrails Service is unreachable, the security proxy automatically falls back to the built-in regex scanners. No configuration change is required for this fallback.

Architecture

The security proxy sends scan requests to the Guardrails Service HTTP API at GUARDRAILS_URL/analyze. The service runs the requested scanners in parallel, and the proxy waits up to 10 seconds for the response before timing out.
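To make the parallel execution concrete, here is a minimal sketch of running several scanners concurrently. It is not the service's actual code: it assumes each scanner is a plain Python callable, and `run_scanners` and the two stand-in scanners are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_scanners(text, scanners):
    """Run every scanner callable on `text` concurrently.

    `scanners` maps a scanner name to a callable that takes the text.
    Returns a dict of name -> result once all scanners have finished.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(scanners))) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in scanners.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for the real PII and injection scanners.
def fake_pii_scanner(text):
    return {"entities": []}


def fake_injection_scanner(text):
    return {"score": 0.0, "is_injection": False}


results = run_scanners(
    "hello",
    {"pii": fake_pii_scanner, "injection": fake_injection_scanner},
)
```

With a thread pool, total latency is roughly that of the slowest scanner rather than the sum of all scanners, which matters when the caller enforces a fixed timeout.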

Deploy on GCP

cd guardrails-service/iac/gcp
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars: set project_id, region
../scripts/deploy-gcp.sh
This provisions a GCP VM with the Guardrails Service installed and started via systemd. The service listens on port 8000.

Connect to Declaw

Set the GUARDRAILS_URL environment variable before running the Declaw deploy script, and it will be detected automatically:
export GUARDRAILS_URL=http://<guardrails-vm-ip>:8000
./scripts/deploy-gcp.sh
Or set it manually on the Declaw VM after deployment:
# On the Declaw VM
echo "GUARDRAILS_URL=http://<guardrails-vm-ip>:8000" >> /etc/declaw/env
systemctl restart declaw-orchestrator

Manual connection

In the SDK, set GUARDRAILS_URL in the environment before creating sandboxes:
export GUARDRAILS_URL=http://<guardrails-vm-ip>:8000
export DECLAW_API_KEY=your-key
export DECLAW_DOMAIN=your-domain:8080
The orchestrator reads GUARDRAILS_URL at startup and passes it to each sandbox’s security proxy.

Guardrails Service API

The service exposes a single endpoint:
POST /analyze
Content-Type: application/json

{
  "text": "User input to scan...",
  "scanners": [
    { "scanner_type": "pii" },
    { "scanner_type": "injection" }
  ]
}
Response:
{
  "scanner_responses": [
    {
      "scanner_type": "pii",
      "entities": [
        { "type": "PERSON", "start": 0, "end": 10, "score": 0.95 },
        { "type": "EMAIL_ADDRESS", "start": 12, "end": 30, "score": 0.99 }
      ]
    },
    {
      "scanner_type": "injection",
      "score": 0.03,
      "is_injection": false
    }
  ]
}
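A client for this endpoint could look roughly like the sketch below, which uses only the Python standard library. The request and response shapes follow the example above; the helper names (`build_analyze_payload`, `analyze`, `index_by_scanner`) are hypothetical, and the 10-second default timeout is chosen to mirror the proxy timeout rather than taken from the service itself.

```python
import json
import urllib.request


def build_analyze_payload(text, scanner_types):
    """Build the JSON body for POST /analyze in the shape shown above."""
    return {
        "text": text,
        "scanners": [{"scanner_type": t} for t in scanner_types],
    }


def analyze(base_url, text, scanner_types, timeout=10.0):
    """POST the payload to <base_url>/analyze and return the decoded JSON."""
    body = json.dumps(build_analyze_payload(text, scanner_types)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def index_by_scanner(result):
    """Index the scanner_responses list by scanner_type for easier lookup."""
    return {r["scanner_type"]: r for r in result["scanner_responses"]}
```

For example, `index_by_scanner(analyze("http://localhost:8000", "some text", ["pii", "injection"]))["injection"]["is_injection"]` would read the injection verdict, assuming a service is listening at that address.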

Supported scanners

| Scanner type | Model/Library | Detects |
| --- | --- | --- |
| pii | Microsoft Presidio | 50+ entity types including PERSON, ORG, LOCATION, PHONE_NUMBER, EMAIL, SSN, CREDIT_CARD, PASSPORT_NUMBER, US_DRIVER_LICENSE, IBAN_CODE, MEDICAL_LICENSE |
| injection | qualifire/prompt-injection-sentinel (HuggingFace) | Prompt injection likelihood score 0.0–1.0 |
| invisible_text | Custom scanner | Unicode invisible character injection |

Invisible text scanner

The Guardrails Service includes a scanner that detects invisible Unicode characters used in prompt injection attacks:
\u200b  Zero-width space
\u200c  Zero-width non-joiner
\u200d  Zero-width joiner
\u2060  Word joiner
\ufeff  Byte order mark
\u00ad  Soft hyphen
These characters can be embedded in text that appears clean to the human eye but contains hidden instructions to the LLM.
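As an illustration of how such a check might work, the sketch below flags the six characters listed above. It is an assumption about the approach, not the service's implementation; `INVISIBLE_CHARS`, `find_invisible_chars`, and `strip_invisible_chars` are hypothetical names.

```python
# The six code points listed above, mapped to human-readable descriptions.
INVISIBLE_CHARS = {
    "\u200b": "zero-width space",
    "\u200c": "zero-width non-joiner",
    "\u200d": "zero-width joiner",
    "\u2060": "word joiner",
    "\ufeff": "byte order mark",
    "\u00ad": "soft hyphen",
}


def find_invisible_chars(text):
    """Return (index, description) for each listed invisible character in `text`."""
    return [
        (index, INVISIBLE_CHARS[char])
        for index, char in enumerate(text)
        if char in INVISIBLE_CHARS
    ]


def strip_invisible_chars(text):
    """Return `text` with the listed invisible characters removed."""
    return "".join(char for char in text if char not in INVISIBLE_CHARS)
```

A scanner built this way can either report the offsets of suspicious characters or sanitize the text before it reaches the model.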

Model loading and caching

Models are downloaded once at service startup and cached on disk under /opt/guardrails/models/ on the service VM. The orchestrator is designed to load models in a non-blocking background thread, so the service can accept requests before all models are ready. Until loading completes, the service runs in a degraded mode that skips unavailable scanners.
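The background-loading pattern can be sketched as follows. This is a simplified illustration under stated assumptions, not the service's code: `ModelRegistry` is a hypothetical class, and each loader is assumed to be a zero-argument callable that returns a ready model.

```python
import threading


class ModelRegistry:
    """Loads models in a background thread and tracks which ones are ready."""

    def __init__(self, loaders):
        # `loaders` maps a scanner name to a zero-argument callable returning a model.
        self._loaders = dict(loaders)
        self._models = {}
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._load_all, daemon=True)

    def start(self):
        """Begin loading without blocking the caller."""
        self._thread.start()

    def _load_all(self):
        for name, loader in self._loaders.items():
            try:
                model = loader()
            except Exception:
                # A failed load leaves this scanner unavailable (degraded mode).
                continue
            with self._lock:
                self._models[name] = model

    def available(self):
        """Return the sorted names of scanners whose models are loaded."""
        with self._lock:
            return sorted(self._models)

    def wait(self, timeout=None):
        """Block until loading finishes or `timeout` seconds elapse."""
        self._thread.join(timeout)
```

A request handler could call `available()` and run only the scanners listed there, which gives the degraded behavior of skipping scanners that are not ready yet.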

Local development

Run the Guardrails Service locally with Docker:
cd guardrails-service
docker build -t guardrails-service .
docker run -p 8000:8000 guardrails-service
Or with the provided Docker Compose configuration:
cd guardrails-service
docker compose up -d
Then point Declaw at it:
export GUARDRAILS_URL=http://localhost:8000

Fallback behavior

If GUARDRAILS_URL is set but the service is unreachable:
  1. The security proxy logs a warning
  2. PII detection falls back to the built-in regex scanner (SSN, credit card, email, phone patterns)
  3. Injection detection falls back to the built-in pattern library
  4. No error is surfaced to the agent workload
The fallback is automatic and requires no code changes. The proxy checks liveness on each scan request with a 10-second timeout.
The built-in regex fallback does not support unstructured PII types like person_name. If your security policy depends on NER-based detection, monitor the Guardrails Service availability and alert on fallback events in the audit log.
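The fallback steps above can be sketched as a try-then-fall-back wrapper. This is a minimal illustration, not the proxy's implementation: `scan_pii`, `regex_pii_scan`, and `FALLBACK_PATTERNS` are hypothetical names, and the two regexes are simplified examples rather than the built-in scanner's real patterns.

```python
import logging
import re

logger = logging.getLogger("guardrails_fallback_sketch")

# Illustrative patterns only; the built-in scanner's real patterns are not shown here.
FALLBACK_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def regex_pii_scan(text):
    """Find structured PII with regexes; cannot detect names or other free-form PII."""
    entities = []
    for entity_type, pattern in FALLBACK_PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                {"type": entity_type, "start": match.start(), "end": match.end()}
            )
    return entities


def scan_pii(text, remote_scan):
    """Try the remote scanner; on any failure, log a warning and use the regex scan.

    Returns (entities, source), where source is "guardrails" or "regex".
    """
    try:
        return remote_scan(text), "guardrails"
    except Exception as exc:  # broad catch for brevity: connection errors, timeouts
        logger.warning("Guardrails Service unreachable, using regex fallback: %s", exc)
        return regex_pii_scan(text), "regex"
```

In this sketch, `regex_pii_scan("Alice Smith")` returns an empty list, which shows why person names go undetected while the fallback is active.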