Enable injection defense
injection_defense=True uses defaults (action="block", threshold=0.8):
InjectionDefenseConfig model
| Field | Type | Default | Description |
|---|---|---|---|
enabled | bool | False | Activate injection scanning |
action | InjectionAction | "block" | What to do when injection is detected |
threshold | float | 0.8 | Confidence threshold 0.0–1.0; higher = fewer false positives |
domains | list[str] | None | all domains | Limit scanning to specific destination domains |
InjectionAction enum
| Value | Behavior |
|---|---|
block | Reject the request or response. The agent receives an error rather than the injected content. |
log | Allow the content through but write the detection to the audit log. |
How detection works
Without the Guardrails Service, the proxy uses a pattern library to detect known injection attempts:qualifire/prompt-injection-sentinel model scores each request body. The model returns a confidence score between 0.0 and 1.0. When the score exceeds threshold, the configured action is applied.
Scanning directions
Injection defense scans both directions of traffic: Outbound scanning catches injections that the agent might unknowingly include in LLM prompts from user inputs or retrieved documents. Inbound scanning catches indirect prompt injection — malicious instructions embedded in web pages, API responses, or tool outputs that would be included in the agent’s context.Sensitivity thresholds
| Threshold | Behavior |
|---|---|
0.5 | Aggressive — blocks more content, higher false positive rate |
0.8 | Balanced (default) — good balance between detection and false positives |
0.95 | Conservative — only blocks high-confidence injections |
Example: agent protected from indirect injection
Combining with transformation rules
UseTransformationRule for deterministic pattern removal alongside probabilistic injection defense:
For production deployments handling sensitive agent workloads, deploy the Guardrails Service to use the
qualifire/prompt-injection-sentinel ML model. The built-in pattern library covers known attack signatures but cannot detect novel injection techniques that the model can.