What You’ll Learn

  • The streaming rehydration challenge: tokens split across SSE chunk boundaries
  • Policy configuration for streaming-compatible PII rehydration
  • The proxy’s five-step buffering strategy (accumulate, detect, flush, replace, end)
  • A concrete walkthrough of a 7-chunk SSE stream with token restoration
  • Latency considerations for streaming vs. non-streaming modes

The Challenge

LLM APIs stream responses token-by-token via Server-Sent Events (SSE). A redaction token like [REDACTED_EMAIL_1] may be split across multiple chunks:
chunk 4: " [RED"
chunk 5: "ACTED"
chunk 6: "_EMAIL_1"
chunk 7: "]"
A naive replacement approach would fail to detect and restore the PII. The Declaw proxy uses a buffering strategy to solve this.
PII scanning via the guardrails service is rolling out. This example demonstrates the SDK API and explains the expected behavior once the service is active on your account.

Prerequisites

This example is available in Python. TypeScript version coming soon.

Code Walkthrough

Policy configuration — streaming rehydration uses the same rehydrate_response=True flag:
from declaw import Sandbox, SecurityPolicy, PIIConfig

policy = SecurityPolicy(
    pii=PIIConfig(
        enabled=True,
        types=["email", "phone", "ssn"],
        action="redact",
        rehydrate_response=True,
    )
)

sbx = Sandbox.create(template="base", timeout=300, security=policy)

The Proxy’s Buffering Strategy

When an SSE stream is active, the proxy applies a five-step strategy:
  1. Buffer accumulation — As SSE chunks arrive, the proxy accumulates text in an internal buffer rather than forwarding immediately.
  2. Token boundary detection — The proxy scans the buffer for complete redaction tokens (e.g., [REDACTED_EMAIL_1]). It also checks for partial token prefixes at the buffer tail.
  3. Safe flush — Text confirmed to contain no complete or partial tokens is flushed to the client immediately; text that might be part of a token is held in the buffer.
  4. Token replacement — When a complete token is detected, the proxy replaces it with the original PII value from its mapping table and flushes the restored text.
  5. Stream end — When the SSE stream ends, any remaining buffered text is flushed (no partial token can match at this point).
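Steps 2–3 hinge on computing a safe flush point: how much of the buffer cannot possibly be part of a redaction token. A minimal pure-Python sketch of that check (illustrative only — `safe_flush_point` and the token regex are not part of the SDK):

```python
import re

# Assumed token shape, matching the [REDACTED_EMAIL_1] style used above.
TOKEN_RE = re.compile(r"\[REDACTED_[A-Z]+_\d+\]")

def safe_flush_point(buffer: str) -> int:
    """Return the index up to which `buffer` cannot contain a partial token.

    Everything before this index is safe to flush; the tail may be the
    start of a token split across chunks and must stay buffered.
    """
    last_open = buffer.rfind("[")
    if last_open == -1:
        return len(buffer)   # no possible token start: flush everything
    tail = buffer[last_open:]
    if TOKEN_RE.fullmatch(tail):
        return len(buffer)   # tail is a complete token: replace, then flush
    if "]" in tail:
        return len(buffer)   # bracket already closed without matching a token
    # Tail is a plausible token prefix like "[RED" — keep it buffered.
    return last_open
```

For example, `safe_flush_point(" to [RED")` returns 4: the leading `" to "` can be flushed while `"[RED"` stays buffered.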

Concrete Example

Suppose sandbox code calls an LLM API with:
"Please reply to alice@company.com about the project"
Outbound (proxy redacts):
-> "Please reply to [REDACTED_EMAIL_1] about the project"
LLM streams back 7 SSE chunks:
chunk 1: "Sure, I will"      -> no token prefix, flush immediately
chunk 2: " send a message"   -> no token prefix, flush immediately
chunk 3: " to [RED"          -> partial prefix "[RED": flush " to ", hold "[RED"
chunk 4: "ACTED"             -> still partial, hold
chunk 5: "_EMAIL_1"          -> still partial, hold
chunk 6: "] right"           -> complete token! replace and flush
chunk 7: " away."            -> no token, flush
Delivered to sandbox code:
"Sure, I will send a message to alice@company.com right away."
The sandbox code receives the fully rehydrated stream as if no redaction ever occurred. This is demonstrated in the example source using a Python simulation:
chunks = [
    'Sure, I will',
    ' send a message',
    ' to [RED',
    'ACTED',
    '_EMAIL_1',
    '] right',
    ' away.',
]

token_map = {"[REDACTED_EMAIL_1]": "alice@company.com"}
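The replay loop driving the simulation can be sketched as follows (the buffering logic here is an illustrative stand-in for the proxy's internals, not its actual code; the chunk data and token map are repeated so the sketch runs on its own):

```python
import re

# Assumed token shape, matching the [REDACTED_EMAIL_1] style used above.
TOKEN_RE = re.compile(r"\[REDACTED_[A-Z]+_\d+\]")

chunks = [
    'Sure, I will', ' send a message', ' to [RED',
    'ACTED', '_EMAIL_1', '] right', ' away.',
]
token_map = {"[REDACTED_EMAIL_1]": "alice@company.com"}

def rehydrate(chunks, token_map):
    buffer, out = "", []
    for chunk in chunks:
        buffer += chunk
        # Replace any complete tokens with the original PII values.
        buffer = TOKEN_RE.sub(lambda m: token_map.get(m.group(), m.group()), buffer)
        # Hold back a possible partial token at the buffer tail.
        last_open = buffer.rfind("[")
        if last_open != -1 and "]" not in buffer[last_open:]:
            out.append(buffer[:last_open])
            buffer = buffer[last_open:]
        else:
            out.append(buffer)
            buffer = ""
    out.append(buffer)  # stream end: flush whatever remains
    return "".join(out)

print(rehydrate(chunks, token_map))
# -> Sure, I will send a message to alice@company.com right away.
```

Note that chunks 1–2 and the `" to "` prefix of chunk 3 pass straight through; only the four chunks spanning the token are delayed.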

Latency Considerations

Streaming rehydration introduces a small latency cost:
  • Buffering delays delivery of chunks near token boundaries
  • Most chunks (those without token prefixes) pass through with negligible delay
  • The added latency is typically less than 50ms per token boundary
For applications where streaming latency is critical and PII restoration is not needed, set rehydrate_response=False. The proxy will not buffer or scan response chunks.
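For instance, reusing the policy from the walkthrough above with rehydration disabled (outbound redaction still applies; only response-side buffering and scanning are skipped):

```python
from declaw import Sandbox, SecurityPolicy, PIIConfig

policy = SecurityPolicy(
    pii=PIIConfig(
        enabled=True,
        types=["email", "phone", "ssn"],
        action="redact",
        rehydrate_response=False,  # no response buffering or scanning
    )
)

sbx = Sandbox.create(template="base", timeout=300, security=policy)
```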