Incident Report: Encrypted Content Failures in Multi-Region Responses API Load Balancing
Date: Feb 24, 2026
Duration: Feb 24 – Mar 3, 2026 (initial fix Feb 24; streaming follow-up fix Mar 3)
Severity: High (for users load balancing Responses API across different API keys)
Status: Resolved
Summary
When load balancing OpenAI's Responses API across deployments with different API keys (e.g., different Azure regions or OpenAI organizations), follow-up requests containing encrypted content items (like rs_... reasoning items) would fail with:
```json
{
  "error": {
    "message": "The encrypted content for item rs_0d09d6e56879e76500699d6feee41c8197bd268aae76141f87 could not be verified. Reason: Encrypted content organization_id did not match the target organization.",
    "type": "invalid_request_error",
    "code": "invalid_encrypted_content"
  }
}
```
Encrypted content items are cryptographically tied to the organization of the API key that created them. When the router load balanced a follow-up request to a deployment using a different API key, decryption failed.
Impact:
- Responses API calls with encrypted content: Complete failure when routed to wrong deployment
- Initial requests: Unaffected — only follow-up requests containing encrypted items failed
- Other API endpoints: No impact — chat completions, embeddings, etc. functioned normally
Background
OpenAI's Responses API can return encrypted "reasoning items" (with IDs like rs_...) that contain intermediate reasoning steps. These items are encrypted with the organization's key and can only be decrypted by the same organization's API key.
When load balancing across deployments with different API keys, the existing affinity mechanisms were insufficient:
- `responses_api_deployment_check`: Requires `previous_response_id`, which some clients (like Codex) don't provide
- `deployment_affinity`: Too broad; pins all requests from a user to one deployment, reducing effective quota by the number of users
- `session_affinity`: Requires explicit session IDs and still reduces quota
Root Cause
LiteLLM's router had no mechanism to track which deployment created specific encrypted content items and route follow-up requests accordingly. The router treated all deployments as interchangeable, leading to decryption failures when encrypted content crossed organizational boundaries.
The Problem Flow:
1. User calls `router.aresponses()` with model `gpt-5.1-codex`
2. Router load balances to Deployment A (Azure East US, API Key 1)
3. Response contains encrypted reasoning item `rs_abc123` (encrypted with Org 1's key)
4. User makes a follow-up request with `rs_abc123` in the input
5. Router load balances to Deployment B (Azure West Europe, API Key 2)
6. Deployment B tries to decrypt `rs_abc123` with Org 2's key → fails
Why Existing Solutions Didn't Work:
- `previous_response_id`: Not provided by all clients (e.g., Codex)
- `deployment_affinity`: Pins all user requests to one deployment, reducing quota to 1/N where N = number of deployments
- `session_affinity`: Requires explicit session management and still reduces quota
Timeline:
- Users configured multi-region Responses API load balancing with different API keys
- Initial requests succeeded, but follow-up requests with encrypted content failed intermittently
- Error rate correlated with number of deployments (more deployments = higher chance of routing to wrong one)
- Investigation revealed encrypted content was organization-bound
- Existing affinity mechanisms deemed unsuitable (quota reduction, missing `previous_response_id`)
- New solution designed and implemented: `encrypted_content_affinity`
The Fix
Implemented a new `encrypted_content_affinity` pre-call check that tracks which deployment produced encrypted content and pins follow-up requests only when necessary.
Implementation
1. Encoding model_id into output items (responses/utils.py)
This is the same approach used for `previous_response_id` affinity, so no cache is needed. When a response contains output items with `encrypted_content`, LiteLLM encodes the originating deployment's `model_id` in two places for redundancy:
- Into the item ID (if present): `rs_abc123` → `encitem_{base64("litellm:model_id:{model_id};item_id:rs_abc123")}`
- Into the `encrypted_content` itself: wraps the content as `litellm_enc:{base64("model_id:{model_id}")};{original_encrypted_content}`
```python
import base64

# Encoding item IDs (when present)
def _build_encrypted_item_id(model_id: str, item_id: str) -> str:
    assembled = f"litellm:model_id:{model_id};item_id:{item_id}"
    encoded = base64.b64encode(assembled.encode("utf-8")).decode("utf-8")
    return f"encitem_{encoded}"

# Wrapping encrypted_content (always, for redundancy)
def _wrap_encrypted_content_with_model_id(encrypted_content: str, model_id: str) -> str:
    metadata = f"model_id:{model_id}"
    encoded_metadata = base64.b64encode(metadata.encode("utf-8")).decode("utf-8")
    return f"litellm_enc:{encoded_metadata};{encrypted_content}"
```
Why wrap encrypted_content directly? Some clients (like Codex) don't consistently send item IDs in follow-up requests, but they always send the encrypted_content itself. By embedding model_id into the content, affinity works even when IDs are missing.
Streaming responses: The wrapping logic is applied to both:
- Final response objects (non-streaming)
- Individual streaming events (`response.output_item.added`, `response.output_item.done`)
This ensures clients receiving streaming responses get wrapped content they can send back.
Before forwarding to the upstream provider, LiteLLM restores the original item IDs and unwraps encrypted_content so the provider never sees the encoded form:
```python
# In responses/main.py — before calling the handler
input = ResponsesAPIRequestUtils._restore_encrypted_content_item_ids_in_input(input)
```
2. EncryptedContentAffinityCheck — routing only (encrypted_content_affinity_check.py)
There is no `async_log_success_event` and no cache lookup; the `model_id` is decoded directly from the item ID or the `encrypted_content`:
```python
from typing import Optional

class EncryptedContentAffinityCheck(CustomLogger):
    async def async_filter_deployments(self, model, healthy_deployments, request_kwargs, ...):
        """Extract model_id from input items (ID or encrypted_content) and pin to that deployment."""
        for item in request_kwargs.get("input", []):
            # Try to extract model_id from two sources:
            model_id = self._extract_model_id_from_input(item)
            if model_id:
                deployment = self._find_deployment_by_model_id(
                    healthy_deployments, model_id
                )
                if deployment:
                    request_kwargs["_encrypted_content_affinity_pinned"] = True
                    return [deployment]
        return healthy_deployments

    def _extract_model_id_from_input(self, item: dict) -> Optional[str]:
        """Extract model_id from either the encoded ID or the wrapped encrypted_content."""
        # 1. Try decoding from the item ID (if present)
        item_id = item.get("id", "")
        if item_id:
            decoded = ResponsesAPIRequestUtils._decode_encrypted_item_id(item_id)
            if decoded:
                return decoded["model_id"]

        # 2. Try unwrapping from encrypted_content (fallback for clients that omit IDs)
        encrypted_content = item.get("encrypted_content", "")
        if encrypted_content and encrypted_content.startswith("litellm_enc:"):
            model_id, _ = ResponsesAPIRequestUtils._unwrap_encrypted_content_with_model_id(
                encrypted_content
            )
            return model_id
        return None
```
3. Rate Limit Bypass (router.py)
When encrypted content requires a specific deployment, RPM/TPM limits are bypassed (the request would fail on any other deployment anyway):
```python
# In async_get_available_deployment, after filtering healthy deployments:
if (
    request_kwargs.get("_encrypted_content_affinity_pinned")
    and len(healthy_deployments) == 1
):
    return healthy_deployments[0]  # Bypass routing strategy (RPM/TPM checks)
```
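Taken together, the wrap-then-pin flow can be simulated with plain dicts. This is a self-contained sketch: `_wrap` and `pick_deployment` are hypothetical stand-ins for the real utils and check, and the deployment shapes are simplified:

```python
import base64

def _wrap(content: str, model_id: str) -> str:
    """Mimic the litellm_enc: wrapping applied to response output items."""
    meta = base64.b64encode(f"model_id:{model_id}".encode("utf-8")).decode("utf-8")
    return f"litellm_enc:{meta};{content}"

def pick_deployment(input_items: list, healthy_deployments: list) -> list:
    """Pin to the deployment that produced the encrypted content, else keep all."""
    for item in input_items:
        content = item.get("encrypted_content", "")
        if content.startswith("litellm_enc:"):
            meta_b64, _, _ciphertext = content[len("litellm_enc:"):].partition(";")
            model_id = base64.b64decode(meta_b64).decode("utf-8")[len("model_id:"):]
            pinned = [d for d in healthy_deployments if d["model_info"]["id"] == model_id]
            if pinned:
                # A single deployment is returned → router bypasses RPM/TPM strategy
                return pinned
    return healthy_deployments

deployments = [
    {"model_info": {"id": "azure-east"}},
    {"model_info": {"id": "azure-west"}},
]
# Follow-up request carrying content that the azure-east deployment wrapped earlier:
follow_up = [{"type": "reasoning", "encrypted_content": _wrap("opaque-ciphertext", "azure-east")}]
pick_deployment(follow_up, deployments)   # pins to azure-east only
pick_deployment([{"type": "message", "content": "hi"}], deployments)  # all deployments
```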
4. Configuration
```yaml
router_settings:
  routing_strategy: usage-based-routing-v2
  enable_pre_call_checks: true
  optional_pre_call_checks:
    - encrypted_content_affinity
  deployment_affinity_ttl_seconds: 86400  # 24 hours
```
Key Benefits
✅ No quota reduction: Only pins requests containing encrypted items
✅ Bypasses rate limits: When encrypted content requires a specific deployment, RPM/TPM limits don't block it
✅ No previous_response_id required: Works by encoding model_id into both the item ID and the encrypted_content itself
✅ No cache required: model_id is decoded on the fly from the item ID or encrypted_content; no Redis, no TTL
✅ Globally safe: Can be enabled for all models; non-Responses-API calls are unaffected
✅ Surgical precision: Normal requests continue to load balance freely
Remediation
| # | Action | Status | Code |
|---|---|---|---|
| 1 | Encode model_id into encrypted-content item IDs on response | ✅ Done | responses/utils.py |
| 2 | Restore original item IDs before forwarding to upstream provider | ✅ Done | responses/main.py |
| 3 | EncryptedContentAffinityCheck: decode item IDs to route (no cache) | ✅ Done | encrypted_content_affinity_check.py |
| 4 | Add encrypted_content_affinity to OptionalPreCallChecks type | ✅ Done | types/router.py |
| 5 | Implement rate limit bypass for affinity-pinned requests | ✅ Done | router.py |
| 6 | Unit tests: encoding/decoding utilities, routing, RPM bypass | ✅ Done | test_encrypted_content_affinity_check.py |
| 7 | Documentation: Responses API guide, load balancing guide, config reference | ✅ Done | Docs |
| 8 | [Mar 3] Fix streaming events to wrap encrypted_content | ✅ Done | responses/streaming_iterator.py |
Follow-up Fix: Streaming Responses (Mar 3, 2026)
The Issue
After the initial fix was deployed, users reported that the invalid_encrypted_content error still occurred when using streaming responses with clients like Codex. Investigation revealed:
- ✅ Non-streaming responses: `encrypted_content` was correctly wrapped with the `litellm_enc:` prefix
- ❌ Streaming responses: individual `response.output_item.added` and `response.output_item.done` events contained raw, unwrapped `encrypted_content`
Since Codex and other clients consume responses as streams, they received unwrapped content in these events and sent it back in follow-up requests, causing the affinity check to fail.
The Root Cause
The _update_encrypted_content_item_ids_in_response function only modified the final response object, which is used for non-streaming responses. For streaming responses, individual chunks are processed by ResponsesAPIStreamingIterator._process_chunk, which was not applying the wrapping logic to streaming events.
The Fix
Modified litellm/litellm/responses/streaming_iterator.py to wrap encrypted_content in streaming events:
```python
# In ResponsesAPIStreamingIterator._process_chunk
if (
    self.litellm_metadata
    and self.litellm_metadata.get("encrypted_content_affinity_enabled")
):
    event_type = getattr(openai_responses_api_chunk, "type", None)
    if event_type in (
        ResponsesAPIStreamEvents.OUTPUT_ITEM_ADDED,
        ResponsesAPIStreamEvents.OUTPUT_ITEM_DONE,
    ):
        item = getattr(openai_responses_api_chunk, "item", None)
        if item:
            encrypted_content = getattr(item, "encrypted_content", None)
            if encrypted_content and isinstance(encrypted_content, str):
                model_id = self.litellm_metadata.get("model_info", {}).get("id")
                if model_id:
                    wrapped_content = ResponsesAPIRequestUtils._wrap_encrypted_content_with_model_id(
                        encrypted_content, model_id
                    )
                    setattr(item, "encrypted_content", wrapped_content)
```
This ensures that all encrypted_content sent to clients (streaming or non-streaming) is wrapped with model_id metadata, enabling consistent affinity routing.
Migration Guide
Before (Using deployment_affinity)
```yaml
router_settings:
  optional_pre_call_checks:
    - deployment_affinity  # ❌ Reduces quota by number of users
```
Problem: All requests from a user pin to one deployment, reducing effective quota to 1/N.
After (Using encrypted_content_affinity)
```yaml
router_settings:
  optional_pre_call_checks:
    - encrypted_content_affinity  # ✅ Only pins requests with encrypted content
```
Benefit: Normal requests load balance freely, only encrypted content requests pin when necessary.
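For reference, a complete multi-region setup combining the pieces above might look like the following. The endpoints, environment-variable names, and deployment entries are illustrative; the `os.environ/` prefix is LiteLLM's convention for referencing environment variables in config:

```yaml
model_list:
  - model_name: gpt-5.1-codex
    litellm_params:
      model: azure/gpt-5.1-codex
      api_base: https://east-us-resource.openai.azure.com
      api_key: os.environ/AZURE_EAST_US_API_KEY
  - model_name: gpt-5.1-codex
    litellm_params:
      model: azure/gpt-5.1-codex
      api_base: https://west-europe-resource.openai.azure.com
      api_key: os.environ/AZURE_WEST_EUROPE_API_KEY

router_settings:
  routing_strategy: usage-based-routing-v2
  enable_pre_call_checks: true
  optional_pre_call_checks:
    - encrypted_content_affinity
```

With this config, initial requests spread across both regions; only follow-up requests that carry wrapped encrypted content pin to the region that produced it.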


