# Streaming Chat API Contract

This document defines the server-sent events (SSE) contract for chat completions.

Endpoint:

- `POST /v1/chat-completions/stream`

Transport:

- HTTP response uses `Content-Type: text/event-stream; charset=utf-8`
- Events are emitted in SSE format (`event: ...`, `data: ...`)
- Request body is JSON

Authentication:

- Same as REST endpoints (`Authorization: Bearer <token>` when token mode is enabled)

## Request Body

```json
{
  "chatId": "optional-chat-id",
  "provider": "openai|anthropic|xai",
  "model": "string",
  "messages": [
    { "role": "system|user|assistant|tool", "content": "string", "name": "optional" }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```

Notes:

- If `chatId` is omitted, the backend creates a new chat.
- If `chatId` is provided, the backend validates that it exists.
- The backend stores only new non-assistant input history rows to avoid duplicates.

## Event Stream Contract

Event order:

1. Exactly one `meta`
2. Zero or more `delta`
3. Exactly one terminal event: `done` or `error`

### `meta`

```json
{
  "type": "meta",
  "chatId": "chat-id",
  "callId": "llm-call-id",
  "provider": "openai",
  "model": "gpt-4.1-mini"
}
```

### `delta`

```json
{
  "type": "delta",
  "text": "next chunk"
}
```

`text` may contain partial words, punctuation, or whitespace.

### `done`

```json
{
  "type": "done",
  "text": "full assistant response",
  "usage": {
    "inputTokens": 123,
    "outputTokens": 456,
    "totalTokens": 579
  }
}
```

`usage` may be omitted when the provider does not expose final token accounting in stream mode.

### `error`

```json
{
  "type": "error",
  "message": "provider timeout"
}
```

## Provider Streaming Behavior

- `openai`: streamed via OpenAI chat completion chunks; emits `delta` from `choices[0].delta.content`.
- `xai`: uses an OpenAI-compatible API; same chunk extraction as OpenAI.
- `anthropic`: streamed via event stream; emits `delta` from `content_block_delta` events with `text_delta`.

## Persistence + Consistency Model

The backend database remains the source of truth.
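The event contract above suggests a simple client-side model: accumulate `delta` text for optimistic rendering, then prefer the `done` payload (and ultimately the DB-backed REST response) as canonical. A minimal TypeScript sketch; the names `StreamEvent` and `accumulate` are illustrative and not part of this contract:

```typescript
// Illustrative union type mirroring the event payloads defined above.
type StreamEvent =
  | { type: "meta"; chatId: string; callId: string; provider: string; model: string }
  | { type: "delta"; text: string }
  | { type: "done"; text: string; usage?: { inputTokens: number; outputTokens: number; totalTokens: number } }
  | { type: "error"; message: string };

// Accumulate delta text in arrival order. If a `done` event is present,
// its `text` is returned instead, since it is the canonical full response.
function accumulate(events: StreamEvent[]): string {
  let buf = "";
  for (const ev of events) {
    if (ev.type === "delta") buf += ev.text;
    else if (ev.type === "done") return ev.text; // canonical full text
    else if (ev.type === "error") throw new Error(ev.message);
  }
  return buf; // stream still in progress: optimistic partial text
}
```

A client would render `accumulate(...)` after each `delta`, then replace it with REST-fetched chat data once `done` arrives.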
During stream:

- Client may optimistically render accumulated `delta` text.

On successful completion:

- Backend persists the assistant `Message` and updates `LlmCall` usage/latency in a transaction.
- Backend then emits `done`.

On failure:

- Backend records the call error and emits `error`.

Client recommendation (for iOS/web):

1. Render deltas in real time for UX.
2. On `done`, refresh chat detail from REST (`GET /v1/chats/:chatId`) and use DB-backed data as canonical.
3. On `error`, preserve user input and show a retry affordance.

## SSE Parsing Rules

- Concatenate multiple `data:` lines with a newline before JSON parsing.
- An event completes on a blank line.
- Ignore unknown event names for forward compatibility.

## Example Stream

```text
event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}

event: delta
data: {"type":"delta","text":"Hello"}

event: delta
data: {"type":"delta","text":" world"}

event: done
data: {"type":"done","text":"Hello world"}
```
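The parsing rules above can be sketched as a small TypeScript function. This is a simplified sketch, not a full SSE implementation (it ignores `id:`, `retry:`, comment lines, and CR line endings); `parseSse` and `KNOWN` are illustrative names:

```typescript
// Event names this contract defines; unknown names are skipped
// for forward compatibility, per the parsing rules above.
const KNOWN = new Set(["meta", "delta", "done", "error"]);

function parseSse(raw: string): { event: string; data: unknown }[] {
  const out: { event: string; data: unknown }[] = [];
  let event = "message"; // SSE default event name
  let dataLines: string[] = [];
  for (const line of raw.split("\n")) {
    if (line === "") {
      // Blank line terminates the event; join data lines with "\n",
      // then JSON-parse the payload.
      if (dataLines.length > 0 && KNOWN.has(event)) {
        out.push({ event, data: JSON.parse(dataLines.join("\n")) });
      }
      event = "message";
      dataLines = [];
    } else if (line.startsWith("event:")) {
      event = line.slice("event:".length).trim();
    } else if (line.startsWith("data:")) {
      dataLines.push(line.slice("data:".length).trimStart());
    }
  }
  return out;
}
```

Running `parseSse` over the example stream yields a `meta` event, the `delta` events in order, and the terminal `done` event, ready to feed into client-side accumulation.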