# Streaming Chat API Contract

This document defines the server-sent events (SSE) contract for chat completions.

Endpoint:

- `POST /v1/chat-completions/stream`

Transport:

- HTTP response uses `Content-Type: text/event-stream; charset=utf-8`
- Events are emitted in SSE format (`event: ...`, `data: ...`)
- Request body is JSON
- Request body supports the same inline attachment schema and limits documented in `docs/api/rest.md`.

Authentication:

- Same as REST endpoints (`Authorization: Bearer ` when token mode is enabled)

## Request Body

```json
{
  "chatId": "optional-chat-id",
  "persist": true,
  "provider": "openai|anthropic|xai",
  "model": "string",
  "messages": [
    {
      "role": "system|user|assistant|tool",
      "content": "string",
      "name": "optional",
      "attachments": [
        {
          "kind": "image",
          "id": "attachment-id",
          "filename": "photo.jpg",
          "mimeType": "image/jpeg",
          "sizeBytes": 12345,
          "dataUrl": "data:image/jpeg;base64,..."
        },
        {
          "kind": "text",
          "id": "attachment-id",
          "filename": "notes.md",
          "mimeType": "text/markdown",
          "sizeBytes": 4567,
          "text": "# Notes\n...",
          "truncated": false
        }
      ]
    }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```

Notes:

- `persist` defaults to `true`.
- If `persist` is `true` and `chatId` is omitted, the backend creates a new chat.
- If `chatId` is provided, the backend validates that it exists.
- If `persist` is `false`, `chatId` must be omitted. The backend does not create a chat and does not persist input messages, tool-call messages, assistant output, or `LlmCall` metadata.
- For persisted streams, the backend stores only new non-assistant input history rows to avoid duplicates.
- Attachments are optional and are persisted under `message.metadata.attachments` on stored user messages when `persist` is `true`.

## Event Stream Contract

Event order:

1. Exactly one `meta`
2. Zero or more `tool_call`
3. Zero or more `delta`
4. Exactly one terminal event: `done` or `error`

### `meta`

```json
{
  "type": "meta",
  "chatId": "chat-id-or-null",
  "callId": "llm-call-id-or-null",
  "provider": "openai",
  "model": "gpt-4.1-mini"
}
```

For `persist: false` streams, `chatId` and `callId` are `null`.

### `delta`

```json
{
  "type": "delta",
  "text": "next chunk"
}
```

`text` may contain partial words, punctuation, or whitespace.

### `tool_call`

```json
{
  "toolCallId": "call_123",
  "name": "web_search",
  "status": "completed",
  "summary": "Performed web search for 'latest CPI release'.",
  "args": { "query": "latest CPI release" },
  "startedAt": "2026-03-02T10:00:00.000Z",
  "completedAt": "2026-03-02T10:00:00.820Z",
  "durationMs": 820,
  "error": null,
  "resultPreview": "{\"ok\":true,...}"
}
```

### `done`

```json
{
  "type": "done",
  "text": "full assistant response",
  "usage": {
    "inputTokens": 123,
    "outputTokens": 456,
    "totalTokens": 579
  }
}
```

`usage` may be omitted when the provider does not expose final token accounting for stream mode.

### `error`

```json
{
  "type": "error",
  "message": "provider timeout"
}
```

## Provider Streaming Behavior

- `openai`: the backend uses OpenAI's Responses API and may execute internal function tool calls (`web_search`, `fetch_url`, optional `codex_exec`, and optional `shell_exec`) before producing final text.
- `xai`: the backend uses xAI's OpenAI-compatible Chat Completions API and may execute the same internal tool calls before producing final text.
- `openai`: image attachments are sent as Responses `input_image` items; text attachments are sent as `input_text` items.
- `xai`: image attachments are sent as Chat Completions content parts; text attachments are inlined as text parts.
- `openai`: Responses calls that can enter the server-managed tool loop use `store: true` so reasoning and function-call items can be passed between tool rounds.
- `anthropic`: streamed via event stream; emits `delta` from `content_block_delta` events with `text_delta`. Image attachments are sent as base64 `image` blocks and text attachments are appended as `text` blocks.
- `web_search` uses `CHAT_WEB_SEARCH_ENGINE` (`exa` by default, or `searxng` with `SEARXNG_BASE_URL` set). SearXNG mode requires the instance to allow `format=json`. This only affects chat-mode tool calls, not search-mode endpoints.
- `codex_exec` is available only when `CHAT_CODEX_TOOL_ENABLED=true`. It SSHes to `CHAT_CODEX_REMOTE_HOST`, creates/uses `CHAT_CODEX_REMOTE_WORKDIR`, and runs `codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check ` there with SSH stdin closed. Prefer `CHAT_CODEX_SSH_KEY_PATH` with a read-only mounted private key; `CHAT_CODEX_SSH_PRIVATE_KEY_B64` is also supported.
- `shell_exec` is available only when `CHAT_SHELL_TOOL_ENABLED=true`. It uses the same devbox SSH configuration, starts in `CHAT_CODEX_REMOTE_WORKDIR`, and runs non-interactive shell commands there with SSH stdin closed, not inside the Sybil server container.
- `CHAT_MAX_TOOL_ROUNDS` controls how many model/tool-result cycles may occur before the backend returns a tool-call limit message; the default is 100.

Tool-enabled streaming notes (`openai`/`xai`):

- The stream still emits the standard `meta`, `delta`, and `done|error` events.
- The stream may emit `tool_call` events while tool calls are executed.
- `delta` events carry assistant text and are emitted incrementally for normal text rounds. The backend may buffer model-native text briefly while determining whether a provider round contains tool calls.
- OpenAI Responses stream events are normalized by the backend into this SSE contract; clients do not consume OpenAI's raw Responses stream event names.

## Persistence + Consistency Model

The backend database remains the source of truth.

For persisted streams:

- The client may optimistically render accumulated `delta` text.
- The backend persists each completed tool call as a `tool` message before emitting its `tool_call` SSE event, so chat detail refreshes can show completed tool calls while the assistant response is still running.

On successful persisted completion:

- The backend persists the assistant `Message` and updates `LlmCall` usage/latency in a transaction.
- The backend then emits `done`.

On persisted failure:

- The backend records the call error and emits `error`.

For `persist: false` streams:

- The client may render the same `meta`, `tool_call`, `delta`, and terminal events.
- The backend does not write any chat, message, tool-call log, assistant output, or call metadata rows.
- `done.text` is the canonical assistant text if the client later imports the result into a saved chat.

Client recommendation (for iOS/web):

1. Render deltas in real time for UX.
2. On `done`, refresh chat detail from REST (`GET /v1/chats/:chatId`) and use the DB-backed data as canonical.
3. On `error`, preserve user input and show a retry affordance.

## SSE Parsing Rules

- Concatenate multiple `data:` lines with a newline before JSON parsing.
- An event completes on a blank line.
- Ignore unknown event names for forward compatibility.

## Example Stream

```text
event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}

event: delta
data: {"type":"delta","text":"Hello"}

event: delta
data: {"type":"delta","text":" world"}

event: done
data: {"type":"done","text":"Hello world"}
```
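The parsing rules above can be sketched as a small incremental parser. This is an illustrative sketch, not part of the contract: Python is used for brevity, and the `parse_sse` function name is ours.

```python
import json


def parse_sse(raw: str):
    """Parse a complete SSE payload into (event_name, payload) tuples.

    Follows the contract's rules: multiple `data:` lines are joined
    with a newline before JSON parsing, a blank line completes an
    event, and unknown field names are ignored.
    """
    events = []
    event_name = None
    data_lines = []

    def dispatch():
        nonlocal event_name, data_lines
        if data_lines:
            payload = json.loads("\n".join(data_lines))
            events.append((event_name or "message", payload))
        event_name, data_lines = None, []

    for line in raw.split("\n"):
        if line == "":
            dispatch()  # blank line completes the pending event
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
        # Other fields/comments are ignored for forward compatibility.
    dispatch()  # flush a trailing event with no final blank line
    return events
```

Feeding it the example stream from this document yields four events (`meta`, `delta`, `delta`, `done`) whose payloads are the parsed JSON objects.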
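A minimal client-side consumer of the event contract, following the ordering rules (one `meta`, interleaved `tool_call`/`delta`, one terminal `done` or `error`) and the recommendation to treat `done.text` as canonical, might look like the sketch below. It assumes events have already been parsed into `(name, payload)` pairs; the `consume` helper is illustrative, not a documented API.

```python
def consume(events):
    """Fold a parsed event sequence into (meta, final_text).

    Accumulated deltas are only an optimistic preview; on `done`,
    the terminal event's `text` is preferred as canonical.
    Raises on an `error` event or a stream with no terminal event.
    """
    meta = None
    parts = []
    for name, payload in events:
        if name == "meta":
            meta = payload
        elif name == "delta":
            parts.append(payload["text"])  # render incrementally in a real UI
        elif name == "tool_call":
            pass  # e.g. surface payload["summary"] / payload["status"] in the UI
        elif name == "done":
            return meta, payload.get("text", "".join(parts))
        elif name == "error":
            raise RuntimeError(payload["message"])
        # Unknown event names are ignored for forward compatibility.
    raise RuntimeError("stream ended without a terminal event")
```

On `done`, a persisted-mode client would then refresh `GET /v1/chats/:chatId` and replace the optimistic preview with the DB-backed messages.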