docs/api/streaming-chat.md

# Streaming Chat API Contract

This document defines the server-sent events (SSE) contract for chat completions.

Endpoint:
- `POST /v1/chat-completions/stream`

Transport:
- HTTP response uses `Content-Type: text/event-stream; charset=utf-8`
- Events are emitted in SSE format (`event: ...`, `data: ...`)
- Request body is JSON

Authentication:
- Same as the REST endpoints (`Authorization: Bearer <token>` when token mode is enabled)

## Request Body

```json
{
  "chatId": "optional-chat-id",
  "provider": "openai|anthropic|xai",
  "model": "string",
  "messages": [
    { "role": "system|user|assistant|tool", "content": "string", "name": "optional" }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```

Notes:
- If `chatId` is omitted, the backend creates a new chat.
- If `chatId` is provided, the backend validates that it exists.
- The backend stores only new non-assistant input history rows, to avoid duplicates.

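As a client-side sketch, the request body can be modeled and validated before sending. The type and function names here are illustrative, not part of the contract:

```typescript
// Illustrative types mirroring the request body schema above.
type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
  name?: string;
}

interface StreamChatRequest {
  chatId?: string; // omit to let the backend create a new chat
  provider: "openai" | "anthropic" | "xai";
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
}

// Serialize the payload for POST /v1/chat-completions/stream.
function buildStreamRequestBody(req: StreamChatRequest): string {
  if (req.messages.length === 0) {
    throw new Error("messages must not be empty");
  }
  return JSON.stringify(req);
}
```

A real client would POST this body with `Accept: text/event-stream`, plus the `Authorization` header when token mode is enabled.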
## Event Stream Contract

Event order:
1. Exactly one `meta`
2. Zero or more `delta`
3. Exactly one terminal event: `done` or `error`

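The ordering rule above can be checked mechanically. A minimal validator sketch (the function name is ours) over the sequence of event types:

```typescript
// Check the contract's event order: one `meta` first, then zero or
// more `delta`, then exactly one terminal `done` or `error`.
function isValidEventOrder(types: string[]): boolean {
  if (types.length < 2) return false;
  if (types[0] !== "meta") return false;
  const last = types[types.length - 1];
  if (last !== "done" && last !== "error") return false;
  // Everything between the first and last event must be a delta.
  return types.slice(1, -1).every((t) => t === "delta");
}
```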
### `meta`

```json
{
  "type": "meta",
  "chatId": "chat-id",
  "callId": "llm-call-id",
  "provider": "openai",
  "model": "gpt-4.1-mini"
}
```

### `delta`

```json
{ "type": "delta", "text": "next chunk" }
```

`text` may contain partial words, punctuation, or whitespace.

### `done`

```json
{
  "type": "done",
  "text": "full assistant response",
  "usage": {
    "inputTokens": 123,
    "outputTokens": 456,
    "totalTokens": 579
  }
}
```

`usage` may be omitted when the provider does not expose final token accounting in stream mode.

### `error`

```json
{ "type": "error", "message": "provider timeout" }
```

## Provider Streaming Behavior

- `openai`: streamed via OpenAI chat completion chunks; emits `delta` from `choices[0].delta.content`.
- `xai`: uses the OpenAI-compatible API; same chunk extraction as for OpenAI.
- `anthropic`: streamed via the Anthropic event stream; emits `delta` from `content_block_delta` events with `text_delta`.

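The extraction described above can be sketched as one function. The chunk shapes are the commonly documented ones for each provider, and field access is kept defensive; the function name is ours:

```typescript
// Extract delta text from a provider-specific stream chunk.
// OpenAI and xAI share the chat-completion chunk shape;
// Anthropic uses content_block_delta events with a text_delta payload.
function extractDeltaText(provider: string, chunk: any): string | null {
  switch (provider) {
    case "openai":
    case "xai": {
      const text = chunk?.choices?.[0]?.delta?.content;
      return typeof text === "string" ? text : null;
    }
    case "anthropic": {
      if (chunk?.type === "content_block_delta" && chunk?.delta?.type === "text_delta") {
        return typeof chunk.delta.text === "string" ? chunk.delta.text : null;
      }
      return null;
    }
    default:
      return null;
  }
}
```

Returning `null` (rather than throwing) lets the backend silently skip non-text chunks such as role headers or stop events.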
## Persistence + Consistency Model

The backend database remains the source of truth.

During the stream:
- The client may optimistically render accumulated `delta` text.

On successful completion:
- The backend persists the assistant `Message` and updates the `LlmCall` usage/latency in a transaction.
- The backend then emits `done`.

On failure:
- The backend records the call error and emits `error`.

Client recommendation (for iOS/web):
1. Render deltas in real time for UX.
2. On `done`, refresh the chat detail from REST (`GET /v1/chats/:chatId`) and treat the DB-backed data as canonical.
3. On `error`, preserve the user input and show a retry affordance.

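The recommended client flow can be sketched as a pure handler over an event sequence. The event union and UI callback shapes here are illustrative (and `meta` handling is omitted for brevity):

```typescript
// Client-side sketch: accumulate deltas for optimistic rendering;
// on `done` a real client would refresh GET /v1/chats/:chatId and
// replace the optimistic text with the DB-backed message.
type StreamEvent =
  | { type: "delta"; text: string }
  | { type: "done"; text: string }
  | { type: "error"; message: string };

function handleStream(
  events: StreamEvent[],
  ui: { render: (text: string) => void; showRetry: (message: string) => void }
): string | null {
  let optimistic = "";
  for (const ev of events) {
    if (ev.type === "delta") {
      optimistic += ev.text;
      ui.render(optimistic); // real-time rendering for UX
    } else if (ev.type === "done") {
      return ev.text; // canonical text; refresh from REST here
    } else {
      ui.showRetry(ev.message); // preserve input, offer retry
      return null;
    }
  }
  return null;
}
```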
## SSE Parsing Rules

- Concatenate multiple `data:` lines with a newline before JSON parsing.
- An event completes on a blank line.
- Ignore unknown event names for forward compatibility.

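The rules above can be sketched as a small parser over a raw SSE text buffer. The function name is ours, and it assumes the full buffer is already available (a streaming client would feed it incrementally):

```typescript
// Parse a raw SSE buffer per the rules above: an event ends on a
// blank line, multiple `data:` lines join with "\n", payload is JSON.
function parseSseBuffer(buffer: string): { event: string; data: any }[] {
  const events: { event: string; data: any }[] = [];
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // SSE default when no event: line is given
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    if (dataLines.length === 0) continue; // no data, nothing to dispatch
    events.push({ event: eventName, data: JSON.parse(dataLines.join("\n")) });
  }
  return events;
}
```

Forward compatibility then lives in the dispatch step: the consumer switches on `event` and simply skips names it does not recognize.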
## Example Stream

```text
event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}

event: delta
data: {"type":"delta","text":"Hello"}

event: delta
data: {"type":"delta","text":" world"}

event: done
data: {"type":"done","text":"Hello world"}
```