2026-02-14 21:20:14 -08:00

# Streaming Chat API Contract

This document defines the server-sent events (SSE) contract for chat completions.

Endpoint:

- `POST /v1/chat-completions/stream`

Transport:

- HTTP response uses `Content-Type: text/event-stream; charset=utf-8`
- Events are emitted in SSE format (`event: ...`, `data: ...`)
- Request body is JSON

Authentication:

- Same as REST endpoints (`Authorization: Bearer <token>` when token mode is enabled)

## Request Body

```json
{
  "chatId": "optional-chat-id",
  "provider": "openai|anthropic|xai",
  "model": "string",
  "messages": [
    { "role": "system|user|assistant|tool", "content": "string", "name": "optional" }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```

Notes:

- If `chatId` is omitted, the backend creates a new chat.
- If `chatId` is provided, the backend validates that it exists.
- The backend stores only new non-assistant input history rows to avoid duplicates.
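
As a quick illustration, a client can assemble and serialize a minimal request body like this (Python; the field values are placeholders, only the field names come from the contract above):

```python
import json

# Minimal request body for POST /v1/chat-completions/stream.
# Values are illustrative; only the field names come from the contract.
payload = {
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Hello"},
    ],
    "temperature": 0.2,
    "maxTokens": 256,
}

body = json.dumps(payload)  # send as the JSON request body
```

Omitting `chatId`, as here, asks the backend to create a new chat.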

## Event Stream Contract

Event order:

1. Exactly one `meta`
2. Zero or more `tool_call`
3. Zero or more `delta`
4. Exactly one terminal event: `done` or `error`
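
The ordering rule above can be checked mechanically on a completed stream; a small sketch (Python; `valid_order` is an illustrative helper, not part of the API):

```python
import re

# meta, then any tool_calls, then any deltas, then exactly one terminal event
ORDER = re.compile(r"^meta(,tool_call)*(,delta)*,(done|error)$")

def valid_order(event_names):
    """Return True if a completed stream's event names follow the contract."""
    return ORDER.match(",".join(event_names)) is not None
```

For example, `valid_order(["meta", "delta", "done"])` holds, while a stream missing `meta` or a terminal event is rejected.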

### `meta`

```json
{
  "type": "meta",
  "chatId": "chat-id",
  "callId": "llm-call-id",
  "provider": "openai",
  "model": "gpt-4.1-mini"
}
```

### `delta`

```json
{ "type": "delta", "text": "next chunk" }
```

`text` may contain partial words, punctuation, or whitespace.
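
Because chunks can split words arbitrarily, clients must concatenate `text` values verbatim, with no trimming or separator; a minimal sketch:

```python
def accumulate(delta_events):
    """Join delta payloads exactly as received; any trimming or space-joining
    would corrupt output, since chunks may split mid-word."""
    return "".join(event["text"] for event in delta_events)

chunks = [
    {"type": "delta", "text": "Hel"},
    {"type": "delta", "text": "lo wo"},
    {"type": "delta", "text": "rld"},
]
print(accumulate(chunks))  # Hello world
```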

### `tool_call`

```json
{
  "toolCallId": "call_123",
  "name": "web_search",
  "status": "completed",
  "summary": "Performed web search for 'latest CPI release'.",
  "args": { "query": "latest CPI release" },
  "startedAt": "2026-03-02T10:00:00.000Z",
  "completedAt": "2026-03-02T10:00:00.820Z",
  "durationMs": 820,
  "error": null,
  "resultPreview": "{\"ok\":true,...}"
}
```

### `done`

```json
{
  "type": "done",
  "text": "full assistant response",
  "usage": {
    "inputTokens": 123,
    "outputTokens": 456,
    "totalTokens": 579
  }
}
```

`usage` may be omitted when the provider does not expose final token accounting in stream mode.
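
In the example above, `totalTokens` equals the sum of `inputTokens` and `outputTokens`; a client-side sanity check that also tolerates the omitted case might look like this (a sketch only; the additive relationship is implied by the example, not stated by the contract):

```python
def usage_consistent(usage):
    """True when usage is absent (allowed in stream mode) or the totals add up.
    Assumes totalTokens = inputTokens + outputTokens, as in the example."""
    if usage is None:
        return True
    return usage["totalTokens"] == usage["inputTokens"] + usage["outputTokens"]
```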

### `error`

```json
{ "type": "error", "message": "provider timeout" }
```

## Provider Streaming Behavior

- `openai`: the backend may execute internal tool calls (`web_search`, `fetch_url`) before producing final text.
- `xai`: same tool-enabled behavior as OpenAI.
- `anthropic`: streamed via the provider's event stream; emits `delta` from `content_block_delta` events carrying `text_delta`.

Tool-enabled streaming notes (`openai`/`xai`):

- The stream still emits the standard `meta`, `delta`, and `done|error` events.
- The stream may emit `tool_call` events while tool calls are executed.
- `delta` events stream incrementally as text is generated.

## Persistence + Consistency Model

The backend database remains the source of truth.

During the stream:

- The client may optimistically render accumulated `delta` text.

On successful completion:

- The backend persists the assistant `Message` and updates `LlmCall` usage/latency in a transaction.
- The backend then emits `done`.

On failure:

- The backend records the call error and emits `error`.

Client recommendation (for iOS/web):

1. Render deltas in real time for UX.
2. On `done`, refresh the chat detail from REST (`GET /v1/chats/:chatId`) and treat the DB-backed data as canonical.
3. On `error`, preserve the user input and show a retry affordance.

## SSE Parsing Rules

- Concatenate multiple `data:` lines with a newline before JSON parsing.
- An event completes on a blank line.
- Ignore unknown event names for forward compatibility.
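
These rules amount to a small state machine; a minimal client-side sketch (Python; `parse_sse` is an illustrative helper, and a real client should read incrementally rather than from a complete string):

```python
import json

def parse_sse(raw):
    """Parse a complete SSE payload into (event_name, parsed_data) pairs.

    Multiple data: lines are joined with a newline before JSON parsing,
    an event completes on a blank line, and other SSE fields are ignored.
    """
    events = []
    name, data_lines = None, []
    for line in raw.splitlines() + [""]:  # trailing "" flushes the final event
        if line == "":
            if data_lines:
                events.append((name, json.loads("\n".join(data_lines))))
            name, data_lines = None, []
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        # id:, retry:, and comment lines fall through untouched
    return events
```

Unknown event names still parse here; dispatch code can simply skip names it does not recognize.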

## Example Stream

```text
event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}

event: delta
data: {"type":"delta","text":"Hello"}

event: delta
data: {"type":"delta","text":" world"}

event: done
data: {"type":"done","text":"Hello world"}
```