# Streaming Chat API Contract

This document defines the server-sent events (SSE) contract for chat completions.
## Endpoint

`POST /v1/chat-completions/stream`

## Transport

- HTTP response uses `Content-Type: text/event-stream; charset=utf-8`.
- Events are emitted in SSE format (`event: ...` / `data: ...`).
- Request body is JSON.

## Authentication

- Same as REST endpoints (`Authorization: Bearer <token>` when token mode is enabled).
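Putting the transport and authentication rules together, a minimal TypeScript sketch of building the request is below. The base URL is hypothetical, the function name is illustrative, and the `Accept` header is an assumption (the contract only specifies the response `Content-Type`).

```typescript
// Sketch: assemble the HTTP request for the streaming endpoint.
interface StreamRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildStreamRequest(payload: unknown, token?: string): StreamRequestInit {
  const headers: Record<string, string> = {
    "Content-Type": "application/json", // request body is JSON
    "Accept": "text/event-stream",      // assumption: hint that we expect SSE
  };
  if (token !== undefined) {
    // Only in token mode, per the authentication rule above.
    headers["Authorization"] = `Bearer ${token}`;
  }
  return { method: "POST", headers, body: JSON.stringify(payload) };
}

// Usage (hypothetical base URL):
// const res = await fetch("https://api.example.com/v1/chat-completions/stream",
//                         buildStreamRequest(body, token));
```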
## Request Body

```json
{
  "chatId": "optional-chat-id",
  "provider": "openai|anthropic|xai",
  "model": "string",
  "messages": [
    { "role": "system|user|assistant|tool", "content": "string", "name": "optional" }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```
Notes:

- If `chatId` is omitted, the backend creates a new chat.
- If `chatId` is provided, the backend validates that it exists.
- The backend stores only new non-assistant input history rows to avoid duplicates.
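For strongly typed clients, the request schema above can be expressed as TypeScript types. These types are a sketch derived from the JSON schema in this document, not a published client SDK.

```typescript
// Request payload types mirroring the documented schema.
type ChatRole = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: ChatRole;
  content: string;
  name?: string; // optional per the contract
}

interface StreamChatRequest {
  chatId?: string; // omit to create a new chat
  provider: "openai" | "anthropic" | "xai";
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
}

// Example body using the sample values from this document.
const exampleBody: StreamChatRequest = {
  provider: "openai",
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello" }],
  temperature: 0.2,
  maxTokens: 256,
};
```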
## Event Stream Contract

Event order:

- Exactly one `meta`
- Zero or more `delta`
- Exactly one terminal event: `done` or `error`
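The ordering rule can be checked mechanically, which is useful in client tests. A sketch (the function name is illustrative):

```typescript
// Validate a sequence of event type names against the contract:
// exactly one leading "meta", zero or more "delta", one terminal event.
function isValidEventOrder(types: string[]): boolean {
  if (types.length < 2) return false;               // need at least meta + terminal
  if (types[0] !== "meta") return false;            // exactly one meta, first
  const last = types[types.length - 1];
  if (last !== "done" && last !== "error") return false; // one terminal event
  return types.slice(1, -1).every((t) => t === "delta"); // only deltas in between
}
```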
### meta

```json
{
  "type": "meta",
  "chatId": "chat-id",
  "callId": "llm-call-id",
  "provider": "openai",
  "model": "gpt-4.1-mini"
}
```
### delta

```json
{ "type": "delta", "text": "next chunk" }
```

`text` may contain partial words, punctuation, or whitespace.
### done

```json
{
  "type": "done",
  "text": "full assistant response",
  "usage": {
    "inputTokens": 123,
    "outputTokens": 456,
    "totalTokens": 579
  }
}
```

`usage` may be omitted when the provider does not expose final token accounting in stream mode.
### error

```json
{ "type": "error", "message": "provider timeout" }
```
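The four payloads form a natural discriminated union on `type`, which lets a client detect the terminal event cleanly. These types are a sketch derived from the JSON examples in this document:

```typescript
// Event payloads as a discriminated union on `type`.
interface MetaEvent  { type: "meta"; chatId: string; callId: string; provider: string; model: string; }
interface DeltaEvent { type: "delta"; text: string; }
interface DoneEvent  {
  type: "done";
  text: string;
  usage?: { inputTokens: number; outputTokens: number; totalTokens: number }; // may be omitted
}
interface ErrorEvent { type: "error"; message: string; }

type StreamEvent = MetaEvent | DeltaEvent | DoneEvent | ErrorEvent;

// The stream is finished once exactly one terminal event arrives.
function isTerminal(e: StreamEvent): e is DoneEvent | ErrorEvent {
  return e.type === "done" || e.type === "error";
}
```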
## Provider Streaming Behavior

- `openai`: streamed via OpenAI chat completion chunks; emits `delta` from `choices[0].delta.content`.
- `xai`: uses the OpenAI-compatible API; same chunk extraction as OpenAI.
- `anthropic`: streamed via the Anthropic event stream; emits `delta` from `content_block_delta` events with `text_delta`.
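The per-provider extraction can be sketched as two small functions. The field paths come from the list above; the parameter types are trimmed-down assumptions, since real SDK chunk objects carry many more fields.

```typescript
// Extract delta text from an OpenAI-style chunk (also covers xai,
// which is OpenAI-compatible). Only the documented path is modeled.
function textFromOpenAiChunk(
  chunk: { choices?: Array<{ delta?: { content?: string } }> },
): string {
  return chunk.choices?.[0]?.delta?.content ?? "";
}

// Extract delta text from an Anthropic stream event: only
// content_block_delta events carrying a text_delta contribute text.
function textFromAnthropicEvent(
  evt: { type: string; delta?: { type: string; text?: string } },
): string {
  if (evt.type === "content_block_delta" && evt.delta?.type === "text_delta") {
    return evt.delta.text ?? "";
  }
  return "";
}
```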
## Persistence + Consistency Model

The backend database remains the source of truth.

During the stream:

- The client may optimistically render accumulated `delta` text.

On successful completion:

- The backend persists the assistant `Message` and updates `LlmCall` usage/latency in a transaction.
- The backend then emits `done`.

On failure:

- The backend records the call error and emits `error`.
Client recommendation (for iOS/web):

- Render deltas in real time for UX.
- On `done`, refresh the chat detail from REST (`GET /v1/chats/:chatId`) and use the DB-backed data as canonical.
- On `error`, preserve the user's input and show a retry affordance.
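The recommended client handling reduces to a small state machine: accumulate delta text for optimistic rendering, then reconcile on the terminal event. A sketch, with illustrative names (the REST refresh itself is left to the caller):

```typescript
// Client-side stream state for one in-flight completion.
type Phase = "streaming" | "done" | "error";

interface ClientState {
  phase: Phase;
  draft: string; // optimistically accumulated assistant text
}

function reduceStreamEvent(
  state: ClientState,
  evt: { type: string; text?: string },
): ClientState {
  switch (evt.type) {
    case "delta":
      // Append partial text for real-time rendering.
      return { ...state, draft: state.draft + (evt.text ?? "") };
    case "done":
      // Caller should now refresh via GET /v1/chats/:chatId and
      // replace the draft with the canonical DB-backed message.
      return { phase: "done", draft: evt.text ?? state.draft };
    case "error":
      // Keep the draft; the UI preserves user input and offers retry.
      return { ...state, phase: "error" };
    default:
      // Ignore unknown events for forward compatibility.
      return state;
  }
}
```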
## SSE Parsing Rules

- Concatenate multiple `data:` lines with a newline before JSON parsing.
- An event completes on a blank line.
- Ignore unknown event names for forward compatibility.
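The rules above can be sketched as a small buffer-based parser. This is an illustration of the stated rules over a complete string, not a published client API; a production client would parse incrementally as network chunks arrive.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parse a raw SSE stream into events per the rules above:
// join multiple `data:` lines with "\n", dispatch on blank lines.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = "message"; // SSE default when no `event:` line is present
  let dataLines: string[] = [];

  for (const line of raw.split(/\r?\n/)) {
    if (line === "") {
      // Blank line completes the current event.
      if (dataLines.length > 0) {
        events.push({ event: eventName, data: dataLines.join("\n") });
      }
      eventName = "message";
      dataLines = [];
    } else if (line.startsWith("event:")) {
      eventName = line.slice("event:".length).trim();
    } else if (line.startsWith("data:")) {
      dataLines.push(line.slice("data:".length).trimStart());
    }
    // Other field names fall through; unknown event names are
    // ignored at dispatch time, not here.
  }
  return events;
}
```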
## Example Stream

Events are separated by blank lines, as required by the parsing rules above:

```
event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}

event: delta
data: {"type":"delta","text":"Hello"}

event: delta
data: {"type":"delta","text":" world"}

event: done
data: {"type":"done","text":"Hello world"}
```