
Streaming Chat API Contract

This document defines the server-sent events (SSE) contract for chat completions.

Endpoint:

  • POST /v1/chat-completions/stream

Transport:

  • HTTP response uses Content-Type: text/event-stream; charset=utf-8
  • Events are emitted in SSE format (event: ..., data: ...)
  • Request body is JSON

Authentication:

  • Same as REST endpoints (Authorization: Bearer <token> when token mode is enabled)

Request Body

{
  "chatId": "optional-chat-id",
  "provider": "openai|anthropic|xai",
  "model": "string",
  "messages": [
    { "role": "system|user|assistant|tool", "content": "string", "name": "optional" }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}

Notes:

  • If chatId is omitted, backend creates a new chat.
  • If chatId is provided, backend validates it exists.
  • Backend stores only the new non-assistant rows from the submitted message history, skipping rows already persisted, so repeated submissions do not create duplicates.
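The request body above can be sketched as TypeScript types plus a small request builder (a sketch: the `buildStreamRequest` helper name and the plain return shape are illustrative, not part of the contract):

```typescript
// Sketch of the request body from the contract above. Optionality of
// chatId/temperature/maxTokens follows the notes; everything else mirrors
// the documented JSON shape.

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  name?: string;
}

interface StreamChatRequest {
  chatId?: string; // omit to create a new chat
  provider: "openai" | "anthropic" | "xai";
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
}

// Builds the HTTP request parts; the Authorization header is only set
// when token mode is enabled (see Authentication above).
function buildStreamRequest(
  body: StreamChatRequest,
  token?: string
): { method: string; headers: Record<string, string>; body: string } {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "text/event-stream",
  };
  if (token) headers.Authorization = `Bearer ${token}`;
  return { method: "POST", headers, body: JSON.stringify(body) };
}
```

The result can be passed directly to fetch("/v1/chat-completions/stream", ...) in a web client.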

Event Stream Contract

Event order:

  1. Exactly one meta
  2. Zero or more tool_call
  3. Zero or more delta
  4. Exactly one terminal event: done or error
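A client can sanity-check a received event sequence against this ordering with a tiny validator (an illustrative sketch of the strict order listed above; event names contain no commas, so a joined-string match suffices):

```typescript
// Checks that a sequence of SSE event names follows the contract order:
// exactly one leading meta, then zero or more tool_call, then zero or
// more delta, then exactly one terminal done or error.
function isValidOrder(names: string[]): boolean {
  return /^meta(,tool_call)*(,delta)*,(done|error)$/.test(names.join(","));
}
```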

meta

{
  "type": "meta",
  "chatId": "chat-id",
  "callId": "llm-call-id",
  "provider": "openai",
  "model": "gpt-4.1-mini"
}

delta

{ "type": "delta", "text": "next chunk" }

text may contain partial words, punctuation, or whitespace.

tool_call

{
  "toolCallId": "call_123",
  "name": "web_search",
  "status": "completed",
  "summary": "Performed web search for 'latest CPI release'.",
  "args": { "query": "latest CPI release" },
  "startedAt": "2026-03-02T10:00:00.000Z",
  "completedAt": "2026-03-02T10:00:00.820Z",
  "durationMs": 820,
  "error": null,
  "resultPreview": "{\"ok\":true,...}"
}

done

{
  "type": "done",
  "text": "full assistant response",
  "usage": {
    "inputTokens": 123,
    "outputTokens": 456,
    "totalTokens": 579
  }
}

usage may be omitted when the provider does not expose final token accounting in streaming mode.

error

{ "type": "error", "message": "provider timeout" }

Provider Streaming Behavior

  • openai: backend may execute internal tool calls (web_search, fetch_url) before producing final text.
  • xai: same tool-enabled behavior as OpenAI.
  • anthropic: streamed via Anthropic's native event stream; delta events are derived from content_block_delta events carrying text_delta payloads.

Tool-enabled streaming notes (openai/xai):

  • Stream still emits standard meta, delta, done|error events.
  • Stream may emit tool_call events while tool calls are executed.
  • delta events stream incrementally as text is generated.

Persistence + Consistency Model

Backend database remains source of truth.

During stream:

  • Client may optimistically render accumulated delta text.

On successful completion:

  • Backend persists assistant Message and updates LlmCall usage/latency in a transaction.
  • Backend then emits done.

On failure:

  • Backend records call error and emits error.

Client recommendation (for iOS/web):

  1. Render deltas in real time for UX.
  2. On done, refresh chat detail from REST (GET /v1/chats/:chatId) and use DB-backed data as canonical.
  3. On error, preserve user input and show retry affordance.
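The client recommendation above can be sketched as a small reducer over incoming events (function and state names are illustrative; a real client would additionally refresh GET /v1/chats/:chatId on done, as step 2 describes):

```typescript
// Illustrative client-side state folding for the stream.

interface StreamState {
  chatId?: string;
  text: string; // accumulated delta text for optimistic rendering
  status: "streaming" | "done" | "error";
  errorMessage?: string;
}

function reduceEvent(
  state: StreamState,
  event: { type: string; [k: string]: unknown }
): StreamState {
  switch (event.type) {
    case "meta":
      return { ...state, chatId: event.chatId as string };
    case "delta":
      return { ...state, text: state.text + (event.text as string) };
    case "done":
      // Prefer the server's final text over locally accumulated deltas.
      return { ...state, text: event.text as string, status: "done" };
    case "error":
      return { ...state, status: "error", errorMessage: event.message as string };
    default:
      return state; // unknown event types ignored for forward compatibility
  }
}
```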

SSE Parsing Rules

  • Join multiple data: lines with a single newline before JSON-parsing the payload.
  • An event is terminated by a blank line.
  • Ignore unknown event names for forward compatibility.
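A minimal parser implementing these rules might look like the following (a sketch assuming the stream text is available as a string; a streaming client would buffer chunks and apply the same splitting incrementally):

```typescript
// Minimal SSE parser per the rules above: a blank line terminates each
// event, multiple `data:` lines are joined with "\n" before the caller
// JSON-parses the payload, and unknown event names are left to the
// caller to ignore.

interface SSEEvent {
  event: string;
  data: string;
}

function parseSSE(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default event name
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    // Blocks with no data: lines (e.g. trailing whitespace) are skipped.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

Note that the SSE specification strips at most one space after the colon; trimStart here is a simplification that is safe for JSON payloads.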

Example Stream

event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}

event: delta
data: {"type":"delta","text":"Hello"}

event: delta
data: {"type":"delta","text":" world"}

event: done
data: {"type":"done","text":"Hello world"}