Streaming Chat API Contract
This document defines the server-sent events (SSE) contract for chat completions.
Endpoints:
- `POST /v1/chat-completions/stream`
- `POST /v1/chats/:chatId/stream/attach`
Transport:
- HTTP response uses `Content-Type: text/event-stream; charset=utf-8`
- Events are emitted in SSE format (`event: ...`, `data: ...`)
- Request body is JSON
- Request body supports the same inline attachment schema and limits documented in `docs/api/rest.md`.
Authentication:
- Same as REST endpoints (`Authorization: Bearer <token>` when token mode is enabled)
Request Body
```json
{
  "chatId": "optional-chat-id",
  "persist": true,
  "provider": "openai|anthropic|xai|hermes-agent",
  "model": "string",
  "messages": [
    {
      "role": "system|user|assistant|tool",
      "content": "string",
      "name": "optional",
      "attachments": [
        {
          "kind": "image",
          "id": "attachment-id",
          "filename": "photo.jpg",
          "mimeType": "image/jpeg",
          "sizeBytes": 12345,
          "dataUrl": "data:image/jpeg;base64,..."
        },
        {
          "kind": "text",
          "id": "attachment-id",
          "filename": "notes.md",
          "mimeType": "text/markdown",
          "sizeBytes": 4567,
          "text": "# Notes\n...",
          "truncated": false
        }
      ]
    }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```
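As a concrete example, here is a minimal TypeScript sketch of building a `persist: false` request body against the schema above. The types and the `buildEphemeralRequest` helper are illustrative only, not part of the API.

```typescript
// Illustrative types mirroring the request-body schema above.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  name?: string;
}

interface StreamRequest {
  persist: boolean;
  provider: "openai" | "anthropic" | "xai" | "hermes-agent";
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
}

// Hypothetical helper: builds a non-persisted request body.
// Note that chatId is omitted entirely, as required for persist:false.
function buildEphemeralRequest(model: string, prompt: string): StreamRequest {
  return {
    persist: false,
    provider: "openai",
    model,
    messages: [{ role: "user", content: prompt }],
    temperature: 0.2,
    maxTokens: 256,
  };
}
```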
Notes:
- `persist` defaults to `true`.
- If `persist` is `true` and `chatId` is omitted, the backend creates a new chat.
- If `chatId` is provided, the backend validates that it exists.
- If `persist` is `false`, `chatId` must be omitted. The backend does not create a chat and does not persist input messages, tool-call messages, assistant output, or `LlmCall` metadata.
- For persisted streams, the backend stores only new non-assistant input history rows to avoid duplicates.
- Attachments are optional and are persisted under `message.metadata.attachments` on stored user messages when `persist` is `true`.
Persisted chat streams with a `chatId` are backend-owned active runs:
- Once started, the backend keeps the stream running even if the HTTP client disconnects or refreshes.
- While running, `GET /v1/active-runs` includes the `chatId`.
- Starting a second persisted stream for the same active `chatId` returns `409`.
- Clients can reattach with `POST /v1/chats/:chatId/stream/attach`.
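A restoring client can choose between the two endpoints based on the active-runs list. This is a hedged sketch: the `resolveStreamEndpoint` helper and the flat list-of-ids input are assumptions for illustration, not part of the API.

```typescript
// Hypothetical client helper: decide whether to start a new persisted
// stream or reattach to a backend-owned active run for a chat.
function resolveStreamEndpoint(activeChatIds: string[], chatId: string): string {
  // Starting a second persisted stream for an active chatId returns 409,
  // so reattach instead when the run is already active.
  return activeChatIds.includes(chatId)
    ? `/v1/chats/${chatId}/stream/attach`
    : "/v1/chat-completions/stream";
}
```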
Attach Endpoint
`POST /v1/chats/:chatId/stream/attach`
- Body: none.
- Response uses the same `text/event-stream` transport and event names as `POST /v1/chat-completions/stream`.
- Replays buffered events for the active in-memory stream, then emits new events until `done` or `error`.
- Returns `404 { "message": "active chat stream not found" }` if no stream is currently active for that chat.
- Authentication is the same as all other API endpoints.
This endpoint is intended for clients that restored an active chatId from GET /v1/active-runs, especially after browser refresh. Replayed delta events may include text that was originally emitted before the client attached.
Event Stream Contract
Event order:
- Exactly one `meta`
- Zero or more `tool_call`
- Zero or more `delta`
- Exactly one terminal event: `done` or `error`
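A client can guard against out-of-contract streams with a small validator. The sketch below is illustrative only, not part of the API; it treats `tool_call` and `delta` as freely interleaved between `meta` and the terminal event, which matches the tool-enabled streaming notes later in this document.

```typescript
// Sketch: validate the documented event order for one stream.
// Order: exactly one leading meta, then any mix of tool_call/delta,
// then exactly one terminal done or error.
function isValidEventOrder(events: string[]): boolean {
  if (events.length < 2 || events[0] !== "meta") return false;
  const last = events[events.length - 1];
  if (last !== "done" && last !== "error") return false;
  // Everything between meta and the terminal event must be tool_call or delta.
  return events
    .slice(1, -1)
    .every((e) => e === "tool_call" || e === "delta");
}
```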
meta
```json
{
  "type": "meta",
  "chatId": "chat-id-or-null",
  "callId": "llm-call-id-or-null",
  "provider": "openai",
  "model": "gpt-4.1-mini"
}
```
For `persist: false` streams, `chatId` and `callId` are `null`.
delta
```json
{ "type": "delta", "text": "next chunk" }
```
`text` may contain partial words, punctuation, or whitespace.
tool_call
```json
{
  "toolCallId": "call_123",
  "name": "web_search",
  "status": "completed",
  "summary": "Performed web search for 'latest CPI release'.",
  "args": { "query": "latest CPI release" },
  "startedAt": "2026-03-02T10:00:00.000Z",
  "completedAt": "2026-03-02T10:00:00.820Z",
  "durationMs": 820,
  "error": null,
  "resultPreview": "{\"ok\":true,...}"
}
```
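Clients rendering tool-call chips can derive elapsed time from the event's timestamps. A minimal TypeScript sketch (the `toolCallDurationMs` helper is hypothetical, not part of any SDK):

```typescript
// Sketch: compute elapsed milliseconds from a tool_call event's
// ISO-8601 startedAt/completedAt timestamps.
function toolCallDurationMs(startedAt: string, completedAt: string): number {
  return Date.parse(completedAt) - Date.parse(startedAt);
}
```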
done
```json
{
  "type": "done",
  "text": "full assistant response",
  "usage": {
    "inputTokens": 123,
    "outputTokens": 456,
    "totalTokens": 579
  }
}
```
`usage` may be omitted when the provider does not expose final token accounting in stream mode.
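Since `done.text` carries the full assistant response, clients can reconcile their accumulated deltas against it. A hedged sketch (the `DoneEvent` type and `reconcile` helper are illustrative assumptions):

```typescript
// Illustrative shape of the terminal done event; usage is optional.
interface DoneEvent {
  type: "done";
  text: string;
  usage?: { inputTokens: number; outputTokens: number; totalTokens: number };
}

// Sketch: deltas are only for live rendering; done.text is canonical.
function reconcile(deltas: string[], done: DoneEvent): string {
  const accumulated = deltas.join("");
  return done.text.length > 0 ? done.text : accumulated;
}
```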
error
```json
{ "type": "error", "message": "provider timeout" }
```
Provider Streaming Behavior
- `openai`: backend uses OpenAI's Responses API and may execute internal function tool calls (`web_search`, `fetch_url`, optional `codex_exec`, and optional `shell_exec`) before producing final text.
- `xai`: backend uses xAI's OpenAI-compatible Chat Completions API and may execute the same internal tool calls before producing final text.
- `hermes-agent`: backend uses the configured Hermes Agent OpenAI-compatible Chat Completions API. Sybil does not add its own tool definitions for this provider; Hermes Agent handles its own tools server-side. Custom Hermes stream events are normalized away unless they produce text deltas in this SSE contract.
- `openai`: image attachments are sent as Responses `input_image` items; text attachments are sent as `input_text` items.
- `xai` and `hermes-agent`: image attachments are sent as Chat Completions content parts; text attachments are inlined as text parts.
- `openai`: Responses calls that can enter the server-managed tool loop use `store: true` so reasoning and function-call items can be passed between tool rounds.
- `anthropic`: streamed via event stream; emits `delta` from `content_block_delta` with `text_delta`. Image attachments are sent as base64 `image` blocks and text attachments are appended as `text` blocks.
- `web_search` uses `CHAT_WEB_SEARCH_ENGINE` (`exa` default, or `searxng` with `SEARXNG_BASE_URL` set). SearXNG mode requires the instance to allow `format=json`. This only affects chat-mode tool calls, not search-mode endpoints.
- `codex_exec` is available only when `CHAT_CODEX_TOOL_ENABLED=true`. It SSHes to `CHAT_CODEX_REMOTE_HOST`, creates/uses `CHAT_CODEX_REMOTE_WORKDIR`, and runs `codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check <non-interactive wrapped prompt>` there with SSH stdin closed. Prefer `CHAT_CODEX_SSH_KEY_PATH` with a read-only mounted private key; `CHAT_CODEX_SSH_PRIVATE_KEY_B64` is also supported.
- `shell_exec` is available only when `CHAT_SHELL_TOOL_ENABLED=true`. It uses the same devbox SSH configuration, starts in `CHAT_CODEX_REMOTE_WORKDIR`, and runs non-interactive shell commands there with SSH stdin closed, not inside the Sybil server container.
- `CHAT_MAX_TOOL_ROUNDS` controls how many model/tool-result cycles may occur before the backend returns a tool-call limit message; default is 100.
Tool-enabled streaming notes (openai/xai):
- Stream still emits the standard `meta`, `delta`, and `done|error` events.
- Stream may emit `tool_call` events while tool calls are executed.
- `delta` events carry assistant text and are emitted incrementally for normal text rounds. The backend may buffer model-native text briefly while determining whether a provider round contains tool calls.
- OpenAI Responses stream events are normalized by the backend into this SSE contract; clients do not consume OpenAI's raw Responses stream event names.
Persistence + Consistency Model
The backend database remains the source of truth.
For persisted streams:
- Client may optimistically render accumulated `delta` text.
- Backend persists each completed tool call as a `tool` message before emitting its `tool_call` SSE event, so chat detail refreshes can show completed tool calls while the assistant response is still running.
On successful persisted completion:
- Backend persists the assistant `Message` and updates `LlmCall` usage/latency in a transaction.
- Backend then emits `done`.
On persisted failure:
- Backend records the call error and emits `error`.
For `persist: false` streams:
- Client may render the same `meta`, `tool_call`, `delta`, and terminal events.
- Backend does not write any chat, message, tool-call log, assistant output, or call metadata rows.
- `done.text` is the canonical assistant text if the client later imports the result into a saved chat.
Client recommendation (for iOS/web):
- Render deltas in real time for UX.
- On `done`, refresh chat detail from REST (`GET /v1/chats/:chatId`) and treat the DB-backed data as canonical.
- On `error`, preserve user input and show a retry affordance.
SSE Parsing Rules
- Concatenate multiple `data:` lines with a newline before JSON parsing.
- An event completes on a blank line.
- Ignore unknown event names for forward compatibility.
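The rules above can be sketched as a small TypeScript parser. This is illustrative only, assuming the whole stream text is available as a string; `parseSseStream` is not part of any client SDK.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Minimal SSE parser sketch: an event ends on a blank line, and multiple
// data: lines are joined with "\n" before JSON parsing. Unknown event
// names are preserved so callers can choose to ignore them.
function parseSseStream(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  let name = "message";
  let dataLines: string[] = [];
  for (const line of raw.split("\n")) {
    if (line === "") {
      // Blank line: dispatch the buffered event, if any.
      if (dataLines.length > 0) {
        events.push({ event: name, data: JSON.parse(dataLines.join("\n")) });
      }
      name = "message";
      dataLines = [];
    } else if (line.startsWith("event:")) {
      name = line.slice("event:".length).trim();
    } else if (line.startsWith("data:")) {
      dataLines.push(line.slice("data:".length).trimStart());
    }
  }
  // Flush a trailing event that lacked a final blank line.
  if (dataLines.length > 0) {
    events.push({ event: name, data: JSON.parse(dataLines.join("\n")) });
  }
  return events;
}
```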
Example Stream
```
event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}

event: delta
data: {"type":"delta","text":"Hello"}

event: delta
data: {"type":"delta","text":" world"}

event: done
data: {"type":"done","text":"Hello world"}
```