Streaming Chat API Contract
This document defines the server-sent events (SSE) contract for chat completions.
Endpoints:
- POST /v1/chat-completions/stream
- POST /v1/chats/:chatId/stream/attach
Transport:
- HTTP response uses Content-Type: text/event-stream; charset=utf-8
- Events are emitted in SSE format (event: ..., data: ...)
- Request body is JSON
- Request body supports the same inline attachment schema and limits documented in docs/api/rest.md.
Authentication:
- Same as REST endpoints (Authorization: Bearer <token> when token mode is enabled)
Request Body
{
"chatId": "optional-chat-id",
"persist": true,
"provider": "openai|anthropic|xai|hermes-agent",
"model": "string",
"messages": [
{
"role": "system|user|assistant|tool",
"content": "string",
"name": "optional",
"attachments": [
{
"kind": "image",
"id": "attachment-id",
"filename": "photo.jpg",
"mimeType": "image/jpeg",
"sizeBytes": 12345,
"dataUrl": "data:image/jpeg;base64,..."
},
{
"kind": "text",
"id": "attachment-id",
"filename": "notes.md",
"mimeType": "text/markdown",
"sizeBytes": 4567,
"text": "# Notes\\n...",
"truncated": false
}
]
}
],
"additionalSystemPrompt": "optional one-off system prompt",
"enabledTools": ["web_search", "fetch_url"],
"temperature": 0.2,
"maxTokens": 256
}
Notes:
- persist defaults to true.
- If persist is true and chatId is omitted, backend creates a new chat.
- If chatId is provided, backend validates it exists.
- If persist is false, chatId must be omitted. Backend does not create a chat and does not persist input messages, tool-call messages, assistant output, or LlmCall metadata.
- For persisted streams, backend stores only new non-assistant input history rows to avoid duplicates.
- additionalSystemPrompt, when present directly or loaded from stored chat settings, is prepended to the provider request as a system message and is not inserted into the persisted chat transcript by this endpoint.
- enabledTools limits Sybil-managed tools for this request. When omitted for a saved chat, the stored chat setting is used; otherwise all available tools are enabled by default. An empty array disables Sybil-managed tools.
- Attachments are optional and are persisted under message.metadata.attachments on stored user messages when persist is true.
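The persist and chatId rules in the notes above can be checked on the client before a request is sent. The following is a minimal sketch, not the backend's validation logic; normalize_request is a hypothetical helper name, and the backend remains the authority on what it accepts.

```python
from typing import Any


def normalize_request(body: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented persist/chatId rules to a request body.

    Hypothetical client-side helper; it does not validate any other field.
    """
    normalized = dict(body)
    # persist defaults to true when the field is absent.
    normalized.setdefault("persist", True)
    if normalized["persist"] is False and normalized.get("chatId") is not None:
        # persist: false requires chatId to be omitted.
        raise ValueError("chatId must be omitted when persist is false")
    return normalized
```

Failing early on the client avoids a round trip for a request the notes already describe as invalid.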
Persisted chat streams with a chatId are backend-owned active runs:
- Once started, the backend keeps the stream running even if the HTTP client disconnects or refreshes.
- While running, GET /v1/active-runs includes the chatId.
- Starting a second persisted stream for the same active chatId returns 409.
- Clients can reattach with POST /v1/chats/:chatId/stream/attach.
Attach Endpoint
POST /v1/chats/:chatId/stream/attach
- Body: none.
- Response uses the same text/event-stream transport and event names as POST /v1/chat-completions/stream.
- Replays buffered events for the active in-memory stream, then emits new events until done or error.
- Returns 404 { "message": "active chat stream not found" } if no stream is currently active for that chat.
- Authentication is the same as all other API endpoints.
This endpoint is intended for clients that restored an active chatId from GET /v1/active-runs, especially after browser refresh. Replayed delta events may include text that was originally emitted before the client attached.
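A client restoring state after a refresh has to choose between starting a new stream and attaching to a running one. The sketch below shows one way to make that choice. It assumes the caller has already extracted the active chat IDs from GET /v1/active-runs (that response shape is not documented here), and stream_path is a hypothetical helper name.

```python
from typing import Collection, Optional


def stream_path(chat_id: Optional[str], active_chat_ids: Collection[str]) -> str:
    """Pick the endpoint path for a chat.

    active_chat_ids is assumed to come from GET /v1/active-runs.
    """
    if chat_id is not None and chat_id in active_chat_ids:
        # A run is already active for this chat: reattach instead of
        # starting a second stream, which would return 409.
        return f"/v1/chats/{chat_id}/stream/attach"
    return "/v1/chat-completions/stream"
```

The active-run list can go stale between the two requests, so a client should still handle 404 from the attach endpoint and 409 from the stream endpoint.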
Event Stream Contract
Event order:
- Exactly one meta
- Zero or more tool_call
- Zero or more delta
- Exactly one terminal event: done or error
Each tool invocation can emit multiple tool_call events with the same toolCallId. The backend emits status: "initiated" before the tool starts executing, then emits status: "completed" or status: "failed" when execution finishes. Clients should upsert by toolCallId instead of appending each event.
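The ordering rules can be expressed as a small check over the sequence of event names, which is useful in client tests. This is a sketch under one assumption the list above does not settle: tool_call and delta events are allowed to interleave between meta and the terminal event. check_event_order is a hypothetical name.

```python
TERMINAL_EVENTS = {"done", "error"}


def check_event_order(names: list[str]) -> None:
    """Raise ValueError if a completed stream violates the event order.

    Assumes tool_call and delta may interleave; unknown names are tolerated.
    """
    if not names or names[0] != "meta":
        raise ValueError("stream must start with meta")
    if names[-1] not in TERMINAL_EVENTS:
        raise ValueError("stream must end with done or error")
    for name in names[1:-1]:
        if name == "meta":
            raise ValueError("meta must appear exactly once")
        if name in TERMINAL_EVENTS:
            raise ValueError("terminal event must be last")
```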
meta
{
"type": "meta",
"chatId": "chat-id-or-null",
"callId": "llm-call-id-or-null",
"provider": "openai",
"model": "gpt-4.1-mini"
}
For persist: false streams, chatId and callId are null.
delta
{ "type": "delta", "text": "next chunk" }
text may contain partial words, punctuation, or whitespace.
tool_call
{
"toolCallId": "call_123",
"name": "web_search",
"status": "initiated",
"summary": "Searching web for 'latest CPI release'.",
"args": { "query": "latest CPI release" },
"startedAt": "2026-03-02T10:00:00.000Z"
}
Terminal tool-call event:
{
"toolCallId": "call_123",
"name": "web_search",
"status": "completed",
"summary": "Performed web search for 'latest CPI release'.",
"args": { "query": "latest CPI release" },
"startedAt": "2026-03-02T10:00:00.000Z",
"completedAt": "2026-03-02T10:00:00.820Z",
"durationMs": 820,
"resultPreview": "{\"ok\":true,...}"
}
status is one of initiated, completed, or failed. completedAt and durationMs are only present on terminal events. error is present on failed terminal events; resultPreview is present on terminal events when available.
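Because an initiated event and its terminal event share a toolCallId, a client can keep tool calls in a mapping and merge each event into the existing entry. A minimal sketch, with upsert_tool_call as a hypothetical helper name:

```python
from typing import Any


def upsert_tool_call(calls: dict[str, dict[str, Any]], event: dict[str, Any]) -> None:
    """Merge a tool_call event into calls, keyed by toolCallId."""
    # Create the entry on the first event, then overlay later fields such as
    # status, completedAt, durationMs, resultPreview, or error.
    entry = calls.setdefault(event["toolCallId"], {})
    entry.update(event)
```

Merging rather than appending also keeps the view correct when the attach endpoint replays buffered events the client has already seen.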
done
{
"type": "done",
"text": "full assistant response",
"usage": {
"inputTokens": 123,
"outputTokens": 456,
"totalTokens": 579
}
}
usage may be omitted when provider does not expose final token accounting for stream mode.
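Since usage is optional, clients should read it defensively instead of indexing into it directly. A small sketch, with total_tokens as a hypothetical helper name:

```python
from typing import Any, Optional


def total_tokens(done_event: dict[str, Any]) -> Optional[int]:
    """Return totalTokens from a done payload, or None when usage is absent."""
    usage = done_event.get("usage")
    if usage is None:
        return None
    return usage["totalTokens"]
```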
error
{ "type": "error", "message": "provider timeout" }
Provider Streaming Behavior
- openai: backend uses OpenAI's Responses API and may execute internal function tool calls (web_search, fetch_url, optional codex_exec, and optional shell_exec) before producing final text.
- xai: backend uses xAI's OpenAI-compatible Chat Completions API and may execute the same internal tool calls before producing final text.
- hermes-agent: backend uses the configured Hermes Agent OpenAI-compatible Chat Completions API. Sybil does not add its own tool definitions for this provider; Hermes Agent handles its own tools server-side. Custom Hermes stream events are normalized away unless they produce text deltas in this SSE contract.
- openai: image attachments are sent as Responses input_image items; text attachments are sent as input_text items.
- xai and hermes-agent: image attachments are sent as Chat Completions content parts; text attachments are inlined as text parts.
- openai: Responses calls that can enter the server-managed tool loop use store: true so reasoning and function-call items can be passed between tool rounds.
- anthropic: streamed via event stream; emits delta from content_block_delta with text_delta. Image attachments are sent as base64 image blocks and text attachments are appended as text blocks.
- web_search uses CHAT_WEB_SEARCH_ENGINE (exa default, or searxng with SEARXNG_BASE_URL set). SearXNG mode requires the instance to allow format=json. This only affects chat-mode tool calls, not search-mode endpoints.
- codex_exec is available only when CHAT_CODEX_TOOL_ENABLED=true. It SSHes to CHAT_CODEX_REMOTE_HOST, creates/uses CHAT_CODEX_REMOTE_WORKDIR, and runs codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check <non-interactive wrapped prompt> there with SSH stdin closed. Prefer CHAT_CODEX_SSH_KEY_PATH with a read-only mounted private key; CHAT_CODEX_SSH_PRIVATE_KEY_B64 is also supported.
- shell_exec is available only when CHAT_SHELL_TOOL_ENABLED=true. It uses the same devbox SSH configuration, starts in CHAT_CODEX_REMOTE_WORKDIR, and runs non-interactive shell commands there with SSH stdin closed, not inside the Sybil server container.
- CHAT_MAX_TOOL_ROUNDS controls how many model/tool result cycles may occur before the backend returns a tool-call limit message; default is 100.
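The defaults for CHAT_WEB_SEARCH_ENGINE and CHAT_MAX_TOOL_ROUNDS can be illustrated with a small resolver. This is a sketch of the documented defaults only, not the backend's configuration code. The function names are hypothetical, and rejecting unknown engines or a missing SEARXNG_BASE_URL with an error is an assumption about how invalid values might be handled.

```python
from typing import Mapping


def resolve_web_search_engine(env: Mapping[str, str]) -> str:
    """Return the web_search engine name, defaulting to exa."""
    engine = env.get("CHAT_WEB_SEARCH_ENGINE", "exa")
    if engine not in ("exa", "searxng"):
        # Assumption: unknown engines are treated as a configuration error.
        raise ValueError(f"unsupported CHAT_WEB_SEARCH_ENGINE: {engine}")
    if engine == "searxng" and not env.get("SEARXNG_BASE_URL"):
        # searxng is only usable with SEARXNG_BASE_URL set.
        raise ValueError("SEARXNG_BASE_URL is required for searxng")
    return engine


def max_tool_rounds(env: Mapping[str, str]) -> int:
    """Return the tool-round limit, defaulting to 100."""
    return int(env.get("CHAT_MAX_TOOL_ROUNDS", "100"))
```

In a real process the env argument would typically be os.environ; taking a mapping keeps the sketch easy to test.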
Tool-enabled streaming notes (openai/xai):
- Stream still emits standard meta, delta, done|error events.
- Stream may emit tool_call events while tool calls are executed.
- delta events carry assistant text and are emitted incrementally for normal text rounds. The backend may buffer model-native text briefly while determining whether a provider round contains tool calls.
- OpenAI Responses stream events are normalized by the backend into this SSE contract; clients do not consume OpenAI's raw Responses stream event names.
Persistence + Consistency Model
Backend database remains source of truth.
For persisted streams:
- Client may optimistically render accumulated delta text.
- Backend emits initiated tool-call events without persisting them.
- Backend persists each completed or failed tool call as a tool message before emitting its terminal tool_call SSE event, so chat detail refreshes can show completed tool calls while the assistant response is still running.
On successful persisted completion:
- Backend persists assistant Message and updates LlmCall usage/latency in a transaction.
- Backend then emits done.
On persisted failure:
- Backend records call error and emits error.
For persist: false streams:
- Client may render the same meta, tool_call, delta, and terminal events.
- Backend does not write any chat, message, tool-call log, assistant output, or call metadata rows.
- done.text is the canonical assistant text if the client later imports the result into a saved chat.
Client recommendation (for iOS/web):
- Render deltas in real time for UX.
- On done, refresh chat detail from REST (GET /v1/chats/:chatId) and use DB-backed data as canonical.
- On error, preserve user input and show retry affordance.
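The client recommendation can be modeled as a small reducer over parsed events. The sketch below is illustrative only. StreamView and apply_event are hypothetical names, and the needs_refresh flag stands in for the GET /v1/chats/:chatId call, which only applies when meta carried a non-null chatId.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StreamView:
    chat_id: Optional[str] = None
    text: str = ""
    error: Optional[str] = None
    needs_refresh: bool = False


def apply_event(view: StreamView, name: str, payload: dict[str, Any]) -> StreamView:
    """Fold one parsed SSE event into the client view."""
    if name == "meta":
        view.chat_id = payload.get("chatId")
    elif name == "delta":
        # Optimistic rendering: append text as it arrives.
        view.text += payload["text"]
    elif name == "done":
        # done.text replaces the accumulated deltas; persisted chats should
        # then be refreshed from REST and the DB-backed data used as canonical.
        view.text = payload["text"]
        view.needs_refresh = view.chat_id is not None
    elif name == "error":
        # Keep what was rendered; the caller keeps the user input for retry.
        view.error = payload["message"]
    return view
```

A UI would show a retry affordance whenever view.error is set and leave the user's original input untouched.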
SSE Parsing Rules
- Concatenate multiple data: lines with newline before JSON parse.
- Event completes on blank line.
- Ignore unknown event names for forward compatibility.
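The three parsing rules can be sketched as a line-based parser. This is a minimal illustration, not a complete SSE implementation: parse_sse is a hypothetical name, the input is assumed to be already split into text lines, and other SSE fields (id:, retry:, comments) are ignored.

```python
import json
from typing import Any, Iterable, Iterator, Optional

KNOWN_EVENTS = {"meta", "tool_call", "delta", "done", "error"}


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (event name, parsed JSON payload) for each known event."""
    event_name: Optional[str] = None
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line completes the event.
            if data_lines and event_name in KNOWN_EVENTS:
                # Multiple data: lines are joined with newlines before parsing.
                yield event_name, json.loads("\n".join(data_lines))
            # Unknown event names fall through here and are dropped.
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

An event that is not followed by a blank line before the stream ends is never yielded, which matches the rule that only a blank line completes an event.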
Example Stream
event: meta
data: {"type":"meta","chatId":"c1","callId":"k1","provider":"openai","model":"gpt-4.1-mini"}
event: delta
data: {"type":"delta","text":"Hello"}
event: delta
data: {"type":"delta","text":" world"}
event: done
data: {"type":"done","text":"Hello world"}