- If `persist` is `false`, `chatId` must be omitted. Backend does not create a chat and does not persist input messages, tool-call messages, assistant output, or `LlmCall` metadata.
- For persisted streams, backend stores only new non-assistant input history rows to avoid duplicates.
- Attachments are optional and are persisted under `message.metadata.attachments` on stored user messages when `persist` is `true`.
Persisted chat streams with a `chatId` are backend-owned active runs:
- Once started, the backend keeps the stream running even if the HTTP client disconnects or refreshes.
- While running, `GET /v1/active-runs` includes the `chatId`.
- Starting a second persisted stream for the same active `chatId` returns `409`.
- Clients can reattach with `POST /v1/chats/:chatId/stream/attach`.
## Attach Endpoint
`POST /v1/chats/:chatId/stream/attach`
- Body: none.
- Response uses the same `text/event-stream` transport and event names as `POST /v1/chat-completions/stream`.
- Replays buffered events for the active in-memory stream, then emits new events until `done` or `error`.
- Returns `404 { "message": "active chat stream not found" }` if no stream is currently active for that chat.
- Authentication is the same as all other API endpoints.
This endpoint is intended for clients that restored an active `chatId` from `GET /v1/active-runs`, especially after browser refresh. Replayed `delta` events may include text that was originally emitted before the client attached.
-`openai`: backend uses OpenAI's Responses API and may execute internal function tool calls (`web_search`, `fetch_url`, optional `codex_exec`, and optional `shell_exec`) before producing final text.
-`xai`: backend uses xAI's OpenAI-compatible Chat Completions API and may execute the same internal tool calls before producing final text.
-`openai`: image attachments are sent as Responses `input_image` items; text attachments are sent as `input_text` items.
-`xai`: image attachments are sent as Chat Completions content parts; text attachments are inlined as text parts.
-`openai`: Responses calls that can enter the server-managed tool loop use `store: true` so reasoning and function-call items can be passed between tool rounds.
-`anthropic`: streamed via event stream; emits `delta` from `content_block_delta` with `text_delta`. Image attachments are sent as base64 `image` blocks and text attachments are appended as `text` blocks.
-`web_search` uses `CHAT_WEB_SEARCH_ENGINE` (`exa` default, or `searxng` with `SEARXNG_BASE_URL` set). SearXNG mode requires the instance to allow `format=json`. This only affects chat-mode tool calls, not search-mode endpoints.
-`codex_exec` is available only when `CHAT_CODEX_TOOL_ENABLED=true`. It SSHes to `CHAT_CODEX_REMOTE_HOST`, creates/uses `CHAT_CODEX_REMOTE_WORKDIR`, and runs `codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check <non-interactive wrapped prompt>` there with SSH stdin closed. Prefer `CHAT_CODEX_SSH_KEY_PATH` with a read-only mounted private key; `CHAT_CODEX_SSH_PRIVATE_KEY_B64` is also supported.
-`shell_exec` is available only when `CHAT_SHELL_TOOL_ENABLED=true`. It uses the same devbox SSH configuration, starts in `CHAT_CODEX_REMOTE_WORKDIR`, and runs non-interactive shell commands there with SSH stdin closed, not inside the Sybil server container.
-`delta` events carry assistant text and are emitted incrementally for normal text rounds. The backend may buffer model-native text briefly while determining whether a provider round contains tool calls.
- OpenAI Responses stream events are normalized by the backend into this SSE contract; clients do not consume OpenAI's raw Responses stream event names.
- Backend persists each completed tool call as a `tool` message before emitting its `tool_call` SSE event, so chat detail refreshes can show completed tool calls while the assistant response is still running.