oai responses api, tool call retries

2026-05-02 21:44:32 -07:00
parent 8d6c069a33
commit 015253c0af
11 changed files with 369 additions and 40 deletions
--- a/docs/api/rest.md
+++ b/docs/api/rest.md
@@ -37,7 +37,7 @@ Chat upload limits:
  }
 }
 ```
- OpenAI model lists are filtered to models that are expected to work with the backend's current Chat Completions implementation.
+- OpenAI model lists are filtered to models that are expected to work with the backend's Responses API implementation.

 ## Chats

@@ -168,8 +168,11 @@ Behavior notes:
 - Attachments are optional and currently apply to `user` messages. Persisted chat history stores them under `message.metadata.attachments`.
 - Images are forwarded inline to providers as multimodal image parts. Use PNG or JPEG for cross-provider compatibility.
 - Text files are forwarded as explicit text blocks rather than provider-managed file references. Large text attachments should already be truncated client-side before submission.
- For `openai` and `xai`, backend enables tool use during chat completion with an internal system instruction.
- For `openai` and `xai`, image attachments are sent as chat-completions content parts alongside text.
+- For `openai`, the backend calls OpenAI's Responses API and enables internal tool use with an internal system instruction.
+- For `xai`, the backend calls xAI's OpenAI-compatible Chat Completions API and enables internal tool use with the same internal system instruction.
+- For `openai`, image attachments are sent as Responses `input_image` items and text attachments are sent as `input_text` items.
+- For `xai`, image attachments are sent as Chat Completions content parts alongside text.
+- For `openai`, Responses calls that can enter the server-managed tool loop use `store: true` so reasoning and function-call items can be passed between tool rounds.
 - For `anthropic`, image attachments are sent as Messages API `image` blocks using base64 source data; text attachments are added as `text` blocks.
 - Available tool calls for chat: `web_search` and `fetch_url`. When `CHAT_CODEX_TOOL_ENABLED=true`, `codex_exec` is also available. When `CHAT_SHELL_TOOL_ENABLED=true`, `shell_exec` is also available.
 - `web_search` returns ranked results with per-result summaries/snippets. Its backend engine is selected by `CHAT_WEB_SEARCH_ENGINE` (`exa` default, or `searxng` with `SEARXNG_BASE_URL` set). SearXNG mode requires the instance to allow `format=json`.
@@ -177,7 +180,7 @@ Behavior notes:
 - `codex_exec` delegates coding, shell, repository inspection, and other complex software tasks to a persistent remote Codex CLI workspace over SSH. The server runs `codex exec --skip-git-repo-check <non-interactive wrapped prompt>` on the configured devbox inside `CHAT_CODEX_REMOTE_WORKDIR`, with SSH stdin closed.
 - `shell_exec` runs arbitrary non-interactive shell commands on the same configured devbox, starting in `CHAT_CODEX_REMOTE_WORKDIR`. It uses `bash -lc` when bash exists, otherwise `sh -lc`, closes SSH stdin, and does not run inside the Sybil server container.
 - Devbox tool configuration:
-  - `CHAT_MAX_TOOL_ROUNDS=8` (optional; maximum model/tool result cycles before the backend returns a limit message)
+  - `CHAT_MAX_TOOL_ROUNDS=100` (optional; maximum model/tool result cycles before the backend returns a limit message)
  - `CHAT_CODEX_TOOL_ENABLED=true`
  - `CHAT_SHELL_TOOL_ENABLED=true`
  - `CHAT_CODEX_REMOTE_HOST=<host-or-ip>` (required when enabled)
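
The `CHAT_MAX_TOOL_ROUNDS` behavior documented in the hunk above can be illustrated with a minimal sketch of a bounded tool loop. This is not the backend's actual code: `run_tool_loop`, `call_model`, `LIMIT_MESSAGE`, and the reply dictionary shape are hypothetical names chosen for illustration, and the sketch assumes one "round" means one model call followed by execution of the tool calls it requested. Only the `function_call_output` item shape follows OpenAI's Responses API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical wording; the docs only say a "limit message" is returned.
LIMIT_MESSAGE = "Tool-call limit reached."


def run_tool_loop(
    call_model: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    tools: Dict[str, Callable[[str], str]],
    max_rounds: int = 100,  # mirrors the documented CHAT_MAX_TOOL_ROUNDS default
) -> str:
    """Alternate model calls and tool execution until the model answers.

    call_model receives the tool outputs from the previous round (empty on
    the first round) and returns {"text": str, "tool_calls": [...]}, where
    each tool call is {"call_id": str, "name": str, "arguments": str}.
    """
    outputs: List[Dict[str, Any]] = []
    for _ in range(max_rounds):
        reply = call_model(outputs)
        calls = reply.get("tool_calls", [])
        if not calls:
            # No tool calls in this round: the text is the final answer.
            return reply["text"]
        # Run every requested tool and collect results for the next round.
        outputs = [
            {
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": tools[call["name"]](call["arguments"]),
            }
            for call in calls
        ]
    # The round budget was spent without a final answer.
    return LIMIT_MESSAGE
```

In a real Responses integration, `call_model` would presumably also carry the previous response's reasoning and function-call items forward, which is what the `store: true` note in the hunk above is about; the sketch leaves that out to stay provider-neutral.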
--- a/docs/api/streaming-chat.md
+++ b/docs/api/streaming-chat.md
@@ -127,19 +127,22 @@ Event order:

 ## Provider Streaming Behavior

- `openai`/`xai`: backend may execute internal tool calls (`web_search`, `fetch_url`, optional `codex_exec`, and optional `shell_exec`) before producing final text.
- `openai`: image attachments are sent as chat-completions content parts; text attachments are inlined as text parts.
- `xai`: same attachment behavior as OpenAI.
+- `openai`: backend uses OpenAI's Responses API and may execute internal function tool calls (`web_search`, `fetch_url`, optional `codex_exec`, and optional `shell_exec`) before producing final text.
+- `xai`: backend uses xAI's OpenAI-compatible Chat Completions API and may execute the same internal tool calls before producing final text.
+- `openai`: image attachments are sent as Responses `input_image` items; text attachments are sent as `input_text` items.
+- `xai`: image attachments are sent as Chat Completions content parts; text attachments are inlined as text parts.
+- `openai`: Responses calls that can enter the server-managed tool loop use `store: true` so reasoning and function-call items can be passed between tool rounds.
 - `anthropic`: streamed via event stream; emits `delta` from `content_block_delta` with `text_delta`. Image attachments are sent as base64 `image` blocks and text attachments are appended as `text` blocks.
 - `web_search` uses `CHAT_WEB_SEARCH_ENGINE` (`exa` default, or `searxng` with `SEARXNG_BASE_URL` set). SearXNG mode requires the instance to allow `format=json`. This only affects chat-mode tool calls, not search-mode endpoints.
 - `codex_exec` is available only when `CHAT_CODEX_TOOL_ENABLED=true`. It SSHes to `CHAT_CODEX_REMOTE_HOST`, creates/uses `CHAT_CODEX_REMOTE_WORKDIR`, and runs `codex exec --skip-git-repo-check <non-interactive wrapped prompt>` there with SSH stdin closed. Prefer `CHAT_CODEX_SSH_KEY_PATH` with a read-only mounted private key; `CHAT_CODEX_SSH_PRIVATE_KEY_B64` is also supported.
 - `shell_exec` is available only when `CHAT_SHELL_TOOL_ENABLED=true`. It uses the same devbox SSH configuration, starts in `CHAT_CODEX_REMOTE_WORKDIR`, and runs non-interactive shell commands there with SSH stdin closed, not inside the Sybil server container.
- `CHAT_MAX_TOOL_ROUNDS` controls how many model/tool result cycles may occur before the backend returns a tool-call limit message; default is 8.
+- `CHAT_MAX_TOOL_ROUNDS` controls how many model/tool result cycles may occur before the backend returns a tool-call limit message; default is 100.

 Tool-enabled streaming notes (`openai`/`xai`):
 - Stream still emits standard `meta`, `delta`, `done|error` events.
 - Stream may emit `tool_call` events while tool calls are executed.
- `delta` events stream incrementally as text is generated.
+- `delta` events carry assistant text. The backend may buffer model-native text briefly while determining whether a provider round contains tool calls.
+- OpenAI Responses stream events are normalized by the backend into this SSE contract; clients do not consume OpenAI's raw Responses stream event names.

 ## Persistence + Consistency Model