adds attachment support

This commit is contained in:
2026-05-02 19:21:06 -07:00
parent 11e6875de9
commit 38da3cea72
15 changed files with 949 additions and 67 deletions

View File

@@ -10,6 +10,12 @@ Content type:
- Requests with bodies use `application/json`.
- Responses are JSON unless noted otherwise.
Chat upload limits:
- Chat completion and direct message payloads support inline attachments up to a 32 MB request body.
- Up to 8 attachments per message.
- Image attachments: PNG or JPEG only, max 6 MB each.
- Text attachments: up to 8 MB source size each; server accepts at most 200,000 characters of inlined text content per attachment.
## Health + Auth
### `GET /health`
@@ -74,11 +80,34 @@ Behavior notes:
"role": "system|user|assistant|tool",
"content": "string",
"name": "optional",
"metadata": {}
"metadata": {},
"attachments": [
{
"kind": "image",
"id": "attachment-id",
"filename": "photo.jpg",
"mimeType": "image/jpeg",
"sizeBytes": 12345,
"dataUrl": "data:image/jpeg;base64,..."
},
{
"kind": "text",
"id": "attachment-id",
"filename": "notes.md",
"mimeType": "text/markdown",
"sizeBytes": 4567,
"text": "# Notes\\n...",
"truncated": false
}
]
}
```
- Response: `{ "message": Message }`
Notes:
- `attachments` is optional and is merged into stored `message.metadata.attachments`.
- Tool messages should not include attachments.
## Chat Completions (non-streaming)
### `POST /v1/chat-completions`
@@ -89,7 +118,30 @@ Behavior notes:
"provider": "openai|anthropic|xai",
"model": "string",
"messages": [
{ "role": "system|user|assistant|tool", "content": "string", "name": "optional" }
{
"role": "system|user|assistant|tool",
"content": "string",
"name": "optional",
"attachments": [
{
"kind": "image",
"id": "attachment-id",
"filename": "photo.jpg",
"mimeType": "image/jpeg",
"sizeBytes": 12345,
"dataUrl": "data:image/jpeg;base64,..."
},
{
"kind": "text",
"id": "attachment-id",
"filename": "notes.md",
"mimeType": "text/markdown",
"sizeBytes": 4567,
"text": "# Notes\\n...",
"truncated": false
}
]
}
],
"temperature": 0.2,
"maxTokens": 256
@@ -112,7 +164,12 @@ Behavior notes:
- For `chatId` calls, server stores only *new* non-assistant messages from provided history to avoid duplicates.
- Server persists final assistant output and call metadata (`LlmCall`) in DB.
- Server updates chat-level model metadata on each call: `lastUsedProvider`/`lastUsedModel`; first successful/failed call also initializes `initiatedProvider`/`initiatedModel` if unset.
- Attachments are optional and currently apply to `user` messages. Persisted chat history stores them under `message.metadata.attachments`.
- Images are forwarded inline to providers as multimodal image parts. Use PNG or JPEG for cross-provider compatibility.
- Text files are forwarded as explicit text blocks rather than provider-managed file references. Large text attachments should already be truncated client-side before submission.
- For `openai` and `xai`, backend enables tool use during chat completion with an internal system instruction.
- For `openai` and `xai`, image attachments are sent as chat-completions content parts alongside text.
- For `anthropic`, image attachments are sent as Messages API `image` blocks using base64 source data; text attachments are added as `text` blocks.
- Available tool calls for chat: `web_search` and `fetch_url`.
- `web_search` returns ranked results with per-result summaries/snippets. Its backend engine is selected by `CHAT_WEB_SEARCH_ENGINE` (`exa` default, or `searxng` with `SEARXNG_BASE_URL` set). SearXNG mode requires the instance to allow `format=json`.
- `fetch_url` fetches a URL and returns plaintext page content (HTML converted to text server-side).
@@ -189,10 +246,32 @@ Search run notes:
"role": "system|user|assistant|tool",
"content": "...",
"name": null,
"metadata": null
"metadata": {
"attachments": [
{
"kind": "image",
"id": "attachment-id",
"filename": "photo.jpg",
"mimeType": "image/jpeg",
"sizeBytes": 12345,
"dataUrl": "data:image/jpeg;base64,..."
},
{
"kind": "text",
"id": "attachment-id",
"filename": "notes.md",
"mimeType": "text/markdown",
"sizeBytes": 4567,
"text": "# Notes\\n...",
"truncated": false
}
]
}
}
```
`metadata` remains nullable. Tool-call log messages still use `metadata.kind = "tool_call"`; regular user messages with attachments use `metadata.attachments`.
`ChatDetail`
```json
{

View File

@@ -9,6 +9,7 @@ Transport:
- HTTP response uses `Content-Type: text/event-stream; charset=utf-8`
- Events are emitted in SSE format (`event: ...`, `data: ...`)
- Request body is JSON
- Request body supports the same inline attachment schema and limits documented in `docs/api/rest.md`.
Authentication:
- Same as REST endpoints (`Authorization: Bearer <token>` when token mode is enabled)
@@ -21,7 +22,30 @@ Authentication:
"provider": "openai|anthropic|xai",
"model": "string",
"messages": [
{ "role": "system|user|assistant|tool", "content": "string", "name": "optional" }
{
"role": "system|user|assistant|tool",
"content": "string",
"name": "optional",
"attachments": [
{
"kind": "image",
"id": "attachment-id",
"filename": "photo.jpg",
"mimeType": "image/jpeg",
"sizeBytes": 12345,
"dataUrl": "data:image/jpeg;base64,..."
},
{
"kind": "text",
"id": "attachment-id",
"filename": "notes.md",
"mimeType": "text/markdown",
"sizeBytes": 4567,
"text": "# Notes\\n...",
"truncated": false
}
]
}
],
"temperature": 0.2,
"maxTokens": 256
@@ -32,6 +56,7 @@ Notes:
- If `chatId` is omitted, backend creates a new chat.
- If `chatId` is provided, backend validates it exists.
- Backend stores only new non-assistant input history rows to avoid duplicates.
- Attachments are optional and are persisted under `message.metadata.attachments` on stored user messages.
## Event Stream Contract
@@ -103,8 +128,9 @@ Event order:
## Provider Streaming Behavior
- `openai`: backend may execute internal tool calls (`web_search`, `fetch_url`) before producing final text.
- `openai`: image attachments are sent as chat-completions content parts; text attachments are inlined as text parts.
- `xai`: same tool-enabled behavior as OpenAI.
- `anthropic`: streamed via event stream; emits `delta` from `content_block_delta` with `text_delta`.
- `anthropic`: streamed via event stream; emits `delta` from `content_block_delta` with `text_delta`. Image attachments are sent as base64 `image` blocks and text attachments are appended as `text` blocks.
- `web_search` uses `CHAT_WEB_SEARCH_ENGINE` (`exa` default, or `searxng` with `SEARXNG_BASE_URL` set). SearXNG mode requires the instance to allow `format=json`. This only affects chat-mode tool calls, not search-mode endpoints.
Tool-enabled streaming notes (`openai`/`xai`):