quick question feature

2026-05-02 23:48:01 -07:00
parent 6fbcaecbf8
commit 29e340fd08
8 changed files with 748 additions and 106 deletions
--- a/docs/api/streaming-chat.md
+++ b/docs/api/streaming-chat.md
@@ -19,6 +19,7 @@ Authentication:
 ```json
 {
  "chatId": "optional-chat-id",
+  "persist": true,
  "provider": "openai|anthropic|xai",
  "model": "string",
  "messages": [
@@ -53,10 +54,12 @@ Authentication:
 ```

 Notes:
- If `chatId` is omitted, backend creates a new chat.
+- `persist` defaults to `true`.
+- If `persist` is `true` and `chatId` is omitted, backend creates a new chat.
 - If `chatId` is provided, backend validates it exists.
- Backend stores only new non-assistant input history rows to avoid duplicates.
- Attachments are optional and are persisted under `message.metadata.attachments` on stored user messages.
+- If `persist` is `false`, `chatId` must be omitted. Backend does not create a chat and does not persist input messages, tool-call messages, assistant output, or `LlmCall` metadata.
+- For persisted streams, backend stores only new non-assistant input history rows to avoid duplicates.
+- Attachments are optional and are persisted under `message.metadata.attachments` on stored user messages when `persist` is `true`.

 ## Event Stream Contract

@@ -71,13 +74,15 @@ Event order:
 ```json
 {
  "type": "meta",
-  "chatId": "chat-id",
-  "callId": "llm-call-id",
+  "chatId": "chat-id-or-null",
+  "callId": "llm-call-id-or-null",
  "provider": "openai",
  "model": "gpt-4.1-mini"
 }
 ```

+For `persist: false` streams, `chatId` and `callId` are `null`.
+
 ### `delta`

 ```json
@@ -148,17 +153,22 @@ Tool-enabled streaming notes (`openai`/`xai`):

 Backend database remains source of truth.

-During stream:
+For persisted streams:
 - Client may optimistically render accumulated `delta` text.
 - Backend persists each completed tool call as a `tool` message before emitting its `tool_call` SSE event, so chat detail refreshes can show completed tool calls while the assistant response is still running.

-On successful completion:
+On successful persisted completion:
 - Backend persists assistant `Message` and updates `LlmCall` usage/latency in a transaction.
 - Backend then emits `done`.

-On failure:
+On persisted failure:
 - Backend records call error and emits `error`.

+For `persist: false` streams:
+- Client may render the same `meta`, `tool_call`, `delta`, and terminal events.
+- Backend does not write any chat, message, tool-call log, assistant output, or call metadata rows.
+- `done.text` is the canonical assistant text if the client later imports the result into a saved chat.
+
 Client recommendation (for iOS/web):
 1. Render deltas in real time for UX.
 2. On `done`, refresh chat detail from REST (`GET /v1/chats/:chatId`) and use DB-backed data as canonical.
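
Reviewer sketch (not part of the commit): the rules in this diff imply two client-side decisions, namely rejecting a `chatId` when `persist` is `false`, and choosing between the REST refresh and `done.text` as the canonical result. The TypeScript below is a minimal illustration under stated assumptions. The type names and the helpers `validateRequest` and `canonicalSource` are hypothetical, and only the fields documented above are modeled.

```typescript
// Hypothetical types; they mirror only the fields shown in the documented JSON.
interface StreamRequest {
  chatId?: string;
  persist?: boolean; // documented default: true
  provider: "openai" | "anthropic" | "xai";
  model: string;
}

interface MetaEvent {
  type: "meta";
  chatId: string | null; // null for persist: false streams
  callId: string | null; // null for persist: false streams
  provider: string;
  model: string;
}

// Assumption: only `text` is modeled here; the real `done` event may carry more.
interface DoneEvent {
  type: "done";
  text: string;
}

// Returns an error message, or null when the request satisfies the documented
// rule that `chatId` must be omitted when `persist` is false.
function validateRequest(req: StreamRequest): string | null {
  const persist = req.persist ?? true;
  if (!persist && req.chatId !== undefined) {
    return "chatId must be omitted when persist is false";
  }
  return null;
}

type Canonical =
  | { source: "rest"; path: string } // refresh chat detail from the backend
  | { source: "done"; text: string }; // nothing was stored; use done.text

// Decides where the canonical assistant text comes from once `done` arrives.
function canonicalSource(meta: MetaEvent, done: DoneEvent): Canonical {
  if (meta.chatId !== null) {
    return {
      source: "rest",
      path: `/v1/chats/${encodeURIComponent(meta.chatId)}`,
    };
  }
  return { source: "done", text: done.text };
}
```

With this split, a client would issue `GET /v1/chats/:chatId` only when `meta.chatId` is non-null, and would otherwise keep `done.text` in memory until the user imports the result into a saved chat.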