# Sybil Server

Backend API for:

- LLM multiplexer (OpenAI Responses / Anthropic / xAI Chat Completions-compatible Grok / Hermes Agent)
- Personal chat database (chats/messages + LLM call log)
## Stack

- Node.js + TypeScript
- Fastify (HTTP)
- Prisma + SQLite (dev)
## Quick start

```bash
cp .env.example .env
npm run dev
```

Migrations are applied automatically on server startup (`prisma migrate deploy`).

Open docs: http://localhost:8787/docs

API contract docs for clients:

- `../docs/api/rest.md`
- `../docs/api/streaming-chat.md`
## Run Modes

- `npm run dev`: runs `src/index.ts` with `tsx` in watch mode (auto-restart on file changes). Use for local development.
- `npm run start`: runs compiled `dist/index.js` with Node.js (no watch mode). Use for production-like runs.

Both modes run startup checks (`predev` / `prestart`) and apply migrations at app boot.
## Auth

Set `ADMIN_TOKEN` and send:

```
Authorization: Bearer <ADMIN_TOKEN>
```

If `ADMIN_TOKEN` is not set, the server runs in open mode (for development only).
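A minimal client-side sketch of the bearer-token scheme above. The `authHeaders` helper is illustrative (not part of the server); it simply omits the header when no token is configured, matching open mode:

```typescript
// Build request headers for the Sybil Server API.
// If ADMIN_TOKEN is unset, the server runs in open mode and the
// Authorization header can be omitted entirely.
function authHeaders(adminToken?: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (adminToken) {
    headers["Authorization"] = `Bearer ${adminToken}`;
  }
  return headers;
}

// Example: an authenticated GET /v1/auth/session request.
// const res = await fetch("http://localhost:8787/v1/auth/session", {
//   headers: authHeaders(process.env.ADMIN_TOKEN),
// });
```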
## Env

- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `XAI_API_KEY`
- `HERMES_AGENT_API_BASE_URL` (`http://127.0.0.1:8642/v1` by default; include the `/v1` suffix)
- `HERMES_AGENT_API_KEY` (enables the Hermes Agent provider; set to the Hermes `API_SERVER_KEY`, or any non-empty value if that local server does not require auth)
- `HERMES_AGENT_MODEL` (optional fallback/override model id; defaults client-side to `hermes-agent`)
- `EXA_API_KEY`
- `CHAT_WEB_SEARCH_ENGINE` (`exa` by default, or `searxng`; for chat tool calls only)
- `SEARXNG_BASE_URL` (required when `CHAT_WEB_SEARCH_ENGINE=searxng`; instance must allow `format=json`)
- `CHAT_MAX_TOOL_ROUNDS` (`100` by default; maximum model/tool-result cycles per chat completion)
- `CHAT_CODEX_TOOL_ENABLED` (`false` by default; enables the `codex_exec` chat tool for OpenAI/xAI)
- `CHAT_CODEX_REMOTE_HOST` (required when the Codex tool is enabled; SSH host/IP or `user@host`)
- `CHAT_CODEX_REMOTE_USER` (optional SSH user when the host does not include one)
- `CHAT_CODEX_REMOTE_PORT` (`22` by default)
- `CHAT_CODEX_REMOTE_WORKDIR` (`/workspace/sybil-codex` by default; created and reused on the devbox)
- `CHAT_CODEX_SSH_KEY_PATH` (recommended: path to a read-only mounted private key)
- `CHAT_CODEX_SSH_PRIVATE_KEY_B64` (optional fallback private key delivery)
- `CHAT_CODEX_EXEC_TIMEOUT_MS` (`600000` by default)
- `CHAT_SHELL_TOOL_ENABLED` (`false` by default; enables the `shell_exec` chat tool for OpenAI/xAI on the same devbox)
- `CHAT_SHELL_EXEC_TIMEOUT_MS` (`120000` by default)
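An illustrative `.env` fragment for a basic local setup (all values are placeholders; set only the variables for the providers and tools you actually use):

```ini
# LLM providers (set only those you use)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
XAI_API_KEY=...

# Hermes Agent (local server; any non-empty key works if it has no auth)
HERMES_AGENT_API_BASE_URL=http://127.0.0.1:8642/v1
HERMES_AGENT_API_KEY=local-dev

# Web search for chat tool calls
EXA_API_KEY=...
CHAT_WEB_SEARCH_ENGINE=exa

# Admin auth (omit to run in open mode)
ADMIN_TOKEN=change-me
```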
## API

- `GET /health`
- `GET /v1/auth/session`
- `GET /v1/chats`
- `POST /v1/chats`
- `GET /v1/chats/:chatId`
- `POST /v1/chats/:chatId/messages`
- `POST /v1/chat-completions`
- `POST /v1/chat-completions/stream` (SSE)
- `GET /v1/searches`
- `POST /v1/searches`
- `GET /v1/searches/:searchId`
- `POST /v1/searches/:searchId/run`
- `POST /v1/searches/:searchId/run/stream` (SSE)
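The `/stream` endpoints emit Server-Sent Events; the actual event schema lives in `../docs/api/streaming-chat.md`. The sketch below only shows generic SSE frame parsing on the client side (the `parseSseData` helper is illustrative, not part of this repo):

```typescript
// Extract the `data:` payloads from a raw SSE chunk. Real clients should
// buffer across chunk boundaries; this sketch assumes whole events per chunk.
function parseSseData(chunk: string): string[] {
  return chunk
    .split("\n\n")                          // events are separated by blank lines
    .flatMap((event) => event.split("\n"))  // split each event into field lines
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length));
}

// Example: consuming POST /v1/chat-completions/stream with fetch.
// const res = await fetch(url, { method: "POST", headers, body });
// for await (const chunk of res.body) {
//   for (const data of parseSseData(new TextDecoder().decode(chunk))) {
//     console.log(data); // one SSE payload per event
//   }
// }
```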
Search runs execute both Exa `searchAndContents` and Exa `answer`, storing:

- ranked search results (for result cards), and
- a top-level answer block + citations.

When `chatId` is provided to the completion endpoints, you can send the full conversation context; the server stores only new non-assistant messages, avoiding duplicate history rows.
`POST /v1/chat-completions` body example:

```json
{
  "chatId": "<optional chat id>",
  "provider": "openai",
  "model": "gpt-4.1-mini",
  "messages": [
    { "role": "system", "content": "You are helpful." },
    { "role": "user", "content": "Say hi" }
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```
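A TypeScript sketch of calling the endpoint with a body like the one above. The `buildCompletionRequest` helper and the local base URL are assumptions for dev use, not part of the server; splitting request construction from the network call keeps the sketch testable:

```typescript
// Build the fetch arguments for POST /v1/chat-completions.
function buildCompletionRequest(
  baseUrl: string,
  body: unknown,
  adminToken?: string, // omit when the server runs in open mode
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
  return {
    url: `${baseUrl}/v1/chat-completions`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}

// Usage (assumes a local dev server on port 8787):
// const { url, init } = buildCompletionRequest("http://localhost:8787", {
//   provider: "openai",
//   model: "gpt-4.1-mini",
//   messages: [{ role: "user", content: "Say hi" }],
// }, process.env.ADMIN_TOKEN);
// const res = await fetch(url, init);
```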
## Next steps (planned)
- Better streaming protocol compatibility (OpenAI-style chunks + cancellation)
- Tool/function calling normalization
- User accounts + per-device API keys
- Postgres support + migrations for prod
- Attachments + embeddings + semantic search