# Sybil Server

Backend API for:
- LLM multiplexer (OpenAI, Anthropic, xAI Grok)
- Personal chat database (chats/messages + LLM call log)

## Stack
- Node.js + TypeScript
- Fastify (HTTP)
- Prisma + SQLite (dev)

## Quick start

```bash
cp .env.example .env
npm run dev
```

Migrations are applied automatically on server startup (`prisma migrate deploy`).

Open docs: `http://localhost:8787/docs`

## Run Modes

- `npm run dev`: runs `src/index.ts` with `tsx` in watch mode (auto-restart on file changes). Use for local development.
- `npm run start`: runs compiled `dist/index.js` with Node.js (no watch mode). Use for production-like runs.

Both modes run startup checks (`predev` / `prestart`) and apply migrations at app boot.

## Auth

Set `ADMIN_TOKEN` and send:

`Authorization: Bearer <ADMIN_TOKEN>`

If `ADMIN_TOKEN` is not set, the server runs in open mode and accepts requests without an `Authorization` header. Use open mode for local development only.
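A minimal client-side sketch of the header described above. `buildAuthHeaders` is a hypothetical helper, not part of this repo; it assumes the client reads the same token value that the server has in `ADMIN_TOKEN`.

```typescript
// Hypothetical helper (not part of the server code): builds request headers
// for calling the API. `adminToken` is assumed to hold the same value as the
// server's ADMIN_TOKEN; leave it undefined when the server runs in open mode.
function buildAuthHeaders(adminToken?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (adminToken) {
    // Matches the documented format: Authorization: Bearer <ADMIN_TOKEN>
    headers["Authorization"] = `Bearer ${adminToken}`;
  }
  return headers;
}
```

For example, `buildAuthHeaders(process.env.ADMIN_TOKEN)` could be passed as the `headers` option of a `fetch` call to `GET /v1/auth/session`.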

## Env
- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `XAI_API_KEY`

## API
- `GET /health`
- `GET /v1/auth/session`
- `GET /v1/chats`
- `POST /v1/chats`
- `GET /v1/chats/:chatId`
- `POST /v1/chats/:chatId/messages`
- `POST /v1/chat-completions`
- `POST /v1/chat-completions/stream` (SSE)
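
The streaming endpoint responds with Server-Sent Events. This README does not specify the event names or payload shape, so the sketch below only shows generic SSE framing (events separated by a blank line, `event:` and `data:` fields). `SseEvent` and `parseSse` are hypothetical names, and the `delta` event in the usage note is made up for illustration.

```typescript
// Generic SSE frame parser (a simplified sketch of the standard framing rules).
// It is not tied to Sybil's actual event names or payloads.
interface SseEvent {
  event: string; // "message" when the frame has no `event:` field
  data: string;  // all `data:` lines of the frame joined with "\n"
}

// Parses every complete frame in `buffer` and returns the unparsed tail in
// `rest`, so the caller can prepend it to the next network chunk.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last piece may be an incomplete frame

  for (const block of blocks) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment / keep-alive line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return { events, rest };
}
```

Feeding it `'event: delta\ndata: {"text":"Hi"}\n\ndata: par'` yields one complete event and keeps `data: par` in `rest` until the remainder of that frame arrives.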

When `chatId` is provided to the completion endpoints, you can send the full conversation context on every request. The server stores only the new non-assistant messages, so history rows are not duplicated.
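
One way such deduplication could work is sketched below. This is an illustration under assumptions, not necessarily how the server implements it: it assumes messages are compared by role and content against the already-stored history, in order. `Message` and `selectMessagesToStore` are hypothetical names.

```typescript
// Illustrative sketch only; the server's real logic may differ.
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Returns the incoming non-assistant messages that are not already stored.
// Assumption: the client resends history in its original order, so stored
// messages form a prefix of the incoming ones.
function selectMessagesToStore(stored: Message[], incoming: Message[]): Message[] {
  const candidates = incoming.filter((m) => m.role !== "assistant");
  const known = stored.filter((m) => m.role !== "assistant");

  // Count how many leading candidates match what is already stored.
  let matched = 0;
  for (; matched < candidates.length; matched++) {
    const a = candidates[matched];
    const b = known[matched];
    if (!a || !b || a.role !== b.role || a.content !== b.content) break;
  }
  return candidates.slice(matched);
}
```

With a stored history of system, user, and assistant messages, resending that history plus one new user message would select only the new user message for storage.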

`POST /v1/chat-completions` body example:

```json
{
  "chatId": "<optional chat id>",
  "provider": "openai",
  "model": "gpt-4.1-mini",
  "messages": [
    {"role":"system","content":"You are helpful."},
    {"role":"user","content":"Say hi"}
  ],
  "temperature": 0.2,
  "maxTokens": 256
}
```
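
A sketch of assembling this request from a TypeScript client. The field names come from the JSON example above; `ChatCompletionRequest` and `buildChatCompletionRequest` are hypothetical names, and the base URL is assumed to be the same host and port as the docs URL. The response shape is not documented here, so the sketch stops at building the request.

```typescript
// Hypothetical client-side types mirroring the JSON body example.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionRequest {
  chatId?: string;
  provider: string; // e.g. "openai", as in the example body
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the URL and request options for POST /v1/chat-completions.
// `adminToken` is optional because the server may run in open mode.
function buildChatCompletionRequest(
  baseUrl: string,
  body: ChatCompletionRequest,
  adminToken?: string,
): BuiltRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (adminToken) headers["Authorization"] = `Bearer ${adminToken}`;
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat-completions`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```

The result can be passed straight to `fetch(url, init)` on a runtime that provides a global `fetch`.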

## Next steps (planned)
- Better streaming protocol compatibility (OpenAI-style chunks + cancellation)
- Tool/function calling normalization
- User accounts + per-device API keys
- Postgres support + migrations for prod
- Attachments + embeddings + semantic search