2026-05-01 23:53:48 -07:00
# AGENTS.md
This project is a web video player for clients that can decode audio and still images, but cannot use browser video decoding. Preserve that constraint when iterating: the browser should not use `<video>` playback for the source stream.
## Product Shape
The UI intentionally has only two screens. The first two items below describe them; the last two are constraints on the player UI:
- URL entry screen with a stream URL input, a `Next` button, and globally stored recently played URLs.
- Fullscreen player screen with JPEG frames drawn to a canvas and native audio playback through an `<audio>` element.
- Playback controls are overlay controls toggled by tapping/clicking the frame area, similar to YouTube.
- Do not reintroduce debug panels, frame counters, settings forms, explanatory marketing copy, or visible ffmpeg details into the normal UI.
The backend stores recently played URLs globally, not per-browser. The default path is `data/recent-urls.json`, configurable with `RECENT_URLS_PATH`. Docker Compose persists this through the `frame-stream-data` volume.
## Core Architecture
The app is plain Node/Express plus browser JavaScript:
- `server/index.js`: API, WebSocket, source proxy/relay, ffmpeg process lifecycle, recent URL persistence.
- `public/index.html`: frontend markup.
- `public/app.js`: URL submission, WebSocket frame receiving, audio element coordination, canvas drawing, overlay controls.
- `public/styles.css`: two-screen player UI.
- `Dockerfile`: production image with Node and ffmpeg.
- `docker-compose-example.yml`: operational example and default env knobs.
Main public endpoints:
- `POST /api/session`: validates the stream URL, stores recent URL, creates a short-lived playback session.
- `GET /api/recent-urls`: returns global recent URL entries with `url`, redacted `displayUrl`, and `lastPlayedAt`.
- `GET /audio/:sessionId`: serves MP3 audio to the browser audio element.
- `WS /frames/:sessionId`: sends timed JPEG frame packets to the browser.
- `GET /api/health`: exposes basic health and active playback connection mode.
Internal endpoint:
- `/_source/:token`: short-lived local proxy used by ffmpeg in split and single modes. This keeps original source URLs and query tokens out of ffmpeg process args and lets the server log upstream open/close behavior.
## Browser Playback Model
Audio is the playback clock. The server sends JPEG frames over WebSocket. Each binary frame packet is:
- First 8 bytes: little-endian float64 timestamp in seconds.
- Remaining bytes: one complete JPEG image.
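The packet layout above can be sketched as a pair of helpers. This is a minimal illustration, not the project's code: `encodeFramePacket` and `decodeFramePacket` are hypothetical names, and the real server and frontend may structure this differently.

```javascript
// Packet layout: 8-byte little-endian float64 timestamp, then one JPEG.
// Both function names are illustrative, not taken from the project.

function encodeFramePacket(timestampSeconds, jpegBytes) {
  const packet = new Uint8Array(8 + jpegBytes.length);
  // Third argument `true` selects little-endian byte order.
  new DataView(packet.buffer).setFloat64(0, timestampSeconds, true);
  packet.set(jpegBytes, 8);
  return packet;
}

function decodeFramePacket(packet) {
  // Accept an ArrayBuffer (browser WebSocket with binaryType "arraybuffer")
  // or any Uint8Array view, including a Node Buffer.
  const bytes = packet instanceof ArrayBuffer ? new Uint8Array(packet) : packet;
  if (bytes.byteLength < 8) {
    throw new RangeError("frame packet shorter than the 8-byte header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    timestamp: view.getFloat64(0, true),
    jpeg: bytes.subarray(8), // view over the JPEG bytes, no copy
  };
}
```

On the browser side the `jpeg` bytes could then be wrapped in a `Blob` with type `image/jpeg` and handed to an image decoding API.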
The frontend decodes JPEGs with browser image APIs, queues frames, and paints frames whose timestamps are due relative to `audio.currentTime`. This means the browser decodes only audio and still images, not video.
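A rough sketch of the "paint what is due" rule, assuming the queue holds decoded frames sorted by timestamp. `takeDueFrame` is a hypothetical helper, and dropping superseded frames rather than painting each one is an assumption about the frontend, not something this document confirms.

```javascript
// Hypothetical helper: pick the newest frame whose timestamp is due
// relative to the audio clock, and drop any older due frames.
function takeDueFrame(queue, audioTime) {
  let due = null;
  let skipped = 0;
  while (queue.length > 0 && queue[0].timestamp <= audioTime) {
    if (due !== null) skipped += 1; // the previous due frame is superseded
    due = queue.shift();
  }
  // `frame` is null when nothing is due yet; future frames stay queued.
  return { frame: due, skipped };
}
```

A render loop might call this once per animation frame with `audio.currentTime` and draw `frame` to the canvas when it is not `null`.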
## Why JPEG Frames
Keep JPEG unless there is a measured reason to change it.
- PNG is usually too large for 24 fps video.
- GIF has poor quality, weak timing control, and awkward streaming behavior.
- JPEG is browser-native, streamable frame-by-frame, much smaller than PNG, and simple to parse with SOI/EOI markers.
The server currently emits MJPEG through ffmpeg `image2pipe`. Frame parsing is done by scanning for JPEG SOI `0xff 0xd8` and EOI `0xff 0xd9`.
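The SOI/EOI scan can be sketched as a small incremental splitter. This is an illustration under assumptions: `createJpegSplitter` is a hypothetical name, and the sketch relies on the stream containing plain baseline JPEGs without embedded thumbnails, where `0xff 0xd9` does not appear inside entropy-coded data because `0xff` bytes there are stuffed.

```javascript
// Hypothetical incremental MJPEG splitter: feed it stdout chunks,
// it calls onFrame once per complete JPEG.
const SOI = Buffer.from([0xff, 0xd8]);
const EOI = Buffer.from([0xff, 0xd9]);

function createJpegSplitter(onFrame) {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = pending.length ? Buffer.concat([pending, chunk]) : chunk;
    for (;;) {
      const start = pending.indexOf(SOI);
      if (start === -1) {
        // Keep the last byte in case SOI straddles a chunk boundary.
        pending = pending.subarray(Math.max(0, pending.length - 1));
        return;
      }
      const end = pending.indexOf(EOI, start + 2);
      if (end === -1) {
        // Incomplete frame: drop bytes before SOI and wait for more data.
        pending = pending.subarray(start);
        return;
      }
      // Copy so the emitted frame does not alias the caller's chunk.
      onFrame(Buffer.from(pending.subarray(start, end + 2)));
      pending = pending.subarray(end + 2);
    }
  };
}
```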
## Playback Modes
Playback mode is selected by `PLAYBACK_CONNECTION_MODE`. The code also accepts legacy aliases through `PLAYBACK_MODE`.
Use these modes deliberately:
- `split`: Smoothest mode. Starts separate ffmpeg workers and separate upstream source connections for audio and frames. Use for normal files and servers that allow multiple active connections.
- `relay`: IPTV-oriented mode. Opens one upstream HTTP connection from Node, then tees compressed input bytes into separate audio and frame ffmpeg workers through stdin. This preserves one source connection while isolating audio and frame processing.
- `single`: Fallback mode. Opens one upstream connection and one ffmpeg process with both MP3 and MJPEG outputs. This avoids multiple source connections but can stutter because audio and frame outputs are coupled inside one ffmpeg process.
The regression history matters:
- `split` was smooth, but some IPTV servers ended streams early when ffmpeg opened multiple connections/ranges.
- `single` fixed the one-active-connection issue, but introduced stutter because audio output backpressure and frame generation shared one ffmpeg process.
- `relay` was added to combine one upstream connection with separate ffmpeg workers.
Default code behavior is `split` when no mode is set. The Compose example uses `relay` because it is the mode to try for IPTV streams.
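A minimal sketch of mode resolution with the `split` default. The legacy alias values are not listed in this document, so the sketch assumes `PLAYBACK_MODE` carries the same three names; `resolvePlaybackMode` is a hypothetical name, and falling back to `split` on an unknown value is also an assumption.

```javascript
const PLAYBACK_MODES = ["split", "relay", "single"];

// Hypothetical resolver: prefer PLAYBACK_CONNECTION_MODE, then PLAYBACK_MODE.
function resolvePlaybackMode(env) {
  const raw = env.PLAYBACK_CONNECTION_MODE || env.PLAYBACK_MODE || "";
  const mode = String(raw).trim().toLowerCase();
  // Assumption: unknown or empty values fall back to the documented default.
  return PLAYBACK_MODES.includes(mode) ? mode : "split";
}
```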
## ffmpeg Pipelines
All ffmpeg command builders live near the bottom of `server/index.js`.
Common HTTP input args:
- `-hide_banner`
- `-nostdin`
- `-loglevel ${FFMPEG_LOG_LEVEL}`
- `-nostats`
- `-seekable ${FFMPEG_INPUT_SEEKABLE}`
- `-re`
- `-i <inputUrl>`
Pipe input args for relay intentionally skip `-seekable`. Some ffmpeg builds reject `-seekable` on `pipe:0` with `Option seekable not found`.
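The difference between HTTP and pipe input can be sketched as one builder. `buildInputArgs` is a hypothetical name, and the sketch assumes the pipe variant differs only by omitting `-seekable` and reading from `pipe:0`; the real builders in `server/index.js` may differ in other flags.

```javascript
// Hypothetical builder for the input half of an ffmpeg argument list.
function buildInputArgs({ input, logLevel = "warning", seekable = "0" }) {
  const args = ["-hide_banner", "-nostdin", "-loglevel", logLevel, "-nostats"];
  // -seekable is an HTTP protocol option; skip it for pipe input, where
  // some ffmpeg builds fail with "Option seekable not found".
  if (!input.startsWith("pipe:")) {
    args.push("-seekable", String(seekable));
  }
  args.push("-re", "-i", input);
  return args;
}
```

Passing the result as an argument array to `child_process.spawn` avoids shell quoting issues.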
Audio output:
- Maps `0:a:0?`
- Disables video with `-vn`
- Converts to stereo 48 kHz MP3 with `libmp3lame`
- Uses `session.options.audioBitrate`, default `160k`
- Outputs to `pipe:1`
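As a sketch, the audio output described above might translate into arguments like the following. The exact flag spelling (`-ac`, `-ar`, `-f mp3`) is an assumption based on standard ffmpeg options, and `buildAudioOutputArgs` is a hypothetical name.

```javascript
// Hypothetical builder for the audio output half of an ffmpeg argument list.
function buildAudioOutputArgs({ audioBitrate = "160k" } = {}) {
  return [
    "-map", "0:a:0?",      // first audio stream; "?" makes the map optional
    "-vn",                 // no video in this output
    "-ac", "2",            // stereo
    "-ar", "48000",        // 48 kHz sample rate
    "-c:a", "libmp3lame",
    "-b:a", audioBitrate,
    "-f", "mp3",
    "pipe:1",              // MP3 bytes on stdout
  ];
}
```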
Frame output:
- Maps `0:v:0`
- Disables audio with `-an`
- Applies `fps=<fps>,scale=w='min(<width>,iw)':h=-2:flags=bicubic:out_range=pc,format=yuvj420p`
- Encodes `mjpeg`
- Uses `-pix_fmt yuvj420p`, `-color_range pc`, `-q:v <quality>`
- Outputs `image2pipe` to either `pipe:1` or `pipe:3`
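The frame output list can be sketched the same way. `buildFrameOutputArgs` is a hypothetical name; the filter string follows the template above, and the use of `-vf` and `-f image2pipe` is an assumption about how the real builder passes it.

```javascript
// Hypothetical builder for the frame output half of an ffmpeg argument list.
function buildFrameOutputArgs({ fps = 24, width = 960, quality = 5, output = "pipe:1" } = {}) {
  // The single quotes are part of the filter syntax; they reach ffmpeg
  // literally because the args are passed as an array, not through a shell.
  const filter =
    `fps=${fps},scale=w='min(${width},iw)':h=-2:flags=bicubic:out_range=pc,format=yuvj420p`;
  return [
    "-map", "0:v:0",
    "-an",                    // no audio in this output
    "-vf", filter,
    "-c:v", "mjpeg",
    "-pix_fmt", "yuvj420p",
    "-color_range", "pc",
    "-q:v", String(quality),  // lower number means higher JPEG quality
    "-f", "image2pipe",
    output,                   // "pipe:1" or "pipe:3" depending on the mode
  ];
}
```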
The explicit `yuvj420p`/full-range settings match the behavior of the ffmpeg in the Docker image. Older ffmpeg builds may still emit repeated swscaler warnings about a deprecated pixel format. The logger suppresses only that known noisy warning so it cannot flood Docker logs and starve useful work.
## Relay Mode Details
Relay mode is implemented by `createRelayPlayback(session)`.
Important behavior:
- Waits until both audio HTTP response and frame WebSocket are attached.
- Starts two ffmpeg workers with `pipe:0` input, one for audio and one for frames.
- Fetches the original session URL exactly once from Node, not through `/_source/:token`.
- Writes each upstream compressed chunk to both ffmpeg stdin streams.
- Uses bounded branch queues via `createRelayInputBranch`.
- Pauses upstream reading while any branch queue exceeds half of `MAX_RELAY_BRANCH_QUEUE_BYTES`.
- Stops playback if any branch queue exceeds `MAX_RELAY_BRANCH_QUEUE_BYTES`.
- Backpressure accounting must include both chunks queued in JavaScript and bytes already written to ffmpeg stdin while waiting for `drain`. Otherwise fast movie sources can outrun realtime ffmpeg consumption and grow Node heap until OOM.
- When waiting for relay capacity, wait only on branches that are actually over the pause threshold. Including already-ready branches in a `Promise.race` can create an immediate-resolution spin loop.
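The accounting rules above can be sketched as follows. This is not `createRelayInputBranch`; `RelayBranchSketch` and `waitForRelayCapacity` are hypothetical names, and waiting with `Promise.all` on only the over-threshold branches is one possible way to avoid the spin loop, not necessarily what the project does.

```javascript
const MAX_RELAY_BRANCH_QUEUE_BYTES = 16 * 1024 * 1024;

class RelayBranchSketch {
  constructor(stdin, maxBytes = MAX_RELAY_BRANCH_QUEUE_BYTES) {
    this.stdin = stdin;
    this.maxBytes = maxBytes;
    this.queue = [];
    this.queuedBytes = 0;   // chunks still held in JavaScript
    this.inFlightBytes = 0; // bytes written to stdin that await 'drain'
    this.blocked = false;
    this.waiters = [];
  }
  // Both counters matter: ignoring in-flight bytes hides heap growth.
  get pendingBytes() { return this.queuedBytes + this.inFlightBytes; }
  get overPause() { return this.pendingBytes > this.maxBytes / 2; }
  get overCap() { return this.pendingBytes > this.maxBytes; }

  push(chunk) {
    this.queue.push(chunk);
    this.queuedBytes += chunk.length;
    this.flush();
  }
  flush() {
    while (!this.blocked && this.queue.length > 0) {
      const chunk = this.queue.shift();
      this.queuedBytes -= chunk.length;
      if (!this.stdin.write(chunk)) {
        // The stream buffered the chunk; count it until 'drain' fires.
        this.blocked = true;
        this.inFlightBytes += chunk.length;
        this.stdin.once("drain", () => {
          this.blocked = false;
          this.inFlightBytes = 0;
          this.flush();
          this.notify();
        });
      }
    }
  }
  whenBelowPause() {
    if (!this.overPause) return Promise.resolve();
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  notify() {
    if (this.overPause) return;
    for (const resolve of this.waiters.splice(0)) resolve();
  }
}

// Wait only on branches that are over the pause threshold, so an
// already-ready branch cannot resolve the wait immediately.
async function waitForRelayCapacity(branches) {
  const over = branches.filter((branch) => branch.overPause);
  await Promise.all(over.map((branch) => branch.whenBelowPause()));
}
```

A relay loop built on this would check `overCap` after each `push` to decide whether to stop playback, and call `waitForRelayCapacity` before reading the next upstream chunk.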
Relay mode works best for sequential stream containers such as MPEG-TS/IPTV. It may be less reliable for file formats that require seeking or late metadata, such as some MP4 files.
## Cleanup Requirements
ffmpeg cleanup is important. Keep these invariants:
- If the audio client disconnects, stop the active playback for `single` and `relay`.
- If the frame WebSocket disconnects, stop the active playback for `single` and `relay`.
- In `split`, audio and frame workers are independent and each worker should stop when its own client side closes.
- Always release `_source` tokens when workers close.
- Always remove closed playbacks from the `playbacks` map.
- Use `stopProcess(child)`: SIGTERM first, then SIGKILL after the timeout.
- Do not leave relay stdin streams open when stopping.
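The SIGTERM-then-SIGKILL rule can be sketched as below. This illustrates the pattern and is not the project's `stopProcess`; the name `stopProcessSketch` and the 2000 ms default are placeholders, since the actual timeout is not stated here.

```javascript
// Sketch of graceful-then-forced termination for a child process.
function stopProcessSketch(child, killTimeoutMs = 2000) {
  // Nothing to do if the process has already exited.
  if (!child || child.exitCode !== null || child.signalCode !== null) return;

  child.kill("SIGTERM");

  // Escalate only if the process is still alive after the timeout.
  const timer = setTimeout(() => {
    if (child.exitCode === null && child.signalCode === null) {
      child.kill("SIGKILL");
    }
  }, killTimeoutMs);

  // Cancel the escalation as soon as the process exits.
  child.once("exit", () => clearTimeout(timer));
}
```

For relay workers, the caller would also end or destroy the worker's stdin so the pipe does not stay open.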
Useful local cleanup checks:
```sh
pgrep -af "ffmpeg.*pipe:0|ffmpeg.*_source"
pgrep -af "node server/index.js"
```
Both commands should print nothing once the smoke tests have stopped.
## Logging
Operational logs are intended to be useful in Docker logs:
- ffmpeg process start/exit, PID, mode label, exit code, signal, duration.
- ffmpeg stderr lines except known swscaler pixel-format spam.
- source proxy connected/closed status, bytes, upstream end state.
- relay source connected/closed status, bytes, upstream end state.
- playback close summaries with frame counts, skipped frames, queue peaks.
Keep secrets redacted. `redactSecrets` currently redacts common query parameters such as `api_key`, `apikey`, `access_token`, `token`, and `key`.
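A minimal sketch of query-parameter redaction for the names listed above. `redactQuerySecrets` is a hypothetical stand-in, not the project's `redactSecrets`; the `REDACTED` placeholder and the handling of unparseable input are assumptions.

```javascript
const SECRET_PARAMS = new Set(["api_key", "apikey", "access_token", "token", "key"]);

// Hypothetical helper: replace the values of known secret query parameters.
function redactQuerySecrets(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    // Assumption: never echo input that cannot be parsed as a URL.
    return "[unparseable url]";
  }
  // Copy the keys first because set() mutates the parameter list.
  for (const name of [...url.searchParams.keys()]) {
    if (SECRET_PARAMS.has(name.toLowerCase())) {
      url.searchParams.set(name, "REDACTED");
    }
  }
  return url.toString();
}
```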
## Environment Knobs
Runtime:
- `PORT`: HTTP port, default `3000`.
- `FFMPEG_PATH`: ffmpeg binary path, default `ffmpeg`.
- `FFMPEG_LOG_LEVEL`: ffmpeg log level, default `warning`.
- `FFMPEG_INPUT_SEEKABLE`: HTTP input seekable option, default `0`.
- `PLAYBACK_CONNECTION_MODE`: `split`, `relay`, or `single`.
- `RECENT_URLS_PATH`: recent URL JSON path.
- `RECENT_URL_LIMIT`: recent URL count, default `12`.
- `MAX_WS_BUFFER_BYTES`: server-side WebSocket JPEG frame backlog cap, default `2097152`.
- `MAX_AUDIO_QUEUE_BYTES`: single-mode audio output queue cap, default `16777216`.
- `MAX_RELAY_BRANCH_QUEUE_BYTES`: relay per-branch compressed-input queue cap, default `16777216`.
Session playback options are accepted by `POST /api/session` even though the UI hides them:
- `fps`: default `24`, clamped `1..30`.
- `width`: default `960`, clamped `160..1920`.
- `quality`: default `5`, clamped `2..18`; lower is better for ffmpeg `-q:v`.
- `audioBitrate`: default `160k`, accepts two or three digits followed by `k`.
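The defaults and clamps above can be sketched as a normalizer. `normalizeSessionOptions` and `clampInt` are hypothetical names, and falling back to the default for an invalid `audioBitrate` (rather than rejecting the request) is an assumption.

```javascript
// Parse an integer and clamp it to [min, max]; use fallback when not numeric.
function clampInt(value, fallback, min, max) {
  const n = Number.parseInt(value, 10);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Hypothetical normalizer for the options accepted by POST /api/session.
function normalizeSessionOptions(input = {}) {
  const bitrate = String(input.audioBitrate);
  return {
    fps: clampInt(input.fps, 24, 1, 30),
    width: clampInt(input.width, 960, 160, 1920),
    quality: clampInt(input.quality, 5, 2, 18),
    // Two or three digits followed by "k", for example "96k" or "160k".
    audioBitrate: /^\d{2,3}k$/.test(bitrate) ? bitrate : "160k",
  };
}
```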
## Docker Notes
The Docker image installs ffmpeg and runs as non-root `node`.
Hardware acceleration is not required. Device passthrough may help only if server CPU decode is saturated. It does not fix audio/frame coupling issues; `relay` was built for that.
Compose includes commented examples for:
- VAAPI passthrough through `/dev/dri`.
- NVIDIA passthrough with `gpus: all`.
## Verification Commands
Basic validation:
```sh
node --check server/index.js
docker compose -f docker-compose-example.yml config
docker build -t frame-stream-player .
```
MPEG-TS local stream generation for playback smoke tests:
```sh
ffmpeg -y -hide_banner -loglevel error \
-f lavfi -i testsrc2=size=320x180:rate=24 \
-f lavfi -i sine=frequency=440:sample_rate=48000 \
-t 8 \
-c:v mpeg2video -pix_fmt yuv420p -b:v 900k \
-c:a mp2 -b:a 128k \
-f mpegts public/_relay-smoke.ts
```
Start a mode locally:
```sh
PORT=3014 RECENT_URLS_PATH=/tmp/carplay-relay-recent.json \
FFMPEG_LOG_LEVEL=warning FFMPEG_INPUT_SEEKABLE=0 \
PLAYBACK_CONNECTION_MODE=relay npm start
```
After smoke testing, remove generated assets:
```sh
rm -f public/_relay-smoke.ts /tmp/carplay-*-recent.json
```
## Security Notes
The server fetches arbitrary user-provided HTTP(S) URLs. Do not expose this app publicly without authentication and URL allowlisting or SSRF protections.
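As one possible shape for URL allowlisting, a hostname check might look like the sketch below. `isAllowedStreamUrl` and the example hostname are hypothetical, and a hostname allowlist does not by itself cover redirects or DNS rebinding.

```javascript
// Hypothetical allowlist check for user-provided stream URLs.
function isAllowedStreamUrl(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  // Only plain HTTP(S) sources are accepted.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return allowedHosts.has(url.hostname.toLowerCase());
}

// Example allowlist; the hostname is a placeholder.
const exampleAllowedHosts = new Set(["streams.example.com"]);
```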
Do not log raw source URLs in normal operational logs. Use redaction for query-string secrets and prefer short internal `_source` URLs in ffmpeg args.
## Change Guidance
Before changing the pipeline, decide which bottleneck you are addressing:
- Browser image decode or network bandwidth: lower `width`, lower `fps`, or raise the JPEG `quality` number (higher values compress more).
- Server CPU decode: consider ffmpeg tuning or hardware acceleration.
- Upstream server rejects multiple connections: use `relay`.
- Audio/frame stutter in one-connection mode: avoid `single`, use `relay`.
- Docker log floods: suppress only known noisy lines, not all stderr.
Avoid large frontend feature additions unless requested. The product goal is a minimal URL screen and a fullscreen player.