carplay/AGENTS.md
2026-05-01 23:53:48 -07:00

AGENTS.md

This project is a web video player for clients that can decode audio and still images, but cannot use browser video decoding. Preserve that constraint when iterating: the browser should not use <video> playback for the source stream.

Product Shape

The UI intentionally has only two screens, a URL entry screen and a fullscreen player:

  • URL entry screen with a stream URL input, a Next button, and globally stored recently played URLs.
  • Fullscreen player screen with JPEG frames drawn to a canvas and native audio playback through an <audio> element.
  • Playback controls are overlay controls toggled by tapping/clicking the frame area, similar to YouTube.
  • Do not reintroduce debug panels, frame counters, settings forms, explanatory marketing copy, or visible ffmpeg details into the normal UI.

The backend stores recently played URLs globally, not per-browser. The default path is data/recent-urls.json, configurable with RECENT_URLS_PATH. Docker Compose persists this through the frame-stream-data volume.

Core Architecture

The app is plain Node/Express plus browser JavaScript:

  • server/index.js: API, WebSocket, source proxy/relay, ffmpeg process lifecycle, recent URL persistence.
  • public/index.html: frontend markup.
  • public/app.js: URL submission, WebSocket frame receiving, audio element coordination, canvas drawing, overlay controls.
  • public/styles.css: two-screen player UI.
  • Dockerfile: production image with Node and ffmpeg.
  • docker-compose-example.yml: operational example and default env knobs.

Main public endpoints:

  • POST /api/session: validates the stream URL, stores recent URL, creates a short-lived playback session.
  • GET /api/recent-urls: returns global recent URL entries with url, redacted displayUrl, and lastPlayedAt.
  • GET /audio/:sessionId: serves MP3 audio to the browser audio element.
  • WS /frames/:sessionId: sends timed JPEG frame packets to the browser.
  • GET /api/health: exposes basic health and active playback connection mode.
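
As a small illustration of how the per-session endpoints above fit together, the sketch below derives the audio and frame URLs for one session. The helper name `playbackUrls` and the idea that the client builds these URLs itself are assumptions; only the `/audio/:sessionId` and `/frames/:sessionId` paths come from this document.

```javascript
// Hypothetical helper (not from the codebase): derive the per-session
// playback URLs from the page origin and a session id.
function playbackUrls(origin, sessionId) {
  const base = new URL(origin);
  const id = encodeURIComponent(sessionId);
  // WebSocket scheme follows the page scheme: https -> wss, http -> ws.
  const wsProtocol = base.protocol === "https:" ? "wss:" : "ws:";
  return {
    audioUrl: `${base.origin}/audio/${id}`,
    framesUrl: `${wsProtocol}//${base.host}/frames/${id}`,
  };
}
```

In a browser, `origin` would typically be `window.location.origin`; the audio URL goes to the `<audio>` element and the frames URL to `new WebSocket(...)`.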

Internal endpoint:

  • /_source/:token: short-lived local proxy used by ffmpeg in split and single modes. This keeps original source URLs and query tokens out of ffmpeg process args and lets the server log upstream open/close behavior.

Browser Playback Model

Audio is the playback clock. The server sends JPEG frames over WebSocket. Each binary frame packet is:

  • First 8 bytes: little-endian float64 timestamp in seconds.
  • Remaining bytes: one complete JPEG image.
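
The packet layout above can be sketched as a pair of helpers. The function names are hypothetical and the real code in server/index.js and public/app.js may be organized differently; the byte layout is the one described in this section.

```javascript
const HEADER_BYTES = 8;

// Server side (Node): prefix one complete JPEG with its timestamp.
function encodeFramePacket(timestampSeconds, jpeg) {
  const header = Buffer.alloc(HEADER_BYTES);
  header.writeDoubleLE(timestampSeconds, 0); // little-endian float64
  return Buffer.concat([header, jpeg]);
}

// Browser side: split a binary WebSocket message (ArrayBuffer) into parts.
function decodeFramePacket(arrayBuffer) {
  if (arrayBuffer.byteLength <= HEADER_BYTES) {
    throw new RangeError("frame packet too short");
  }
  // The second argument `true` selects little-endian byte order.
  const timestamp = new DataView(arrayBuffer).getFloat64(0, true);
  // View over the JPEG bytes without copying them.
  const jpeg = new Uint8Array(arrayBuffer, HEADER_BYTES);
  return { timestamp, jpeg };
}
```

For `decodeFramePacket` to receive an `ArrayBuffer`, the browser WebSocket needs `binaryType = "arraybuffer"`.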

The frontend decodes JPEGs with browser image APIs, queues frames, and paints frames whose timestamps are due relative to audio.currentTime. This means the browser decodes only audio and still images, not video.
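
A minimal sketch of the "paint what is due" step, assuming the queue is an array of `{ timestamp, image }` objects sorted by ascending timestamp and that `image` has already been decoded (for example with `createImageBitmap`). The names `takeDueFrame` and `startPaintLoop` are illustrative, not taken from public/app.js.

```javascript
// Remove every frame whose timestamp is <= the audio clock and return the
// newest of them. Older due frames are counted as skipped.
function takeDueFrame(queue, audioTime) {
  let due = null;
  let skipped = 0;
  while (queue.length > 0 && queue[0].timestamp <= audioTime) {
    if (due !== null) skipped += 1;
    due = queue.shift();
  }
  return { frame: due, skipped };
}

// Browser-only paint loop: audio.currentTime is the playback clock.
function startPaintLoop(audio, canvas, queue) {
  const ctx = canvas.getContext("2d");
  function tick() {
    const { frame } = takeDueFrame(queue, audio.currentTime);
    if (frame) ctx.drawImage(frame.image, 0, 0, canvas.width, canvas.height);
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```

Dropping all but the newest due frame keeps video from lagging behind audio when the browser falls behind on decoding.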

Why JPEG Frames

Keep JPEG unless there is a measured reason to change it.

  • PNG is usually too large for 24 fps video.
  • GIF has poor quality, weak timing control, and awkward streaming behavior.
  • JPEG is browser-native, streamable frame-by-frame, much smaller than PNG, and simple to parse with SOI/EOI markers.

The server currently emits MJPEG through ffmpeg image2pipe. Frame parsing is done by scanning for JPEG SOI 0xff 0xd8 and EOI 0xff 0xd9.
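
The SOI/EOI scan can be sketched as a small stateful splitter fed with ffmpeg stdout chunks. This is an illustration under assumptions, not the parser in server/index.js: `createJpegSplitter` is a hypothetical name, and the approach relies on the stream containing no embedded thumbnails, since those would carry their own EOI marker.

```javascript
const SOI = Buffer.from([0xff, 0xd8]); // start of image
const EOI = Buffer.from([0xff, 0xd9]); // end of image

// Returns a push(chunk) function that calls onFrame once per complete JPEG.
function createJpegSplitter(onFrame) {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    for (;;) {
      const start = pending.indexOf(SOI);
      if (start === -1) {
        // Keep a trailing 0xff in case SOI is split across two chunks.
        const last = pending.length - 1;
        pending = last >= 0 && pending[last] === 0xff
          ? pending.subarray(last)
          : Buffer.alloc(0);
        return;
      }
      const end = pending.indexOf(EOI, start + SOI.length);
      if (end === -1) {
        pending = pending.subarray(start); // wait for the rest of the frame
        return;
      }
      onFrame(pending.subarray(start, end + EOI.length));
      pending = pending.subarray(end + EOI.length);
    }
  };
}
```

Usage would look like `ffmpegChild.stdout.on("data", createJpegSplitter(sendFrame))`, where `sendFrame` is whatever forwards a frame to the WebSocket.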

Playback Modes

Playback mode is selected by PLAYBACK_CONNECTION_MODE. The code also accepts legacy aliases through PLAYBACK_MODE.

Use these modes deliberately:

  • split: Smoothest mode. Starts separate ffmpeg workers and separate upstream source connections for audio and frames. Use for normal files and servers that allow multiple active connections.
  • relay: IPTV-oriented mode. Opens one upstream HTTP connection from Node, then tees compressed input bytes into separate audio and frame ffmpeg workers through stdin. This preserves one source connection while isolating audio and frame processing.
  • single: Fallback mode. Opens one upstream connection and one ffmpeg process with both MP3 and MJPEG outputs. This avoids multiple source connections but can stutter because audio and frame outputs are coupled inside one ffmpeg process.

The regression history matters:

  • split was smooth, but some IPTV servers ended streams early when ffmpeg opened multiple connections/ranges.
  • single fixed the one-active-connection issue, but introduced stutter because audio output backpressure and frame generation shared one ffmpeg process.
  • relay was added to combine one upstream connection with separate ffmpeg workers.

The code defaults to split when no mode is set. The Compose example sets relay because that is the first mode to try for IPTV streams.
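
A minimal sketch of mode selection with the split default. It deliberately omits the legacy PLAYBACK_MODE aliases, because their values are not listed here, and the fallback to split for unrecognized values is an assumption.

```javascript
const PLAYBACK_MODES = ["split", "relay", "single"];

// Hypothetical resolver: read PLAYBACK_CONNECTION_MODE, default to "split".
function resolvePlaybackMode(env) {
  const raw = String(env.PLAYBACK_CONNECTION_MODE || "").trim().toLowerCase();
  // Assumption: unknown values fall back to the default instead of failing.
  return PLAYBACK_MODES.includes(raw) ? raw : "split";
}
```

Typical use would be `resolvePlaybackMode(process.env)` once at startup.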

ffmpeg Pipelines

All ffmpeg command builders live near the bottom of server/index.js.

Common HTTP input args:

  • -hide_banner
  • -nostdin
  • -loglevel ${FFMPEG_LOG_LEVEL}
  • -nostats
  • -seekable ${FFMPEG_INPUT_SEEKABLE}
  • -re
  • -i <inputUrl>

Pipe input args for relay intentionally skip -seekable. Some ffmpeg builds reject -seekable on pipe:0 with the error "Option seekable not found".
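
The input args above could be assembled roughly as follows. `buildInputArgs` is a hypothetical name, and keeping every other flag (including -re) unchanged for pipe input is an assumption; only the -seekable omission is stated in this document.

```javascript
// Sketch: common ffmpeg input args, skipping -seekable for stdin input.
function buildInputArgs({ input, logLevel = "warning", seekable = "0" }) {
  const args = ["-hide_banner", "-nostdin", "-loglevel", logLevel, "-nostats"];
  if (input !== "pipe:0") {
    // -seekable is an HTTP protocol option and is rejected on pipe input
    // by some builds.
    args.push("-seekable", seekable);
  }
  args.push("-re", "-i", input);
  return args;
}
```

The array is meant to be passed to `child_process.spawn` directly, so no shell quoting is involved.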

Audio output:

  • Maps 0:a:0?
  • Disables video with -vn
  • Converts to stereo 48 kHz MP3 with libmp3lame
  • Uses session.options.audioBitrate, default 160k
  • Outputs to pipe:1
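
As a sketch, the audio output described above might map to these ffmpeg args. The specific flags `-ac 2`, `-ar 48000`, and `-f mp3` are assumptions about how "stereo 48 kHz MP3" is expressed, and `buildAudioOutputArgs` is a hypothetical name.

```javascript
// Sketch: MP3 audio output args written to stdout (pipe:1).
function buildAudioOutputArgs({ audioBitrate = "160k" } = {}) {
  return [
    "-map", "0:a:0?",      // first audio stream; "?" makes it optional
    "-vn",                 // no video in this output
    "-ac", "2",            // stereo (assumed flag)
    "-ar", "48000",        // 48 kHz (assumed flag)
    "-c:a", "libmp3lame",
    "-b:a", audioBitrate,
    "-f", "mp3",           // explicit muxer for pipe output (assumed flag)
    "pipe:1",
  ];
}
```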

Frame output:

  • Maps 0:v:0
  • Disables audio with -an
  • Applies fps=<fps>,scale=w='min(<width>,iw)':h=-2:flags=bicubic:out_range=pc,format=yuvj420p
  • Encodes mjpeg
  • Uses -pix_fmt yuvj420p, -color_range pc, -q:v <quality>
  • Outputs image2pipe to either pipe:1 or pipe:3
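
The frame output above can be sketched the same way. `buildFrameOutputArgs` is a hypothetical name and the use of `-vf` to attach the filter chain is an assumption; the filter string and the remaining options are taken from the list.

```javascript
// Sketch: MJPEG frame output args written to pipe:1 or pipe:3.
function buildFrameOutputArgs({ fps = 24, width = 960, quality = 5, output = "pipe:1" } = {}) {
  // The single quotes are parsed by ffmpeg's filtergraph syntax (not a shell)
  // so the comma inside min() is not read as a filter separator.
  const filter =
    `fps=${fps},scale=w='min(${width},iw)':h=-2:flags=bicubic:out_range=pc,format=yuvj420p`;
  return [
    "-map", "0:v:0",
    "-an",                 // no audio in this output
    "-vf", filter,
    "-c:v", "mjpeg",
    "-pix_fmt", "yuvj420p",
    "-color_range", "pc",
    "-q:v", String(quality),
    "-f", "image2pipe",
    output,
  ];
}
```

`min(width, iw)` caps the output width without upscaling sources that are already narrower, and `h=-2` keeps the aspect ratio with an even height.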

The explicit yuvj420p/full-range settings match the behavior of the ffmpeg build in the Docker image. Older ffmpeg builds may still emit repeated swscaler warnings about a deprecated pixel format. The logger suppresses only that known noisy warning so it cannot flood Docker logs and starve useful work.
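
A narrow stderr filter along these lines might look like the sketch below. The exact pattern the project's logger matches is not shown in this document; the regular expression here is an assumption based on the usual wording of the swscaler warning, and `shouldLogStderrLine` is a hypothetical name.

```javascript
// Assumed pattern for the noisy warning, e.g.
// "[swscaler @ 0x55d0c8a3f2c0] deprecated pixel format used, make sure you did set range correctly"
const SWSCALER_SPAM = /\[swscaler @ [^\]]+\] deprecated pixel format used/;

// Returns true for stderr lines worth logging. Only the one known noisy
// line is dropped; every other warning or error passes through.
function shouldLogStderrLine(line) {
  return line.trim() !== "" && !SWSCALER_SPAM.test(line);
}
```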

Relay Mode Details

Relay mode is implemented by createRelayPlayback(session).

Important behavior:

  • Waits until both the audio HTTP response and the frame WebSocket are attached.
  • Starts two ffmpeg workers with pipe:0 input, one for audio and one for frames.
  • Fetches the original session URL exactly once from Node, not through /_source/:token.
  • Writes each upstream compressed chunk to both ffmpeg stdin streams.
  • Uses bounded branch queues via createRelayInputBranch.
  • Pauses upstream reading while any branch queue exceeds half of MAX_RELAY_BRANCH_QUEUE_BYTES.
  • Stops playback if any branch queue exceeds MAX_RELAY_BRANCH_QUEUE_BYTES.
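
The two queue thresholds above can be illustrated with a small byte-counting queue. This is a sketch of the bookkeeping only, not the real createRelayInputBranch; the name `createBranchQueue` and the string return values are hypothetical.

```javascript
// Sketch: bounded per-branch queue of compressed input chunks.
function createBranchQueue(maxBytes) {
  const chunks = [];
  let queuedBytes = 0;
  return {
    // Returns "ok", "pause" (over half the cap) or "overflow" (over the cap).
    push(chunk) {
      chunks.push(chunk);
      queuedBytes += chunk.length;
      if (queuedBytes > maxBytes) return "overflow"; // caller stops playback
      if (queuedBytes > maxBytes / 2) return "pause"; // caller pauses upstream
      return "ok";
    },
    // Called when the ffmpeg stdin stream can accept more data.
    shift() {
      const chunk = chunks.shift();
      if (chunk) queuedBytes -= chunk.length;
      return chunk;
    },
    get bytes() {
      return queuedBytes;
    },
  };
}
```

Pausing at half the cap leaves headroom for chunks already in flight before the hard limit ends the session.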

Relay mode works best for sequential stream containers such as MPEG-TS/IPTV. It may be less reliable for file formats that require seeking or late metadata, such as some MP4 files.

Cleanup Requirements

ffmpeg cleanup is important. Keep these invariants:

  • If the audio client disconnects, stop the active playback for single and relay.
  • If the frame WebSocket disconnects, stop the active playback for single and relay.
  • In split, audio and frame workers are independent and each worker should stop when its own client side closes.
  • Always release _source tokens when workers close.
  • Always remove closed playbacks from the playbacks map.
  • Use stopProcess(child): SIGTERM first, then SIGKILL after the timeout.
  • Do not leave relay stdin streams open when stopping.
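
The SIGTERM-then-SIGKILL rule can be sketched as below. This illustrates the described behavior and is not the project's actual stopProcess; the 3000 ms default timeout is an assumption.

```javascript
// Sketch: ask the child to exit, then force-kill it if it has not exited
// within killTimeoutMs.
function stopProcess(child, killTimeoutMs = 3000) {
  // exitCode/signalCode are null while a ChildProcess is still running.
  if (child.exitCode !== null || child.signalCode !== null) return;
  const timer = setTimeout(() => child.kill("SIGKILL"), killTimeoutMs);
  timer.unref(); // do not keep the Node process alive just for this timer
  child.once("exit", () => clearTimeout(timer));
  child.kill("SIGTERM");
}
```

For relay workers, the caller would also end the child's stdin stream so that ffmpeg is not left waiting on input.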

Useful local cleanup checks:

pgrep -af "ffmpeg.*pipe:0|ffmpeg.*_source"
pgrep -af "node server/index.js"

Both should be empty after smoke tests stop.

Logging

Operational logs are intended to be useful in Docker logs:

  • ffmpeg process start/exit, PID, mode label, exit code, signal, duration.
  • ffmpeg stderr lines except known swscaler pixel-format spam.
  • source proxy connected/closed status, bytes, upstream end state.
  • relay source connected/closed status, bytes, upstream end state.
  • playback close summaries with frame counts, skipped frames, queue peaks.

Keep secrets redacted. redactSecrets currently redacts common query parameters such as api_key, apikey, access_token, token, and key.
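
A minimal sketch of query-parameter redaction for log lines, assuming case-insensitive matching on the parameter names listed above. The name `redactUrl`, the "REDACTED" placeholder, and the handling of unparsable input are assumptions; the real redactSecrets may differ.

```javascript
const SECRET_PARAMS = new Set(["api_key", "apikey", "access_token", "token", "key"]);

// Sketch: replace the values of known secret query parameters.
function redactUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return "[invalid url]"; // never echo unparsable input into logs
  }
  // Copy the keys first because set() mutates the underlying list.
  for (const name of [...url.searchParams.keys()]) {
    if (SECRET_PARAMS.has(name.toLowerCase())) {
      url.searchParams.set(name, "REDACTED");
    }
  }
  return url.toString();
}
```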

Environment Knobs

Runtime:

  • PORT: HTTP port, default 3000.
  • FFMPEG_PATH: ffmpeg binary path, default ffmpeg.
  • FFMPEG_LOG_LEVEL: ffmpeg log level, default warning.
  • FFMPEG_INPUT_SEEKABLE: HTTP input seekable option, default 0.
  • PLAYBACK_CONNECTION_MODE: split, relay, or single.
  • RECENT_URLS_PATH: recent URL JSON path.
  • RECENT_URL_LIMIT: recent URL count, default 12.
  • MAX_AUDIO_QUEUE_BYTES: single-mode audio output queue cap, default 16777216.
  • MAX_RELAY_BRANCH_QUEUE_BYTES: relay per-branch compressed-input queue cap, default 16777216.
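
The runtime knobs and defaults above could be read in one place, roughly as sketched here. `loadConfig` and `intFromEnv` are hypothetical names, and falling back to the default for non-numeric or non-positive values is an assumption.

```javascript
// Parse a positive integer from an env value, or use the fallback.
function intFromEnv(value, fallback) {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Sketch: collect runtime settings with the documented defaults.
function loadConfig(env) {
  const sixteenMiB = 16 * 1024 * 1024; // 16777216
  return {
    port: intFromEnv(env.PORT, 3000),
    ffmpegPath: env.FFMPEG_PATH || "ffmpeg",
    ffmpegLogLevel: env.FFMPEG_LOG_LEVEL || "warning",
    ffmpegInputSeekable: env.FFMPEG_INPUT_SEEKABLE || "0",
    playbackConnectionMode: env.PLAYBACK_CONNECTION_MODE || "split",
    recentUrlsPath: env.RECENT_URLS_PATH || "data/recent-urls.json",
    recentUrlLimit: intFromEnv(env.RECENT_URL_LIMIT, 12),
    maxAudioQueueBytes: intFromEnv(env.MAX_AUDIO_QUEUE_BYTES, sixteenMiB),
    maxRelayBranchQueueBytes: intFromEnv(env.MAX_RELAY_BRANCH_QUEUE_BYTES, sixteenMiB),
  };
}
```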

Session playback options are accepted by POST /api/session even though the UI hides them:

  • fps: default 24, clamped 1..30.
  • width: default 960, clamped 160..1920.
  • quality: default 5, clamped 2..18; lower is better for ffmpeg -q:v.
  • audioBitrate: default 160k, accepts two or three digits followed by k.
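
The clamping rules above can be sketched as a normalizer for the request body. The function names are hypothetical, and rounding fractional values and falling back to the default on invalid input are assumptions.

```javascript
// Clamp a numeric option into [min, max], or use the fallback when the
// value is missing or not a finite number.
function clampInt(value, min, max, fallback) {
  if (value === undefined || value === null || value === "") return fallback;
  const n = Number(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, Math.round(n)));
}

// Sketch: normalize the optional playback options of POST /api/session.
function normalizeSessionOptions(body = {}) {
  const bitrate = String(body.audioBitrate);
  return {
    fps: clampInt(body.fps, 1, 30, 24),
    width: clampInt(body.width, 160, 1920, 960),
    quality: clampInt(body.quality, 2, 18, 5), // lower is better for -q:v
    audioBitrate: /^\d{2,3}k$/.test(bitrate) ? bitrate : "160k",
  };
}
```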

Docker Notes

The Docker image installs ffmpeg and runs as non-root node.

Hardware acceleration is not required. Device passthrough may help only if server CPU decode is saturated. It does not fix audio/frame coupling issues; relay was built for that.

Compose includes commented examples for:

  • VAAPI passthrough through /dev/dri.
  • NVIDIA passthrough with gpus: all.

Verification Commands

Basic validation:

node --check server/index.js
docker compose -f docker-compose-example.yml config
docker build -t frame-stream-player .

MPEG-TS local stream generation for playback smoke tests:

ffmpeg -y -hide_banner -loglevel error \
  -f lavfi -i testsrc2=size=320x180:rate=24 \
  -f lavfi -i sine=frequency=440:sample_rate=48000 \
  -t 8 \
  -c:v mpeg2video -pix_fmt yuv420p -b:v 900k \
  -c:a mp2 -b:a 128k \
  -f mpegts public/_relay-smoke.ts

Start a mode locally:

PORT=3014 RECENT_URLS_PATH=/tmp/carplay-relay-recent.json \
  FFMPEG_LOG_LEVEL=warning FFMPEG_INPUT_SEEKABLE=0 \
  PLAYBACK_CONNECTION_MODE=relay npm start

After smoke testing, remove generated assets:

rm -f public/_relay-smoke.ts /tmp/carplay-*-recent.json

Security Notes

The server fetches arbitrary user-provided HTTP(S) URLs. Do not expose this app publicly without authentication and URL allowlisting or SSRF protections.

Do not log raw source URLs in normal operational logs. Use redaction for query-string secrets and prefer short internal _source URLs in ffmpeg args.

Change Guidance

Before changing the pipeline, decide which bottleneck you are addressing:

  • Browser image decode or network bandwidth: lower width, lower fps, or raise the JPEG quality number (a higher -q:v value produces smaller, lower-quality frames).
  • Server CPU decode: consider ffmpeg tuning or hardware acceleration.
  • Upstream server rejects multiple connections: use relay.
  • Audio/frame stutter in one-connection mode: avoid single, use relay.
  • Docker log floods: suppress only known noisy lines, not all stderr.

Avoid large frontend feature additions unless requested. The product goal is a minimal URL screen and a fullscreen player.