- Server now has a configurable MAX_WS_BUFFER_BYTES defaulting to 2097152, and skips JPEG frames
when the WebSocket is backed up instead of queueing stale frames in the WebSocket send buffer
(server/index.js:30, server/index.js:1439).
- Browser frame handling now decodes frames sequentially, drops late frames against the audio
clock, caps pending/decoded queues, and draws only the latest due frame per animation tick
(public/app.js:280, public/app.js:381).
- Relay/split normal EOF closes are no longer mislabeled as client_disconnect, which should
make logs around ffmpeg decode warnings less misleading (server/index.js:797,
server/index.js:1071).
- Documented MAX_WS_BUFFER_BYTES in README, Compose, and AGENTS.
AGENTS.md
This project is a web video player for clients that can decode audio and still images, but cannot use browser video decoding. Preserve that constraint when iterating: the browser should not use `<video>` playback for the source stream.
Product Shape
The UI intentionally has only two screens:
- URL entry screen with a stream URL input, a `Next` button, and globally stored recently played URLs.
- Fullscreen player screen with JPEG frames drawn to a canvas and native audio playback through an `<audio>` element.
- Playback controls are overlay controls toggled by tapping/clicking the frame area, similar to YouTube.
- Do not reintroduce debug panels, frame counters, settings forms, explanatory marketing copy, or visible ffmpeg details into the normal UI.
The backend stores recently played URLs globally, not per-browser. The default path is data/recent-urls.json, configurable with RECENT_URLS_PATH. Docker Compose persists this through the frame-stream-data volume.
Core Architecture
The app is plain Node/Express plus browser JavaScript:
- `server/index.js`: API, WebSocket, source proxy/relay, ffmpeg process lifecycle, recent URL persistence.
- `public/index.html`: frontend markup.
- `public/app.js`: URL submission, WebSocket frame receiving, audio element coordination, canvas drawing, overlay controls.
- `public/styles.css`: two-screen player UI.
- `Dockerfile`: production image with Node and ffmpeg.
- `docker-compose-example.yml`: operational example and default env knobs.
Main public endpoints:
- `POST /api/session`: validates the stream URL, stores the recent URL, creates a short-lived playback session.
- `GET /api/recent-urls`: returns global recent URL entries with `url`, redacted `displayUrl`, and `lastPlayedAt`.
- `GET /audio/:sessionId`: serves MP3 audio to the browser audio element.
- `WS /frames/:sessionId`: sends timed JPEG frame packets to the browser.
- `GET /api/health`: exposes basic health and the active playback connection mode.
Internal endpoint:
/_source/:token: short-lived local proxy used by ffmpeg in split and single modes. This keeps original source URLs and query tokens out of ffmpeg process args and lets the server log upstream open/close behavior.
Browser Playback Model
Audio is the playback clock. The server sends JPEG frames over WebSocket. Each binary frame packet is:
- First 8 bytes: little-endian float64 timestamp in seconds.
- Remaining bytes: one complete JPEG image.
The frontend decodes JPEGs with browser image APIs, queues frames, and paints frames whose timestamps are due relative to audio.currentTime. This means the browser decodes only audio and still images, not video.
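Decoding one binary packet on the browser side can be sketched as follows; `parseFramePacket` is an illustrative name, not the actual function in public/app.js:

```javascript
// Split a binary WebSocket packet into its timestamp header and JPEG payload.
// Illustrative sketch only: the real code in public/app.js may be shaped differently.
function parseFramePacket(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  // First 8 bytes: little-endian float64 timestamp in seconds.
  const timestamp = view.getFloat64(0, true);
  // Remaining bytes: one complete JPEG image.
  const jpegBytes = new Uint8Array(arrayBuffer, 8);
  return { timestamp, jpegBytes };
}
```

The frontend would then decode `jpegBytes` with a browser image API and hold the frame until `timestamp` is due relative to `audio.currentTime`.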
Why JPEG Frames
Keep JPEG unless there is a measured reason to change it.
- PNG is usually too large for 24 fps video.
- GIF has poor quality, weak timing control, and awkward streaming behavior.
- JPEG is browser-native, streamable frame-by-frame, much smaller than PNG, and simple to parse with SOI/EOI markers.
The server currently emits MJPEG through ffmpeg `image2pipe`. Frame parsing is done by scanning for the JPEG SOI marker (`0xff 0xd8`) and EOI marker (`0xff 0xd9`).
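A minimal sketch of that marker scan; the helper name and the chunk-carryover shape are illustrative, not the server's exact parser:

```javascript
// Scan a byte buffer for complete JPEGs delimited by SOI (0xff 0xd8) and
// EOI (0xff 0xd9). Returns finished frames plus the unconsumed tail so a
// partial frame can be completed by the next chunk. Simplified sketch:
// ignores the edge case of a marker split across two chunks.
function extractJpegFrames(buffer) {
  const frames = [];
  let start = -1;
  for (let i = 0; i + 1 < buffer.length; i += 1) {
    if (buffer[i] === 0xff && buffer[i + 1] === 0xd8 && start === -1) {
      start = i; // start of image
    } else if (buffer[i] === 0xff && buffer[i + 1] === 0xd9 && start !== -1) {
      frames.push(buffer.subarray(start, i + 2)); // complete JPEG incl. EOI
      start = -1;
    }
  }
  const rest = start === -1 ? new Uint8Array(0) : buffer.subarray(start);
  return { frames, rest };
}
```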
Playback Modes
Playback mode is selected by PLAYBACK_CONNECTION_MODE. The code also accepts legacy aliases through PLAYBACK_MODE.
Use these modes deliberately:
- `split`: Smoothest mode. Starts separate ffmpeg workers and separate upstream source connections for audio and frames. Use for normal files and servers that allow multiple active connections.
- `relay`: IPTV-oriented mode. Opens one upstream HTTP connection from Node, then tees compressed input bytes into separate audio and frame ffmpeg workers through stdin. This preserves one source connection while isolating audio and frame processing.
- `single`: Fallback mode. Opens one upstream connection and one ffmpeg process with both MP3 and MJPEG outputs. This avoids multiple source connections but can stutter because audio and frame outputs are coupled inside one ffmpeg process.
The regression history matters:
- `split` was smooth, but some IPTV servers ended streams early when ffmpeg opened multiple connections/ranges.
- `single` fixed the one-active-connection issue, but introduced stutter because audio output backpressure and frame generation shared one ffmpeg process.
- `relay` was added to combine one upstream connection with separate ffmpeg workers.
Default code behavior is `split` when no mode is set. The Compose example uses `relay` because it is the mode to try for IPTV streams.
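Mode resolution with the legacy alias can be sketched like this, assuming the alias simply maps onto the same three mode names; `resolvePlaybackMode` is a hypothetical name, not the server's actual resolver:

```javascript
// Resolve the playback mode from env, preferring PLAYBACK_CONNECTION_MODE,
// falling back to the legacy PLAYBACK_MODE alias, then to the code default.
// Sketch under stated assumptions; the real alias handling may differ.
function resolvePlaybackMode(env = process.env) {
  const raw = (env.PLAYBACK_CONNECTION_MODE || env.PLAYBACK_MODE || 'split').toLowerCase();
  return ['split', 'relay', 'single'].includes(raw) ? raw : 'split';
}
```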
ffmpeg Pipelines
All ffmpeg command builders live near the bottom of server/index.js.
Common HTTP input args:
- `-hide_banner`
- `-nostdin`
- `-loglevel ${FFMPEG_LOG_LEVEL}`
- `-nostats`
- `-seekable ${FFMPEG_INPUT_SEEKABLE}`
- `-re`
- `-i <inputUrl>`
Pipe input args for relay intentionally skip `-seekable`. Some ffmpeg builds reject `-seekable` on `pipe:0` with `Option seekable not found`.
Audio output:
- Maps `0:a:0?`
- Disables video with `-vn`
- Converts to stereo 48 kHz MP3 with `libmp3lame`
- Uses `session.options.audioBitrate`, default `160k`
- Outputs to `pipe:1`
Frame output:
- Maps `0:v:0`
- Disables audio with `-an`
- Applies `fps=<fps>,scale=w='min(<width>,iw)':h=-2:flags=bicubic:out_range=pc,format=yuvj420p`
- Encodes `mjpeg`
- Uses `-pix_fmt yuvj420p`, `-color_range pc`, `-q:v <quality>`
- Outputs `image2pipe` to either `pipe:1` or `pipe:3`
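Assembled as an argv array, the frame-output pipeline above might look like the following sketch; `buildFrameArgs` and its parameter shape are assumptions, not the real builder in server/index.js:

```javascript
// Sketch of building the frame-branch ffmpeg argument list from session
// options. Illustrative only: omits the -seekable/-re input handling and
// mode-specific input wiring described elsewhere in this document.
function buildFrameArgs({ inputUrl, fps, width, quality }) {
  const filter =
    `fps=${fps},scale=w='min(${width},iw)':h=-2:flags=bicubic:out_range=pc,format=yuvj420p`;
  return [
    '-hide_banner', '-nostdin', '-nostats',
    '-i', inputUrl,
    '-map', '0:v:0',
    '-an',                 // no audio on the frame branch
    '-vf', filter,
    '-c:v', 'mjpeg',
    '-pix_fmt', 'yuvj420p',
    '-color_range', 'pc',
    '-q:v', String(quality),
    '-f', 'image2pipe',
    'pipe:1',
  ];
}
```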
The explicit yuvj420p/full-range settings match the Docker image ffmpeg behavior. Older ffmpeg builds may still emit repeated swscaler warnings about deprecated pixel format. The logger suppresses only that known noisy warning so it cannot flood Docker logs and starve useful work.
Relay Mode Details
Relay mode is implemented by createRelayPlayback(session).
Important behavior:
- Waits until both the audio HTTP response and the frame WebSocket are attached.
- Starts two ffmpeg workers with `pipe:0` input, one for audio and one for frames.
- Fetches the original session URL exactly once from Node, not through `/_source/:token`.
- Writes each upstream compressed chunk to both ffmpeg stdin streams.
- Uses bounded branch queues via `createRelayInputBranch`.
- Pauses upstream reading while any branch queue exceeds half of `MAX_RELAY_BRANCH_QUEUE_BYTES`.
- Stops playback if any branch queue exceeds `MAX_RELAY_BRANCH_QUEUE_BYTES`.
- Backpressure accounting must include both chunks queued in JavaScript and bytes already written to ffmpeg stdin while waiting for `drain`. Otherwise fast movie sources can outrun realtime ffmpeg consumption and grow the Node heap until OOM.
- When waiting for relay capacity, wait only on branches that are actually over the pause threshold. Including already-ready branches in a `Promise.race` can create an immediate-resolution spin loop.
Relay mode works best for sequential stream containers such as MPEG-TS/IPTV. It may be less reliable for file formats that require seeking or late metadata, such as some MP4 files.
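The bounded-queue and pause-threshold behavior can be sketched as follows; this is a simplified stand-in for `createRelayInputBranch` and deliberately omits the stdin-drain accounting described above:

```javascript
// Bounded per-branch byte queue with a half-capacity pause threshold.
// Sketch only: the real branch also counts bytes written to ffmpeg stdin
// that are still waiting for 'drain'.
function createBranchQueue(maxBytes) {
  let queuedBytes = 0;
  const chunks = [];
  return {
    push(chunk) {
      chunks.push(chunk);
      queuedBytes += chunk.length;
      if (queuedBytes > maxBytes) {
        // Hard cap exceeded: the relay stops playback at this point.
        throw new Error('branch queue overflow: stop playback');
      }
    },
    shift() {
      const chunk = chunks.shift();
      if (chunk) queuedBytes -= chunk.length;
      return chunk;
    },
    // Upstream reading should pause while this is true.
    shouldPause() {
      return queuedBytes > maxBytes / 2;
    },
    get queuedBytes() {
      return queuedBytes;
    },
  };
}
```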
Cleanup Requirements
ffmpeg cleanup is important. Keep these invariants:
- If the audio client disconnects, stop the active playback for `single` and `relay`.
- If the frame WebSocket disconnects, stop the active playback for `single` and `relay`.
- In `split`, audio and frame workers are independent and each worker should stop when its own client side closes.
- Always release `_source` tokens when workers close.
- Always remove closed playbacks from the `playbacks` map.
- Use `stopProcess(child)`: SIGTERM first, then SIGKILL after the timeout.
- Do not leave relay stdin streams open when stopping.
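The SIGTERM-then-SIGKILL escalation can be sketched like this; the timeout value and exact shape are assumptions, not the server's actual `stopProcess`:

```javascript
// Ask a child process to exit, escalating to SIGKILL if it ignores SIGTERM.
// Sketch under assumptions: the real timeout value may differ.
function stopProcess(child, killTimeoutMs = 5000) {
  if (child.exitCode !== null || child.signalCode !== null) return; // already gone
  child.kill('SIGTERM');
  const timer = setTimeout(() => {
    // Escalate if the process did not exit within the grace period.
    child.kill('SIGKILL');
  }, killTimeoutMs);
  // Do not keep the event loop alive just for the escalation timer.
  timer.unref();
  child.once('exit', () => clearTimeout(timer));
}
```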
Useful local cleanup checks:
pgrep -af "ffmpeg.*pipe:0|ffmpeg.*_source"
pgrep -af "node server/index.js"
Both should be empty after smoke tests stop.
Logging
Operational logs are intended to be useful in Docker logs:
- ffmpeg process start/exit, PID, mode label, exit code, signal, duration.
- ffmpeg stderr lines except known swscaler pixel-format spam.
- source proxy connected/closed status, bytes, upstream end state.
- relay source connected/closed status, bytes, upstream end state.
- playback close summaries with frame counts, skipped frames, queue peaks.
Keep secrets redacted. `redactSecrets` currently redacts common query parameters such as `api_key`, `apikey`, `access_token`, `token`, and `key`.
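A simplified stand-in for that redaction; the real `redactSecrets` in server/index.js may differ in detail:

```javascript
// Replace known secret query-parameter values with a placeholder before
// logging. Illustrative sketch of the behavior described above.
const SECRET_PARAMS = new Set(['api_key', 'apikey', 'access_token', 'token', 'key']);

function redactSecrets(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return rawUrl; // leave non-URL strings untouched
  }
  // Snapshot the keys first so mutation does not disturb iteration.
  for (const name of [...url.searchParams.keys()]) {
    if (SECRET_PARAMS.has(name.toLowerCase())) {
      url.searchParams.set(name, 'REDACTED');
    }
  }
  return url.toString();
}
```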
Environment Knobs
Runtime:
- `PORT`: HTTP port, default `3000`.
- `FFMPEG_PATH`: ffmpeg binary path, default `ffmpeg`.
- `FFMPEG_LOG_LEVEL`: ffmpeg log level, default `warning`.
- `FFMPEG_INPUT_SEEKABLE`: HTTP input seekable option, default `0`.
- `PLAYBACK_CONNECTION_MODE`: `split`, `relay`, or `single`.
- `RECENT_URLS_PATH`: recent URL JSON path.
- `RECENT_URL_LIMIT`: recent URL count, default `12`.
- `MAX_WS_BUFFER_BYTES`: server-side WebSocket JPEG frame backlog cap, default `2097152`.
- `MAX_AUDIO_QUEUE_BYTES`: single-mode audio output queue cap, default `16777216`.
- `MAX_RELAY_BRANCH_QUEUE_BYTES`: relay per-branch compressed-input queue cap, default `16777216`.
Session playback options are accepted by POST /api/session even though the UI hides them:
- `fps`: default `24`, clamped `1..30`.
- `width`: default `960`, clamped `160..1920`.
- `quality`: default `5`, clamped `2..18`; lower is better for ffmpeg `-q:v`.
- `audioBitrate`: default `160k`, accepts two or three digits followed by `k`.
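The clamping can be sketched as follows; `normalizeOptions` is a hypothetical name matching the documented defaults and ranges, not the server's actual validator:

```javascript
// Clamp a numeric option into [min, max], falling back on non-numeric input.
function clamp(value, min, max, fallback) {
  const n = Number(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, Math.round(n)));
}

// Apply the documented defaults and ranges to a POST /api/session body.
// Illustrative sketch of the behavior described above.
function normalizeOptions(body = {}) {
  return {
    fps: clamp(body.fps, 1, 30, 24),
    width: clamp(body.width, 160, 1920, 960),
    quality: clamp(body.quality, 2, 18, 5),
    // Two or three digits followed by "k", e.g. 96k or 160k.
    audioBitrate: /^\d{2,3}k$/.test(body.audioBitrate) ? body.audioBitrate : '160k',
  };
}
```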
Docker Notes
The Docker image installs ffmpeg and runs as non-root node.
Hardware acceleration is not required. Device passthrough may help only if server CPU decode is saturated. It does not fix audio/frame coupling issues; relay was built for that.
Compose includes commented examples for:
- VAAPI passthrough through `/dev/dri`.
- NVIDIA passthrough with `gpus: all`.
Verification Commands
Basic validation:
node --check server/index.js
docker compose -f docker-compose-example.yml config
docker build -t frame-stream-player .
MPEG-TS local stream generation for playback smoke tests:
ffmpeg -y -hide_banner -loglevel error \
-f lavfi -i testsrc2=size=320x180:rate=24 \
-f lavfi -i sine=frequency=440:sample_rate=48000 \
-t 8 \
-c:v mpeg2video -pix_fmt yuv420p -b:v 900k \
-c:a mp2 -b:a 128k \
-f mpegts public/_relay-smoke.ts
Start a mode locally:
PORT=3014 RECENT_URLS_PATH=/tmp/carplay-relay-recent.json \
FFMPEG_LOG_LEVEL=warning FFMPEG_INPUT_SEEKABLE=0 \
PLAYBACK_CONNECTION_MODE=relay npm start
After smoke testing, remove generated assets:
rm -f public/_relay-smoke.ts /tmp/carplay-*-recent.json
Security Notes
The server fetches arbitrary user-provided HTTP(S) URLs. Do not expose this app publicly without authentication and URL allowlisting or SSRF protections.
Do not log raw source URLs in normal operational logs. Use redaction for query-string secrets and prefer short internal _source URLs in ffmpeg args.
Change Guidance
Before changing the pipeline, decide which bottleneck you are addressing:
- Browser image decode or network bandwidth: lower `width`, lower `fps`, or increase the JPEG `quality` number.
- Server CPU decode: consider ffmpeg tuning or hardware acceleration.
- Upstream server rejects multiple connections: use `relay`.
- Audio/frame stutter in one-connection mode: avoid `single`, use `relay`.
- Docker log floods: suppress only known noisy lines, not all stderr.
Avoid large frontend feature additions unless requested. The product goal is a minimal URL screen and a fullscreen player.