AGENTS.md
This project is a web video player for clients that can decode audio and still images, but cannot use browser video decoding. Preserve that constraint when iterating: the browser should not use <video> playback for the source stream.
Product Shape
The UI intentionally has only two screens:
- URL entry screen with a stream URL input, a Play button, a Queue button, an env-gated Play Local button when LOCAL_VIDEOS is set, globally stored recently played URLs, and globally stored favorites.
- Fullscreen player screen with JPEG frames drawn to a canvas and native audio playback through an <audio> element.
- Playback controls are overlay controls toggled by tapping/clicking the frame area, similar to YouTube.
- Do not reintroduce debug panels, frame counters, settings forms, explanatory marketing copy, or visible ffmpeg details into the normal UI.
The backend stores recently played URLs and favorites globally, not per-browser. The default recent URL path is data/recent-urls.json, configurable with RECENT_URLS_PATH. The default favorites path is data/favorites.json, configurable with FAVORITES_PATH. Docker Compose persists both through the frame-stream-data volume.
Core Architecture
The app is plain Node/Express plus browser JavaScript:
- server/index.js: API, WebSocket, source proxy/relay, ffmpeg process lifecycle, recent URL and favorites persistence.
- public/index.html: frontend markup.
- public/app.js: URL submission, WebSocket frame receiving, audio element coordination, canvas drawing, overlay controls.
- public/styles.css: two-screen player UI.
- Dockerfile: production image with Node and ffmpeg.
- docker-compose-example.yml: operational example and default env knobs.
Main public endpoints:
- POST /api/session: validates the stream URL, stores recent URL, creates a short-lived playback session. Also accepts localPath for files selected from LOCAL_VIDEOS; local selections are not stored in recents.
- GET /api/local-videos: when LOCAL_VIDEOS is set, returns the recursive local file picker list.
- GET /api/recent-urls: returns global recent URL entries with url, redacted displayUrl, and lastPlayedAt.
- POST /api/recent-urls: validates a stream URL and stores it globally without creating a playback session.
- GET /api/favorites: returns global favorite entries with title and url.
- PUT /api/favorites: replaces the global favorites list. Each favorite has a user-provided title and stream url.
- GET /audio/:sessionId: serves MP3 audio to the browser audio element.
- WS /frames/:sessionId: sends timed JPEG frame packets to the browser.
- GET /api/health: exposes basic health and active playback connection mode.
Internal endpoint:
/_source/:token: short-lived local proxy used by ffmpeg in split and single modes. This keeps original source URLs and query tokens out of ffmpeg process args and lets the server log upstream open/close behavior.
Browser Playback Model
Audio is the playback clock. The server sends JPEG frames over WebSocket. Each binary frame packet is:
- First 8 bytes: little-endian float64 timestamp in seconds.
- Remaining bytes: one complete JPEG image.
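The packet layout above can be sketched as a pair of helpers. This is a minimal illustration, not the project's code: the names encodeFramePacket and decodeFramePacket are hypothetical, and the only facts taken from this document are the 8-byte little-endian float64 header and the JPEG payload that follows it.

```javascript
// Hypothetical helpers illustrating the packet layout:
// bytes 0..7 = little-endian float64 timestamp (seconds), bytes 8.. = one JPEG.
function encodeFramePacket(timestampSeconds, jpegBytes) {
  const packet = new Uint8Array(8 + jpegBytes.length);
  // The third argument `true` selects little-endian byte order.
  new DataView(packet.buffer).setFloat64(0, timestampSeconds, true);
  packet.set(jpegBytes, 8);
  return packet;
}

function decodeFramePacket(packet) {
  if (packet.byteLength < 8) {
    throw new RangeError("frame packet shorter than the 8-byte timestamp header");
  }
  // Respect byteOffset so this also works on views into a larger buffer.
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    timestamp: view.getFloat64(0, true),
    jpeg: packet.subarray(8), // shares memory with the packet, no copy
  };
}
```

In a browser, a WebSocket with binaryType set to "arraybuffer" delivers an ArrayBuffer, which would need wrapping in a Uint8Array before calling decodeFramePacket.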
The frontend decodes JPEGs with browser image APIs, queues frames, and paints frames whose timestamps are due relative to audio.currentTime. This means the browser decodes only audio and still images, not video.
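As a rough sketch of the audio-clocked scheduling described here, the helper below picks the newest queued frame whose timestamp is already due and discards older due frames. The name takeDueFrame, the queue shape, and the drop-older-frames policy are assumptions for illustration; the actual logic in public/app.js may differ.

```javascript
// Hypothetical helper. `queue` is an array of { timestamp, image } objects
// sorted by ascending timestamp; `audioTime` would be audio.currentTime.
function takeDueFrame(queue, audioTime) {
  let due = null;
  // Consume every frame that is already due; only the newest one is kept,
  // so frames that fell behind the audio clock are skipped, not painted late.
  while (queue.length > 0 && queue[0].timestamp <= audioTime) {
    due = queue.shift();
  }
  return due; // null when nothing is due yet
}
```

A paint loop could call this once per requestAnimationFrame tick and draw the returned image to the canvas only when the result is not null.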
Why JPEG Frames
Keep JPEG unless there is a measured reason to change it.
- PNG is usually too large for 24 fps video.
- GIF has poor quality, weak timing control, and awkward streaming behavior.
- JPEG is browser-native, streamable frame-by-frame, much smaller than PNG, and simple to parse with SOI/EOI markers.
The server currently emits MJPEG through ffmpeg image2pipe. Frame parsing is done by scanning for JPEG SOI 0xff 0xd8 and EOI 0xff 0xd9.
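Marker scanning of this kind can be sketched as a small stateful splitter. This is an illustration under assumptions, not the parser in server/index.js: createJpegSplitter is a hypothetical name, and the sketch assumes the stream carries plain baseline JPEGs without embedded thumbnails, which could otherwise contain their own SOI/EOI markers.

```javascript
// Hypothetical splitter: feed it arbitrary chunks, it calls onFrame once per
// complete JPEG (SOI 0xff 0xd8 through EOI 0xff 0xd9).
function createJpegSplitter(onFrame) {
  const SOI = Buffer.from([0xff, 0xd8]);
  const EOI = Buffer.from([0xff, 0xd9]);
  let pending = Buffer.alloc(0);

  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    for (;;) {
      const start = pending.indexOf(SOI);
      if (start === -1) {
        // No SOI yet. Keep a trailing 0xff in case the marker straddles chunks.
        const last = pending.length - 1;
        pending = last >= 0 && pending[last] === 0xff
          ? pending.subarray(last)
          : Buffer.alloc(0);
        return;
      }
      const end = pending.indexOf(EOI, start + 2);
      if (end === -1) {
        pending = pending.subarray(start); // drop bytes before SOI, wait for more
        return;
      }
      onFrame(Buffer.from(pending.subarray(start, end + 2))); // copy the frame out
      pending = pending.subarray(end + 2);
    }
  };
}
```

Concatenating on every chunk keeps the sketch short; a production parser would more likely track offsets to avoid repeated copying.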
Playback Modes
Playback mode is selected by PLAYBACK_CONNECTION_MODE. The code also accepts legacy aliases through PLAYBACK_MODE.
Use these modes deliberately:
- split: Smoothest mode. Starts separate ffmpeg workers and separate upstream source connections for audio and frames. Use for normal files and servers that allow multiple active connections.
- relay: IPTV-oriented mode. Opens one upstream HTTP connection from Node, then tees compressed input bytes into separate audio and frame ffmpeg workers through stdin. This preserves one source connection while isolating audio and frame processing.
- single: Fallback mode. Opens one upstream connection and one ffmpeg process with both MP3 and MJPEG outputs. This avoids multiple source connections but can stutter because audio and frame outputs are coupled inside one ffmpeg process.
The regression history matters:
- split was smooth, but some IPTV servers ended streams early when ffmpeg opened multiple connections/ranges.
- single fixed the one-active-connection issue, but introduced stutter because audio output backpressure and frame generation shared one ffmpeg process.
- relay was added to combine one upstream connection with separate ffmpeg workers.
Default code behavior is split when no mode is set. The Compose example uses relay because it is the mode to try for IPTV streams.
Finite-duration sessions are treated as recorded video and seekable when metadata probing is enabled. Metadata probing is enabled by default except in relay, where it defaults off to preserve the one-upstream-connection behavior for IPTV-style streams. When the configured mode is relay and duration metadata is known, recorded sessions switch to the seek-capable split path; live/unknown-duration streams stay in relay mode.
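The selection rules in this section can be summarized in a short sketch. All three function names are hypothetical, the fallback to split for unrecognized values is an assumption, and the legacy PLAYBACK_MODE aliases are deliberately not modeled because their values are not listed here.

```javascript
// Hypothetical summary of the mode rules described above.
const MODES = ["split", "relay", "single"];

// Configured mode from the environment; split is the documented default.
// Treating unrecognized values as split is an assumption of this sketch.
function resolveConfiguredMode(env) {
  const raw = String(env.PLAYBACK_CONNECTION_MODE || "").trim().toLowerCase();
  return MODES.includes(raw) ? raw : "split";
}

// Metadata probing defaults on, except in relay where it defaults off.
function probeEnabledByDefault(mode) {
  return mode !== "relay";
}

// A relay session with a known finite duration uses the seek-capable split
// path; live or unknown-duration streams keep the configured mode.
function effectiveMode(configuredMode, durationSeconds) {
  const recorded = Number.isFinite(durationSeconds) && durationSeconds > 0;
  return configuredMode === "relay" && recorded ? "split" : configuredMode;
}
```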
ffmpeg Pipelines
All ffmpeg command builders live near the bottom of server/index.js.
Common HTTP input args:
- -hide_banner
- -nostdin
- -loglevel ${FFMPEG_LOG_LEVEL}
- -nostats
- -seekable ${FFMPEG_INPUT_SEEKABLE}
- HTTP reconnect options when FFMPEG_HTTP_RECONNECT=1 and the input is HTTP(S)
- -re
- -i <inputUrl>
Pipe input args for relay intentionally skip -seekable. Some ffmpeg builds reject -seekable on pipe:0 with Option seekable not found.
Audio output:
- Maps 0:a:0?
- Disables video with -vn
- Converts to MP3 with libmp3lame
- Uses session.options.audioBitrate, default 160k
- Uses session.options.audioChannels, default 2
- Uses session.options.audioSampleRate, default 48000
- Outputs to pipe:1
Frame output:
- Maps 0:v:0
- Disables audio with -an
- Applies fps=<fps>,scale=w='min(<width>,iw)':h=-2:flags=bicubic:out_range=pc,format=yuvj420p,realtime
- Encodes mjpeg
- Uses -pix_fmt yuvj420p, -color_range pc, -q:v <quality>
- Outputs image2pipe to either pipe:1 or pipe:3
The explicit yuvj420p/full-range settings match the Docker image ffmpeg behavior. Older ffmpeg builds may still emit repeated swscaler warnings about deprecated pixel format. The logger suppresses only that known noisy warning so it cannot flood Docker logs and starve useful work.
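For orientation, the frame output settings listed above could be assembled into an argument array roughly as follows. This is a sketch, not the builder in server/index.js: buildFrameOutputArgs is a hypothetical name, and passing the filter chain with -vf, the encoder with -c:v, and the muxer with -f is an assumption about how the listed settings are expressed.

```javascript
// Hypothetical builder for the frame output half of an ffmpeg command line.
function buildFrameOutputArgs({ fps, width, quality }, output = "pipe:1") {
  // The comma inside min(...) sits inside single quotes, so ffmpeg does not
  // treat it as a filter separator.
  const filter = [
    `fps=${fps}`,
    `scale=w='min(${width},iw)':h=-2:flags=bicubic:out_range=pc`,
    "format=yuvj420p",
    "realtime",
  ].join(",");
  return [
    "-map", "0:v:0",
    "-an",
    "-vf", filter,
    "-c:v", "mjpeg",
    "-pix_fmt", "yuvj420p",
    "-color_range", "pc",
    "-q:v", String(quality),
    "-f", "image2pipe",
    output,
  ];
}
```

Building the command as an array for child_process.spawn avoids shell quoting problems with the quoted scale expression.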
Relay Mode Details
Relay mode is implemented by createRelayPlayback(session).
Important behavior:
- Waits until both audio HTTP response and frame WebSocket are attached.
- Starts two ffmpeg workers with pipe:0 input, one for audio and one for frames.
- Fetches the original session URL exactly once from Node, not through /_source/:token.
- Writes each upstream compressed chunk to both ffmpeg stdin streams.
- Uses bounded branch queues via createRelayInputBranch.
- Pauses upstream reading while any branch queue exceeds half of MAX_RELAY_BRANCH_QUEUE_BYTES.
- Stops playback if any branch queue exceeds MAX_RELAY_BRANCH_QUEUE_BYTES.
- Backpressure accounting must include both chunks queued in JavaScript and bytes already written to ffmpeg stdin while waiting for drain. Otherwise fast movie sources can outrun realtime ffmpeg consumption and grow Node heap until OOM.
- When waiting for relay capacity, wait only on branches that are actually over the pause threshold. Including already-ready branches in a Promise.race can create an immediate-resolution spin loop.
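The accounting rules above can be illustrated with a simplified branch model. This is a sketch under assumptions, not the real createRelayInputBranch: the names createBranchSketch and branchesToWaitOn are hypothetical, and in-flight bytes are approximated as the chunk whose write() returned false, cleared on drain.

```javascript
// Hypothetical model of one relay branch. `writable` needs write(chunk) and
// a 'drain' event, like an ffmpeg child's stdin.
function createBranchSketch(writable, maxBytes) {
  const queue = [];
  let queuedBytes = 0;   // chunks still held in JavaScript
  let inFlightBytes = 0; // bytes handed to write() while waiting for 'drain'
  let blocked = false;

  function flush() {
    while (!blocked && queue.length > 0) {
      const chunk = queue.shift();
      queuedBytes -= chunk.length;
      if (!writable.write(chunk)) {
        // The stream accepted the chunk but asked us to stop until 'drain'.
        blocked = true;
        inFlightBytes += chunk.length;
      }
    }
  }

  writable.on("drain", () => {
    blocked = false;
    inFlightBytes = 0;
    flush();
  });

  const pendingBytes = () => queuedBytes + inFlightBytes;
  return {
    push(chunk) {
      queue.push(chunk);
      queuedBytes += chunk.length;
      flush();
    },
    pendingBytes,
    overPauseThreshold: () => pendingBytes() > maxBytes / 2,
    overLimit: () => pendingBytes() > maxBytes,
  };
}

// Only branches that are actually over the pause threshold are worth waiting
// on; racing on a ready branch would resolve immediately and spin.
function branchesToWaitOn(branches) {
  return branches.filter((branch) => branch.overPauseThreshold());
}
```

Counting both queuedBytes and inFlightBytes is the point of the sketch: a queue that looks empty in JavaScript can still have a large chunk parked in the stdin buffer.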
Relay mode works best for sequential stream containers such as MPEG-TS/IPTV. It may be less reliable for file formats that require seeking or late metadata, such as some MP4 files.
Cleanup Requirements
ffmpeg cleanup is important. Keep these invariants:
- If the audio client disconnects, stop the active playback for single and relay.
- If the frame WebSocket disconnects, stop the active playback for single and relay.
- In split, audio and frame workers are independent and each worker should stop when its own client side closes.
- Always release _source tokens when workers close.
- Always remove closed playbacks from the playbacks map.
- Use stopProcess(child): SIGTERM first, then SIGKILL after the timeout.
- Do not leave relay stdin streams open when stopping.
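The SIGTERM-then-SIGKILL pattern can be sketched as below. This is not the project's stopProcess: the name stopProcessSketch and the 2000 ms default timeout are assumptions, since the real timeout value is not stated in this document.

```javascript
// Hypothetical sketch of graceful-then-forced termination for a child process.
function stopProcessSketch(child, timeoutMs = 2000) {
  // Nothing to do if the process has already exited or been signalled.
  if (child.exitCode !== null || child.signalCode !== null) return;

  // Close stdin so a relay worker reading pipe:0 is not left holding it open.
  if (child.stdin && !child.stdin.destroyed) child.stdin.destroy();

  child.kill("SIGTERM");

  // Escalate to SIGKILL if the process has not exited within the timeout.
  const timer = setTimeout(() => child.kill("SIGKILL"), timeoutMs);
  timer.unref(); // do not keep the Node process alive just for this timer
  child.once("exit", () => clearTimeout(timer));
}
```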
Useful local cleanup checks:
pgrep -af "ffmpeg.*pipe:0|ffmpeg.*_source"
pgrep -af "node server/index.js"
Both should be empty after smoke tests stop.
Logging
Operational logs are intended to be useful in Docker logs:
- ffmpeg process start/exit, PID, mode label, exit code, signal, duration.
- ffmpeg stderr lines except known swscaler pixel-format spam.
- source proxy connected/closed status, bytes, upstream end state.
- relay source connected/closed status, bytes, upstream end state.
- playback close summaries with frame counts, skipped frames, queue peaks.
Keep secrets redacted. redactSecrets currently redacts common query parameters such as api_key, apikey, access_token, token, and key.
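A minimal sketch of query-parameter redaction, for illustration only. The real redactSecrets may work differently; the name redactUrlSketch, the case-insensitive match, the REDACTED placeholder, and the handling of unparseable input are assumptions.

```javascript
// Parameter names taken from the list above.
const SECRET_PARAMS = new Set(["api_key", "apikey", "access_token", "token", "key"]);

// Hypothetical helper: returns the URL with secret query values replaced.
function redactUrlSketch(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return "[unparseable url]"; // never echo input that could not be inspected
  }
  // Copy the keys first because set() mutates the collection being iterated.
  for (const name of [...url.searchParams.keys()]) {
    if (SECRET_PARAMS.has(name.toLowerCase())) {
      url.searchParams.set(name, "REDACTED");
    }
  }
  return url.toString();
}
```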
Environment Knobs
Runtime:
- PORT: HTTP port, default 3000.
- FFMPEG_PATH: ffmpeg binary path, default ffmpeg.
- YT_DLP_PATH: yt-dlp binary path, default yt-dlp.
- YT_DLP_FORMAT: yt-dlp format selector for YouTube URLs, default best[ext=mp4][vcodec!=none][acodec!=none]/best[vcodec!=none][acodec!=none]/best.
- YT_DLP_TIMEOUT_MS: yt-dlp resolution timeout, default 45000.
- FFMPEG_LOG_LEVEL: ffmpeg log level, default warning.
- FFMPEG_INPUT_SEEKABLE: HTTP input seekable option, default 0.
- FFMPEG_HTTP_RECONNECT: enable ffmpeg HTTP reconnect options for HTTP inputs, default 1.
- FFMPEG_HTTP_RECONNECT_DELAY_MAX: max ffmpeg reconnect delay, default 2.
- FFMPEG_HTTP_RECONNECT_MAX_RETRIES: max ffmpeg reconnect retries, default 4, applied only when the installed ffmpeg supports reconnect_max_retries.
- FFMPEG_HTTP_RECONNECT_ON_HTTP_ERROR: HTTP status list for reconnect, default 5xx.
- METADATA_PROBE_ENABLED: probe session duration with ffprobe, default 1 except in relay, where default is 0.
- METADATA_PROBE_TIMEOUT_MS: ffprobe duration timeout, default 4000.
- PLAYBACK_CONNECTION_MODE: split, relay, or single.
- RECENT_URLS_PATH: recent URL JSON path.
- RECENT_URL_LIMIT: recent URL count, default 12.
- FAVORITES_PATH: favorites JSON path.
- FAVORITES_LIMIT: favorites count, default 50.
- LOCAL_VIDEOS: optional local video directory. When set, the UI shows Play Local and lists regular files under this directory recursively.
- DEFAULT_FPS: default frame rate, fallback 24, clamped 1..30.
- DEFAULT_FRAME_WIDTH: default maximum frame width, fallback 960, clamped 160..1920.
- JPEG_QUALITY: default JPEG quality, fallback 7, clamped 2..18; lower is better for ffmpeg -q:v.
- DEFAULT_AUDIO_BITRATE: default MP3 audio bitrate, fallback 160k.
- DEFAULT_AUDIO_CHANNELS: default MP3 audio channels, fallback 2, clamped 1..2.
- DEFAULT_AUDIO_SAMPLE_RATE: default MP3 audio sample rate, fallback 48000, clamped 22050..48000.
- MAX_WS_BUFFER_BYTES: server-side WebSocket JPEG frame backlog cap, default 2097152.
- MAX_AUDIO_QUEUE_BYTES: single-mode audio output queue cap, default 4194304.
- MAX_RELAY_BRANCH_QUEUE_BYTES: relay per-branch compressed-input queue cap, default 8388608.
Session playback options are accepted by POST /api/session even though the UI hides them:
- fps: default 24, clamped 1..30.
- width: default 960, clamped 160..1920.
- quality: defaults to JPEG_QUALITY, clamped 2..18; lower is better for ffmpeg -q:v.
- audioBitrate: default 160k, accepts two or three digits followed by k.
- audioChannels: default 2, clamped 1..2.
- audioSampleRate: default 48000, clamped 22050..48000.
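The clamping rules for session options could look roughly like this. The helper names clampInt and normalizeSessionOptions are hypothetical, the defaults are hard-coded to the documented fallbacks instead of being read from the environment, and falling back to the default on unparseable input is an assumption.

```javascript
// Hypothetical helper: parse an integer, fall back when invalid, clamp to range.
function clampInt(value, min, max, fallback) {
  const parsed = Number.parseInt(value, 10);
  if (!Number.isFinite(parsed)) return fallback;
  return Math.min(max, Math.max(min, parsed));
}

// Hypothetical normalizer for the option names accepted by POST /api/session.
function normalizeSessionOptions(body = {}) {
  const bitrate = String(body.audioBitrate);
  return {
    fps: clampInt(body.fps, 1, 30, 24),
    width: clampInt(body.width, 160, 1920, 960),
    quality: clampInt(body.quality, 2, 18, 7),
    // Two or three digits followed by "k", for example 96k or 160k.
    audioBitrate: /^\d{2,3}k$/.test(bitrate) ? bitrate : "160k",
    audioChannels: clampInt(body.audioChannels, 1, 2, 2),
    audioSampleRate: clampInt(body.audioSampleRate, 22050, 48000, 48000),
  };
}
```

For example, a request asking for fps 60 and width 100 would come back as fps 30 and width 160 under these rules.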
Docker Notes
The Docker image installs ffmpeg and yt-dlp and runs as non-root node. yt-dlp is installed from the upstream master branch when the image is built.
Hardware acceleration is not required. Device passthrough may help only if server CPU decode is saturated. It does not fix audio/frame coupling issues; relay was built for that.
Compose includes commented examples for:
- VAAPI passthrough through /dev/dri.
- NVIDIA passthrough with gpus: all.
Verification Commands
Basic validation:
node --check server/index.js
docker compose -f docker-compose-example.yml config
docker build -t frame-stream-player .
MPEG-TS local stream generation for playback smoke tests:
ffmpeg -y -hide_banner -loglevel error \
-f lavfi -i testsrc2=size=320x180:rate=24 \
-f lavfi -i sine=frequency=440:sample_rate=48000 \
-t 8 \
-c:v mpeg2video -pix_fmt yuv420p -b:v 900k \
-c:a mp2 -b:a 128k \
-f mpegts public/_relay-smoke.ts
Start a mode locally:
PORT=3014 RECENT_URLS_PATH=/tmp/carplay-relay-recent.json \
FAVORITES_PATH=/tmp/carplay-relay-favorites.json \
FFMPEG_LOG_LEVEL=warning FFMPEG_INPUT_SEEKABLE=0 \
PLAYBACK_CONNECTION_MODE=relay npm start
After smoke testing, remove generated assets:
rm -f public/_relay-smoke.ts /tmp/carplay-*-recent.json /tmp/carplay-*-favorites.json
Security Notes
The server fetches arbitrary user-provided HTTP(S) URLs. Do not expose this app publicly without authentication and URL allowlisting or SSRF protections.
Do not log raw source URLs in normal operational logs. Use redaction for query-string secrets and prefer short internal _source URLs in ffmpeg args.
Change Guidance
Before changing the pipeline, decide which bottleneck you are addressing:
- Browser image decode or network bandwidth: lower width, lower fps, or increase JPEG quality number.
- Server CPU decode: consider ffmpeg tuning or hardware acceleration.
- Upstream server rejects multiple connections: use relay.
- Audio/frame stutter in one-connection mode: avoid single, use relay.
- Docker log floods: suppress only known noisy lines, not all stderr.
Avoid large frontend feature additions unless requested. The product goal is a minimal URL screen and a fullscreen player.