sigilbox
This project saves self-contained HTML archives. It opens the input with Playwright, captures the rendered HTML, and inlines external resources as data: URLs.
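As a rough illustration of the inlining step, the sketch below encodes fetched bytes as a base64 data: URL and swaps them into matching src attributes. The helper names (toDataUrl, inlineSrcAttributes) and the resources map are hypothetical and are not taken from this project's source; a regular expression is used only to keep the example short, and a real implementation would more likely rewrite the DOM.

```javascript
// Hypothetical helper: encode raw bytes as a base64 data: URL.
function toDataUrl(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: replace each src="..." whose URL has a fetched body
// in `resources` (a Map from URL to { bytes, mimeType }).
// URLs that are not in the map are left untouched.
function inlineSrcAttributes(html, resources) {
  return html.replace(/src="([^"]+)"/g, (match, url) => {
    const resource = resources.get(url);
    if (!resource) return match;
    return `src="${toDataUrl(resource.bytes, resource.mimeType)}"`;
  });
}
```

Once every reference is rewritten this way, the saved file no longer depends on the network.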
CLI
npm install
npm run install-browsers
node src/cli.mjs archive "https://example.com/article"
For an existing HTML file:
node src/cli.mjs archive ./page.html
Archives are written to ARCHIVE_PATH, or to a development directory under the system temp directory when ARCHIVE_PATH is not set.
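The fallback described above might look roughly like the following. This is only a sketch: the function name resolveArchiveDir and the directory name "sigilbox-dev" are placeholders chosen for illustration, and the project may name its development directory differently.

```javascript
import os from "node:os";
import path from "node:path";

// Hypothetical helper: pick the archive directory.
// "sigilbox-dev" is a placeholder name, not taken from the project.
function resolveArchiveDir(env = process.env) {
  if (env.ARCHIVE_PATH) {
    // An explicit setting wins; resolve it to an absolute path.
    return path.resolve(env.ARCHIVE_PATH);
  }
  // Otherwise fall back to a directory under the system temp directory.
  return path.join(os.tmpdir(), "sigilbox-dev");
}
```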
Ephemeral container worker
The host-facing container boundary is src/container-runner.mjs. It starts a short-lived Docker/Podman worker container, mounts the host archive directory at /archives, sends one archive request, reads a JSON result, and exits.
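To make the mount concrete, here is a minimal sketch of the argument list such a runner could pass to the container runtime. The function name buildWorkerArgs is hypothetical, and how the archive request and JSON result travel between host and worker is not shown because the text above does not specify it. The flags used (run, --rm, -v) exist in both Docker and Podman.

```javascript
// Hypothetical helper: build the arguments for a one-shot worker container.
// --rm removes the container when it exits, which keeps the worker short-lived.
// -v host:container bind-mounts the host archive directory at /archives.
// archiveDir should be an absolute path on the host.
function buildWorkerArgs({ image, archiveDir }) {
  return ["run", "--rm", "-v", `${archiveDir}:/archives`, image];
}
```

The resulting array could be handed to spawn from node:child_process together with the runtime name ("docker" or "podman").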
Build the worker image:
podman build -t local-page-archiver:latest .
Archive through the worker on macOS with Podman:
node src/container-runner.mjs archive "https://example.com/article" \
--runtime podman \
--image local-page-archiver:latest \
--archive-path ./archives
The convenience wrapper does the same thing and builds the image first if it is missing:
./podman-run.sh archive "https://example.com/article"
For visual debugging, expose VNC from the worker:
./podman-run.sh vnc-archive "https://example.com/article"
# Then open vnc://localhost:5901
The worker image starts Xvfb internally, so callers do not need to mount the host X11 socket or override the entrypoint.
Web UI
The web path is split into three roles:
src/frontend-server.mjs serves the static UI and proxies /api/* and /archives/* to the backend.
src/backend-server.mjs manages archive lookup, job state, and the archive index.
src/worker-server.mjs runs inside the browser worker container and wraps archivePage() over HTTP.
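The frontend's routing decision can be pictured as a simple prefix check. The sketch below is an assumption about how that split might be expressed; isProxied and PROXIED_PREFIXES are illustrative names, not identifiers from the project.

```javascript
// Path prefixes that the frontend forwards to the backend, per the roles above.
const PROXIED_PREFIXES = ["/api/", "/archives/"];

// Hypothetical helper: true when a request path should go to the backend
// instead of being served from the static UI.
function isProxied(pathname) {
  return PROXIED_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}
```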
Run the full stack with:
docker compose -f docker-compose.example.yml up --build
Then open http://localhost:5731. A page can also be archived directly by appending its URL to the path, for example:
http://localhost:5731/https://example.com
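One way a server could recognise such a request is to strip the leading slash and try to parse the remainder as an absolute URL. The sketch below assumes that approach; targetFromPath is a hypothetical name, and the project may handle these paths differently.

```javascript
// Hypothetical helper: extract the archive target from a request path such as
// "/https://example.com". Returns null when the path does not hold an
// absolute http(s) URL, so ordinary routes fall through to other handlers.
function targetFromPath(requestPath) {
  const candidate = requestPath.replace(/^\/+/, "");
  let url;
  try {
    url = new URL(candidate);
  } catch {
    return null; // not an absolute URL, e.g. "api/jobs"
  }
  const isWeb = url.protocol === "http:" || url.protocol === "https:";
  return isWeb ? url.href : null;
}
```

Restricting the scheme to http and https keeps inputs such as javascript: or file: URLs from being treated as archive targets.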