Local Page Archiver

This project saves web pages as self-contained HTML archives. It opens the input (a URL or a local HTML file) with Playwright, captures the rendered HTML, and inlines external resources as data: URLs.
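As a rough illustration of the inlining step, the sketch below turns a fetched resource body into a data: URL and substitutes it into the HTML. The helper names toDataUrl and inlineResources are hypothetical and are not taken from this project's source. The plain string replacement is a simplification that ignores relative URLs, srcset, and CSS url() references.

```javascript
// Hypothetical helper (not from this project): encode a resource body as a data: URL.
function toDataUrl(body, mimeType = 'application/octet-stream') {
  const base64 = Buffer.from(body).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: replace every occurrence of each known resource URL
// in the HTML with the matching data: URL. `resources` is assumed to be a
// Map of absolute URL -> { body, mimeType }.
function inlineResources(html, resources) {
  let out = html;
  for (const [url, { body, mimeType }] of resources) {
    out = out.split(url).join(toDataUrl(body, mimeType));
  }
  return out;
}

// Example: a three-byte stand-in for an image payload.
const resources = new Map([
  ['https://example.com/a.png', { body: Buffer.from([1, 2, 3]), mimeType: 'image/png' }],
]);
console.log(inlineResources('<img src="https://example.com/a.png">', resources));
```

Base64 encoding grows each resource by roughly one third, so an archive built this way will be larger than the sum of the original files.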

CLI

npm install
npm run install-browsers
node src/cli.mjs archive "https://example.com/article"

For an existing HTML file:

node src/cli.mjs archive ./page.html

Archives are written to ARCHIVE_PATH, or to a development directory under the system temp directory when ARCHIVE_PATH is not set.
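The fallback described above could be resolved along these lines. This is a minimal sketch: the function name resolveArchiveDir and the subdirectory name local-page-archiver-dev are assumptions for illustration, and the project's actual development directory name may differ.

```javascript
import os from 'node:os';
import path from 'node:path';

// Hypothetical helper: pick the archive directory from the environment,
// falling back to a development directory under the system temp directory.
// The default subdirectory name is an assumption, not the project's real one.
function resolveArchiveDir(env = process.env, devDirName = 'local-page-archiver-dev') {
  const configured = env.ARCHIVE_PATH;
  if (configured && configured.trim() !== '') {
    // Resolve relative values against the current working directory.
    return path.resolve(configured);
  }
  return path.join(os.tmpdir(), devDirName);
}

console.log(resolveArchiveDir());
```

Setting ARCHIVE_PATH explicitly is the safer choice for anything beyond local experiments, since many systems clear the temp directory periodically.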

Ephemeral container worker

The host-facing container boundary is src/container-runner.mjs. It starts a short-lived Docker/Podman worker container, mounts the host archive directory at /archives, sends one archive request, reads a JSON result, and exits.
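To make the runner's lifecycle concrete, here is a sketch of two pieces it would plausibly need: the argument list for a one-shot container and a parser for the JSON result. The names buildRunArgs and parseResult are hypothetical, and the assumptions that the request is sent over stdin (hence -i) and that the result is the last non-empty line of stdout are illustrative only; src/container-runner.mjs may do this differently.

```javascript
// Hypothetical helper: arguments for a short-lived worker container.
// `run`, `--rm`, `-i`, and `-v` are accepted by both Docker and Podman.
// `archiveDir` is assumed to be an absolute host path.
function buildRunArgs({ image, archiveDir }) {
  return [
    'run',
    '--rm',                          // remove the container when it exits
    '-i',                            // keep stdin open (assumed request channel)
    '-v', `${archiveDir}:/archives`, // mount the host archive directory
    image,
  ];
}

// Hypothetical helper: treat the last non-empty stdout line as the JSON result,
// so earlier log lines from the worker do not break parsing.
function parseResult(stdout) {
  const lines = stdout.split('\n').map((line) => line.trim()).filter(Boolean);
  if (lines.length === 0) {
    throw new Error('worker produced no output');
  }
  return JSON.parse(lines[lines.length - 1]);
}

const args = buildRunArgs({ image: 'local-page-archiver:latest', archiveDir: '/tmp/archives' });
console.log(['podman', ...args].join(' '));
```

A runner built this way would pass the runtime name and these arguments to a process-spawning call such as child_process.spawn, write the request, and hand the collected stdout to parseResult.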

Build the worker image:

podman build -t local-page-archiver:latest .

Archive through the worker on macOS with Podman:

node src/container-runner.mjs archive "https://example.com/article" \
  --runtime podman \
  --image local-page-archiver:latest \
  --archive-path ./archives

The convenience wrapper does the same thing and builds the image first if it is missing:

./podman-run.sh archive "https://example.com/article"

For visual debugging, expose VNC from the worker:

./podman-run.sh vnc-archive "https://example.com/article"
# Then open vnc://localhost:5901

The worker image starts Xvfb internally, so callers do not need to mount the host X11 socket or override the entrypoint.