simplify
This commit is contained in:
34
README.md
34
README.md
@@ -1,8 +1,6 @@
|
||||
# Local Page Archiver
|
||||
|
||||
This project saves self-contained HTML archives for pages the operator is authorized to access. It sends a real browser user agent, renders web URLs with Playwright, strips ad/tracker-like elements, normalizes the captured DOM, and inlines page requisites as `data:` URLs.
|
||||
|
||||
It intentionally does not execute paywall-bypass rules. The bundled `bypass-paywalls-clean-filters` files are treated as reference material only; paywall selectors and scripts are not applied.
|
||||
This project saves self-contained HTML archives. It opens the input with Playwright, captures the rendered HTML, and inlines external resources as `data:` URLs.
|
||||
|
||||
## CLI
|
||||
|
||||
@@ -15,35 +13,7 @@ node src/cli.mjs archive "https://example.com/article"
|
||||
For an existing HTML file:
|
||||
|
||||
```sh
|
||||
node src/cli.mjs archive ./page.html --static
|
||||
node src/cli.mjs archive ./page.html
|
||||
```
|
||||
|
||||
For an `archive.ph` HTML export where you want the captured page without the archive shell:
|
||||
|
||||
```sh
|
||||
node src/cli.mjs archive ./bloomberg-archive.html --static --strip-archive-shell
|
||||
```
|
||||
|
||||
Local `archive.ph` HTML inputs with `--strip-archive-shell` use the static extractor by default because those files already contain the rendered page. Add `--render` only when you explicitly want Chromium to load the local HTML first.
|
||||
|
||||
Computed-style freezing is off by default for live web pages because it can inflate modern article pages into very large HTML files. Add `--freeze-styles` only when stylesheet inlining is not enough to preserve layout.
|
||||
|
||||
Archives are written to `ARCHIVE_PATH`, or to a development directory under the system temp directory when `ARCHIVE_PATH` is not set.
|
||||
|
||||
## API
|
||||
|
||||
```sh
|
||||
ARCHIVE_PATH=/tmp/local-page-archives npm run serve
|
||||
```
|
||||
|
||||
Archive a page:
|
||||
|
||||
```sh
|
||||
curl -X POST http://127.0.0.1:8787/archive \
|
||||
-H 'content-type: application/json' \
|
||||
-d '{"url":"https://example.com/article"}'
|
||||
```
|
||||
|
||||
The response includes the archived file path and a local `viewUrl`.
|
||||
|
||||
Set `PORT` to choose a port other than the default `8787`.
|
||||
|
||||
Reference in New Issue
Block a user