Capture website API

Capture screenshots of websites through a self-hosted API. This project is a wrapper around the capture-website library: https://github.com/sindresorhus/capture-website

Try it yourself (but beware that your screenshot is visible on a public website and that the request may fail due to high traffic; read on to learn how to prevent this):

curl 'https://capture-website-api.herokuapp.com/capture?url=https://twitter.com' -o screenshot.png

Installation

Docker

Run the pre-built container from Docker Hub

Pull the image:

docker pull robvanderleek/capture-website-api

Start the container:

docker run -it -p 8080:8080 robvanderleek/capture-website-api

Make a test screenshot request:

curl 'localhost:8080/capture?url=https://news.ycombinator.com/' -o screenshot.png

Build the Docker image and run it

Clone the repo:

git clone git@github.com:robvanderleek/capture-website-api.git && cd capture-website-api

Build the image:

docker build -t cwa .

Start the container:

docker run -it -p 8080:8080 cwa

Make a test screenshot request:

curl 'localhost:8080/capture?url=https://www.youtube.com' -o screenshot.png

Yarn

Run in a terminal:

Clone the repo:

git clone git@github.com:robvanderleek/capture-website-api.git && cd capture-website-api

Install dependencies:

yarn

Start the server:

yarn start

Make a test screenshot request:

curl 'localhost:8080/capture?url=https://www.reddit.com' -o screenshot.png

Heroku

Deploy and run on Heroku:

Clone the repo:

git clone git@github.com:robvanderleek/capture-website-api.git && cd capture-website-api

Log in to the Heroku container registry:

heroku container:login

Create repository entry:

heroku create

Push container:

heroku container:push web

Release container:

heroku container:release web

Get Heroku endpoint:

CWA_URL=$(heroku info -s | grep web_url | cut -d= -f2)

Make a test screenshot request:

curl "${CWA_URL}capture?url=https://www.linkedin.com" -o screenshot.png

Usage

Call the /capture endpoint and pass the site URL in the query parameter url:

curl 'https://capture-website-api.herokuapp.com/capture?url=http://gmail.com' -o screenshot.png

Simple as that.
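
If the target URL carries its own query string, characters such as & and ? need to be percent-encoded, otherwise they are read as parameters of the /capture request itself. A minimal sketch using curl's --get and --data-urlencode options follows. The capture function name and the CWA_BASE variable are illustrative assumptions, not part of this project:

```shell
# Base URL of the instance to call (assumption: a local instance on port 8080).
CWA_BASE="${CWA_BASE:-http://localhost:8080}"

# capture TARGET_URL OUTPUT_FILE  (hypothetical helper)
# --get appends the --data-urlencode pair to the request URL as a query
# string, so the value of TARGET_URL is percent-encoded by curl.
capture() {
  curl --fail --silent --show-error --get \
    --data-urlencode "url=$1" \
    --output "$2" \
    "${CWA_BASE}/capture"
}

# Example (needs a running instance):
# capture 'https://news.ycombinator.com/item?id=1' screenshot.png
```

The --fail option makes curl exit with a non-zero status on an HTTP error response, which helps avoid saving an error page as screenshot.png.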

Configuration

Application options

Application configuration options can be set as environment variables or in a .env file in the root folder. There's an example .env file in the codebase: .env.example

Supported options are:

Name	Description	Default
TIMEOUT	Timeout in seconds for loading a web page	20
CONCURRENCY	Number of captures that run in parallel; more memory allows more captures to run in parallel	2
MAX_QUEUE_LENGTH	Requests that can't be handled directly are queued until the queue is full	6
SHOW_RESULTS	Enable web endpoint to show latest capture	false
SECRET	Secret string to prevent undesired usage on public endpoints	""
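
As an illustration, a .env file that sets every option from the table might look like the sketch below. The values are arbitrary examples rather than recommendations, and the contents of the bundled .env.example may differ:

```shell
# Sketch of a .env file in the project root. Option names come from the
# table above; every value here is an illustrative assumption.
TIMEOUT=30
CONCURRENCY=4
MAX_QUEUE_LENGTH=10
SHOW_RESULTS=false
SECRET=change-me
```

Since the options are also read from environment variables, the same file should work with the Docker image through Docker's --env-file option, for example: docker run -it --env-file .env -p 8080:8080 robvanderleek/capture-website-api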

Capturing options

Most of the configuration options from the wrapped capture-website library are supported as query parameters. For example, to capture a site with a 650x350 viewport, a scale factor of 1, no default background, and animations disabled, use:

curl 'https://capture-website-api.herokuapp.com/capture?url=http://amazon.com&width=650&height=350&scaleFactor=1&defaultBackground=false&disableAnimations=true' -o screenshot.png

See https://github.com/sindresorhus/capture-website for a full list of options.
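
When the same options are reused across several captures, the request URL can be assembled by a small helper instead of typed by hand. The sketch below is one way to do that; capture_url is a hypothetical name, and the helper does no percent-encoding, so it assumes the target URL and the option values are already safe to place in a query string:

```shell
# capture_url BASE_URL TARGET_URL [key=value ...]  (hypothetical helper)
# Prints a /capture request URL with each key=value pair appended as a
# query parameter. The body runs in a subshell, so its variables do not
# leak into the calling shell.
capture_url() (
  base=$1
  query="url=$2"
  shift 2
  for pair in "$@"; do
    query="${query}&${pair}"
  done
  printf '%s\n' "${base}/capture?${query}"
)

# Example (needs a running instance):
# curl "$(capture_url http://localhost:8080 http://amazon.com width=650 height=350)" -o screenshot.png
```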

Use plain Puppeteer

Sometimes the capture-website library has problems capturing certain sites. You can try to capture these sites with plain Puppeteer by supplying the query parameter plainPuppeteer=true.
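
One way to use this parameter is as a fallback: try a normal capture first and retry with plainPuppeteer=true only if that fails. The sketch below assumes that a failed capture is reported with an HTTP error status, which is what makes curl's --fail option return a non-zero exit code; capture_with_fallback and CWA_BASE are illustrative names:

```shell
# Base URL of the instance to call (assumption: a local instance on port 8080).
CWA_BASE="${CWA_BASE:-http://localhost:8080}"

# capture_with_fallback TARGET_URL OUTPUT_FILE  (hypothetical helper)
# The second curl call runs only when the first one exits non-zero.
capture_with_fallback() {
  curl --fail --silent --show-error --get \
    --data-urlencode "url=$1" \
    --output "$2" \
    "${CWA_BASE}/capture" ||
  curl --fail --silent --show-error --get \
    --data-urlencode "url=$1" \
    --data "plainPuppeteer=true" \
    --output "$2" \
    "${CWA_BASE}/capture"
}

# Example (needs a running instance):
# capture_with_fallback 'https://www.youtube.com' screenshot.png
```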

Environment variables

Two of the options listed above deserve more detail:

SHOW_RESULTS: if true, the latest capture result can be viewed in a browser at the base URL (e.g. https://capture-website-api.herokuapp.com/)
SECRET: when set, all capture requests need to contain a query parameter secret whose value matches the value of this environment variable
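
For an instance that has SECRET set, a request could be sketched as follows. The helper name and the CWA_SECRET and CWA_BASE variables are assumptions made for this example; only the secret query parameter itself comes from the description above:

```shell
# Base URL of the instance to call (assumption: a local instance on port 8080).
CWA_BASE="${CWA_BASE:-http://localhost:8080}"

# capture_with_secret TARGET_URL OUTPUT_FILE  (hypothetical helper)
# Reads the shared secret from CWA_SECRET (an assumed variable name) so it
# is not hard-coded in scripts, and sends it as the secret query parameter.
capture_with_secret() {
  curl --fail --silent --show-error --get \
    --data-urlencode "url=$1" \
    --data-urlencode "secret=${CWA_SECRET}" \
    --output "$2" \
    "${CWA_BASE}/capture"
}

# Example (needs a running instance started with a matching SECRET):
# CWA_SECRET=change-me capture_with_secret 'https://www.reddit.com' screenshot.png
```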

Contributing

If you have suggestions for improvements, or want to report a bug, open an issue!
