All Projects → TeamHG-Memex → Aquarium

TeamHG-Memex / Aquarium

Licence: mit
Splash + HAProxy + Docker Compose

Programming Languages

python
139335 projects - #7 most used programming language

Aquarium

Aquarium is a cookiecuter_ template for hassle-free Docker Compose_ + Splash_ setup. Think of it as a Splash instance with extra features and without common pitfalls.

.. _cookiecuter: http://cookiecutter.rtfd.org .. _Splash: https://github.com/scrapinghub/splash .. _Docker Compose: https://docs.docker.com/compose/

Usage

First, make sure Docker and Docker Compose are installed.

Then install cookiecutter::

pip install cookiecutter

or (on OS X + homebrew)::

brew install cookiecutter

Then generate a folder with config files::

cookiecutter gh:TeamHG-Memex/aquarium

With all default options it'll create an aquarium folder in the current path. Go to this folder and start the Splash cluster::

cd ./aquarium
docker-compose up

Then use http://:8050 as a regular Splash_ instance. On Linux http://0.0.0.0:8050 should work; on OS X and Windows IP address depends on boot2docker or docker-machine.

Options

When generating a config, cookiecutter will ask a bunch of questions.

  • folder_name (default is "aquarium") - a name of the target folder.

  • num_splashes (default is "3") - a number of Splash instances to create. To utilize full server capacity it makes sense to create slightly more Splash instances than CPU cores - e.g. on 2-core machine 3 instances often work best.

  • splash_version (default is "3.0") - a version of scrapighub/splash Docker image.

  • auth_user (default is "user"), auth_password (default is "userpass")

    • HTTP Basic Auth credentials for Splash.
  • splash_verbosity (default is "1") - Splash log verbosity, from 0 to 5.

  • max_timeout (default is "3600") - maximum allowed timeout.

  • maxrss_mb (default is "3000") - a soft memory limit, in MB. Splash container will be restarted after some time if it starts to use more memory then this value.

  • splash_slots (default is 5) - a number of Splash slots to use, i.e. how many render jobs to run in parallel in a single Splash process.

  • stats_enabled (default is "1") - whether to enable HAProxy stats. If stats are enabled visit http://:8036 to see stats page.

  • stats_auth (default is "admin:adminpass") - HTTP Basic Auth credentials for HAProxy stats.

  • tor (default is "1") - enter 0 to disable Tor_ support. When Tor support is enabled, all .onion links are opened using Tor. In addition to that, there is tor Splash proxy profile_ which you can use to render any page using Tor.

  • adblock (default is "1") - Enter 0 to disable AdBlock Plus request filters_ (FIXME: this option is not working yet; filters are always available). By default, the following filters are available:

    • easylist: default set of EasyList_ filters for English;
    • easyprivacy: EasyPrivacy filters remove tracking scripts;
    • easylist_noadult: EasyList variant without filters for adult domains;
    • fanboy-social: removes social media content such as the Facebook like buttons and other widgets.
    • fanboy-annoyance: blocks Social Media content, in-page pop-ups and other annoyances; use it to decrease loading times and uncluttering pages. fanboy-social is already included in this filter.

.. _Tor: http://torproject.org .. _Splash proxy profile: http://splash.readthedocs.org/en/latest/api.html#proxy-profiles .. _request filters: http://splash.readthedocs.org/en/latest/api.html#request-filters .. _EasyList: https://easylist.to/

Contributing

License is MIT.


.. image:: https://hyperiongray.s3.amazonaws.com/define-hg.svg :target: https://www.hyperiongray.com/?pk_campaign=github&pk_kwd=aquarium :alt: define hyperiongray

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].