
holsee / Chroxy

Licence: MIT
Headless Chrome as a Service

Programming language: Elixir

Projects that are alternatives of or similar to Chroxy

Cuprite
Headless Chrome/Chromium driver for Capybara
Stars: ✭ 743 (+331.98%)
Mutual labels:  chrome, headless-chrome
Puppeteer Sharp Extra
Plugin framework for PuppeteerSharp
Stars: ✭ 39 (-77.33%)
Mutual labels:  chrome, headless-chrome
Url To Pdf Api
Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
Stars: ✭ 6,544 (+3704.65%)
Mutual labels:  chrome, headless-chrome
Puppeteer Lambda Starter Kit
Starter Kit for running Headless-Chrome by Puppeteer on AWS Lambda.
Stars: ✭ 563 (+227.33%)
Mutual labels:  chrome, headless-chrome
Mocha Chrome
☕️ Run Mocha tests using headless Google Chrome
Stars: ✭ 66 (-61.63%)
Mutual labels:  chrome, headless-chrome
Chromy
Chromy is a library for operating headless chrome. 🍺🍺🍺
Stars: ✭ 593 (+244.77%)
Mutual labels:  chrome, headless-chrome
Navalia
A bullet-proof, fast, and reliable headless browser API
Stars: ✭ 950 (+452.33%)
Mutual labels:  chrome, headless-chrome
Nightmare
A high-level browser automation library.
Stars: ✭ 19,067 (+10985.47%)
Mutual labels:  chrome, headless-chrome
Puppeteer Deep
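Puppeteer, Headless Chrome: crawling "Introduction to ES6 Standards" (es6标准入门), automatically posting articles to Juejin, site performance analysis; advanced crawlers, automated UI testing, performance analysis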
Stars: ✭ 1,033 (+500.58%)
Mutual labels:  chrome, headless-chrome
Ferrum
Headless Chrome Ruby API
Stars: ✭ 1,009 (+486.63%)
Mutual labels:  chrome, headless-chrome
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+2881.98%)
Mutual labels:  chrome, headless-chrome
Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (-27.33%)
Mutual labels:  chrome, headless-chrome
Wrp
Web Rendering Proxy: Use vintage, historical, legacy browsers on modern web
Stars: ✭ 503 (+192.44%)
Mutual labels:  chrome, headless-chrome
Html Pdf Chrome
HTML to PDF converter via Chrome/Chromium
Stars: ✭ 629 (+265.7%)
Mutual labels:  chrome, headless-chrome
Pychrome
A Python Package for the Google Chrome Dev Protocol [threading base]
Stars: ✭ 469 (+172.67%)
Mutual labels:  chrome, headless-chrome
Minimal Chrome On Heroku
Getting headless chrome running on heroku
Stars: ✭ 12 (-93.02%)
Mutual labels:  chrome, headless-chrome
Serverless Chrome
🌐 Run headless Chrome/Chromium on AWS Lambda
Stars: ✭ 2,625 (+1426.16%)
Mutual labels:  chrome, headless-chrome
Chrome Headless Browser Docker
Continuously building Chrome Docker image for Linux.
Stars: ✭ 323 (+87.79%)
Mutual labels:  chrome, headless-chrome
Gowitness
🔍 gowitness - a golang, web screenshot utility using Chrome Headless
Stars: ✭ 996 (+479.07%)
Mutual labels:  chrome, headless-chrome
Chrome Devtools Protocol
Chrome Devtools Protocol client for PHP
Stars: ✭ 112 (-34.88%)
Mutual labels:  chrome, headless-chrome

Chroxy

A proxy service that mediates access to Chrome running in headless mode, for use in high-frequency application load testing, end-user behaviour simulations and programmatic access to Chrome DevTools.

Chroxy automatically initialises an underlying Chrome browser page when a connection is requested, and closes that page once the WebSocket connection is closed.

This project was born out of necessity, as we needed to orchestrate a large number of concurrent browser scenario executions, with low-level control and advanced introspection capabilities.

Versions

Elixir: 1.8+, OTP: 21.3+

See .travis.yml for the complete list of supported versions.

Features

  • Direct WebSocket connections to Chrome pages, speaking the Chrome Remote Debugging Protocol.
  • Provides connections to Chrome browser pages via WebSocket connections.
  • Manages Chrome browser OS processes via Erlang processes using erlexec.
    • OS process supervision and resiliency through automatic restart on crash.
  • Uses the Chrome Remote Debugging Protocol for optimal client compatibility.
  • A transparent dynamic proxy provides automatic resource cleanup.

Cowboy Compatibility

Cowboy is a major dependency of Phoenix, so this notice lists which Cowboy versions are hard dependencies of which Chroxy versions. This notice will be removed at version 1.0 of Chroxy.

  • Cowboy 1.x: Chroxy version <= 0.5.1
  • Cowboy 2.x: Chroxy version > 0.6.0
  • Cowboy 2.8+: Chroxy version > 0.7.0

Project Goals

The objective of this project is to enable connections to headless Chrome instances with minimal overhead and abstraction. Unlike browser testing frameworks such as Hound and Wallaby, Chroxy aims to provide direct, unfettered access to the underlying browser using the Chrome Debug protocol, whilst supporting many thousands of concurrent connections and channelling them to an underlying pool of Chrome browser resources.

Elixir Supervision of Chrome OS Processes - Resiliency

Chroxy uses Elixir processes and OTP supervision to manage the Chrome instances, and includes a transparent proxy that initialises and terminates the underlying Chrome page automatically, based on the lifetime of the upstream connection.
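To illustrate the restart-on-crash behaviour with plain OTP, here is a minimal sketch. It is hypothetical and not Chroxy's code: `FakeChromeServer` and `FakeChromeSupervisor` are invented names, the GenServer does not launch Chrome or use erlexec, and one child is started per debugging port under a `:one_for_one` supervisor, so killing one child restarts only that child.

```elixir
defmodule FakeChromeServer do
  # Hypothetical stand-in for a process wrapping one Chrome OS process.
  # It only remembers the debugging port it would have been started on.
  use GenServer

  def start_link(port), do: GenServer.start_link(__MODULE__, port, name: name(port))

  # Each server is registered under a per-port name so it can be found again after a restart.
  def name(port), do: :"fake_chrome_#{port}"

  @impl true
  def init(port), do: {:ok, port}
end

defmodule FakeChromeSupervisor do
  # Starts one FakeChromeServer per port; :one_for_one restarts only the child that died.
  def start_link(port_range) do
    children =
      for port <- port_range do
        Supervisor.child_spec({FakeChromeServer, port}, id: {:chrome, port})
      end

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

After `FakeChromeSupervisor.start_link(9222..9223)`, killing the process registered as `:fake_chrome_9222` should leave a fresh pid under the same name shortly afterwards, while the 9223 child keeps running.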

Getting Started

Get dependencies and compile:

$ mix do deps.get, compile

Run the Chroxy Server:

$ mix run --no-halt

Run with an attached session:

$ iex -S mix

Run Docker image

Note: Chrome requires an increased shared memory allocation (hence --shm-size) when running within Docker in order to run stably.

The image exposes ports 1330 and 1331 (the default ports for the connection API and the Chrome proxy endpoint, respectively).

docker build . -t chroxy
docker run --shm-size 2G -p 1330:1330 -p 1331:1331 chroxy

Operation Examples:

Using Chroxy Client & ChromeRemoteInterface

Establish 100 Browser Connections:

clients = Enum.map(1..100, fn(_) ->
  ChroxyClient.page_session!(%{host: "localhost", port: 1330})
end)

Run 100 Asynchronous browser operations:

Task.async_stream(clients, fn(client) ->
  url = "https://github.com/holsee"
  {:ok, _} = ChromeRemoteInterface.RPC.Page.navigate(client, %{url: url})
end, timeout: :infinity) |> Stream.run

You can then use any Page related functionality using ChromeRemoteInterface.

Use any client that speaks the Chrome Debug Protocol:

Get the address for a connection:

$ curl http://localhost:1330/api/v1/connection

ws://localhost:1331/devtools/page/2CD7F0BC05863AB665D1FB95149665AF

With this address you can establish a connection to the Chrome instance (routed via a transparent proxy).
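A client that fetches the address over HTTP will usually want to validate it before connecting. The helper below is a hypothetical sketch (not part of Chroxy or ChroxyClient) that uses only Elixir's standard `URI` module to split an address of the form shown above into the proxy host, proxy port and page id, trimming the trailing newline that curl-style output may carry.

```elixir
defmodule WsAddress do
  # Hypothetical helper: splits a DevTools page address such as
  # "ws://localhost:1331/devtools/page/<id>" into its parts.
  def parse(addr) when is_binary(addr) do
    case URI.parse(String.trim(addr)) do
      %URI{scheme: scheme, host: host, port: port, path: "/devtools/page/" <> page_id}
      when scheme in ["ws", "wss"] and is_binary(host) and page_id != "" ->
        {:ok, %{host: host, port: port, page_id: page_id}}

      _ ->
        {:error, :invalid_address}
    end
  end
end
```

Anything that is not a ws:// or wss:// page address (for example the HTTP API URL itself) yields `{:error, :invalid_address}`.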

Configuration

The configuration is designed to be container-friendly, and as such uses environment variables.

Chroxy as a Library

def deps do
  [{:chroxy, "~> 0.3"}]
end

If you use Chroxy as a dependency of another Mix project, you may wish to reuse Chroxy's configuration implementation by replicating the configuration in "../deps/chroxy/config/config.exs".

Example: Creating a Page Session, Registering for an Event and Navigating to a URL

ws_addr = Chroxy.connection()
{:ok, page} = ChromeRemoteInterface.PageSession.start_link(ws_addr)
ChromeRemoteInterface.RPC.Page.enable(page)
ChromeRemoteInterface.PageSession.subscribe(page, "Page.loadEventFired", self())
url = "https://github.com/holsee"
{:ok, _} = ChromeRemoteInterface.RPC.Page.navigate(page, %{url: url})
# Message Received by self() => {:chrome_remote_interface, "Page.loadEventFired", _}

Configuration Variables

Ports, the proxy host and the endpoint scheme are configured via environment variables.

Variable | Default | Description
CHROXY_CHROME_PORT_FROM | 9222 | Starting port in the Chrome browser port range
CHROXY_CHROME_PORT_TO | 9223 | Last port in the Chrome browser port range
CHROXY_PROXY_HOST | "127.0.0.1" | Host substituted into page addresses so that connections are routed via the proxy
CHROXY_PROXY_PORT | 1331 | Port on which the proxy listener accepts connections
CHROXY_ENDPOINT_SCHEME | :http | HTTP or HTTPS
CHROXY_ENDPOINT_PORT | 1330 | Port on which the HTTP API listens
CHROXY_CHROME_SERVER_PAGE_WAIT_MS | 200 | Milliseconds to wait after asking Chrome to create a page
CHROME_CHROME_SERVER_CRASH_DUMPS_DIR | "/tmp" | Directory to which Chrome writes crash dumps
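A minimal sketch of how these variables and their defaults could be read in Elixir, using a hypothetical `ChroxyEnvSketch` module. Chroxy's own configuration lives in its config/config.exs and may parse them differently; the scheme and crash-dump variables are omitted for brevity, and the environment is passed in as a map so it can be overridden in tests.

```elixir
defmodule ChroxyEnvSketch do
  # Hypothetical reader for the variables listed above, using the documented defaults.
  @defaults %{
    "CHROXY_CHROME_PORT_FROM" => "9222",
    "CHROXY_CHROME_PORT_TO" => "9223",
    "CHROXY_PROXY_HOST" => "127.0.0.1",
    "CHROXY_PROXY_PORT" => "1331",
    "CHROXY_ENDPOINT_PORT" => "1330",
    "CHROXY_CHROME_SERVER_PAGE_WAIT_MS" => "200"
  }

  # `env` defaults to the current OS environment, but any map of strings can be passed in.
  def load(env \\ System.get_env()) do
    get = fn key -> Map.get(env, key, Map.fetch!(@defaults, key)) end
    int = fn key -> key |> get.() |> String.to_integer() end

    %{
      # One Chrome browser process is expected per port in this range.
      chrome_ports: int.("CHROXY_CHROME_PORT_FROM")..int.("CHROXY_CHROME_PORT_TO"),
      proxy_host: get.("CHROXY_PROXY_HOST"),
      proxy_port: int.("CHROXY_PROXY_PORT"),
      endpoint_port: int.("CHROXY_ENDPOINT_PORT"),
      page_wait_ms: int.("CHROXY_CHROME_SERVER_PAGE_WAIT_MS")
    }
  end
end
```

With the defaults, the port range 9222..9223 covers two browser processes; setting CHROXY_CHROME_PORT_TO to 9225 would widen it to four.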

Components

Proxy

An intermediary TCP proxy sits between the upstream client and the downstream Chrome remote debugging WebSocket connection, so that both connections can be monitored and resources cleaned up once either of them closes.

Chroxy.ProxyListener - Incoming Connection Management & Delegation

  • Listens for incoming connections on CHROXY_PROXY_HOST:CHROXY_PROXY_PORT.
  • Exposes an accept/1 function, which accepts the next upstream TCP connection and delegates it to a ProxyServer process along with the proxy_opts, which allow the downstream connection to be configured dynamically.

Chroxy.ProxyServer - Dynamically Configured Transparent Proxy

  • A dynamically configured transparent proxy.
  • Manages the delegated connection as the upstream connection.
  • At initialisation, establishes the downstream connection based on the proxy_opts or on the response of a ProxyServer.Hook module's up/2 callback.
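The listener/server split can be sketched with `:gen_tcp` alone. `RelaySketch` below is a hypothetical and much-simplified stand-in for ProxyListener and ProxyServer, not Chroxy's implementation: it accepts one upstream connection, opens a downstream connection to a fixed host and port (standing in for the proxy_opts), and relays bytes in both directions until either side closes. It has no hooks, supervision or error reporting.

```elixir
defmodule RelaySketch do
  # Hypothetical, simplified transparent TCP relay (not Chroxy's ProxyServer).

  # Listen on a port (0 lets the OS choose); returns {listen_socket, actual_port}.
  def listen(port \\ 0) do
    {:ok, lsock} = :gen_tcp.listen(port, [:binary, active: false, reuseaddr: true])
    {:ok, actual_port} = :inet.port(lsock)
    {lsock, actual_port}
  end

  # Accept one upstream client, connect downstream, relay until either side closes.
  # The calling process owns both sockets and closes them when the relay ends.
  def serve_one(lsock, down_host, down_port) do
    {:ok, upstream} = :gen_tcp.accept(lsock)
    {:ok, downstream} = :gen_tcp.connect(down_host, down_port, [:binary, active: false])
    parent = self()
    spawn_link(fn -> pump(upstream, downstream, parent) end)
    spawn_link(fn -> pump(downstream, upstream, parent) end)

    # The first pump to stop means one side closed: tear down both connections,
    # which also unblocks the other pump's pending recv.
    receive do
      {:pump_done, _} -> :ok
    end

    :gen_tcp.close(upstream)
    :gen_tcp.close(downstream)

    receive do
      {:pump_done, _} -> :ok
    end
  end

  # Copy bytes from one socket to the other until recv or send fails.
  defp pump(from, to, parent) do
    with {:ok, data} <- :gen_tcp.recv(from, 0),
         :ok <- :gen_tcp.send(to, data) do
      pump(from, to, parent)
    else
      _ -> send(parent, {:pump_done, self()})
    end
  end
end
```

Closing both sockets as soon as one side goes away mirrors the cleanup described above: the proxy's lifetime is tied to whichever connection ends first.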

Chroxy.ProxyServer.Hook - Behaviour for ProxyServer hooks. Example: ChromeProxy

  • A mechanism by which a module/server can be invoked when a ProxyServer process is coming up or going down.
  • Two optional callbacks can be implemented:
    • @spec up(identifier(), proxy_opts()) :: proxy_opts()
      • gives the registered process the option to add or change proxy options before the downstream connection is initialised.
    • @spec down(identifier(), proxy_state()) :: :ok
      • signals to the registered process that the proxy connection is about to terminate, because either the upstream or the downstream connection has closed.
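The callback contract can be sketched as a local behaviour. `HookSketch` and `NotifyingHook` are hypothetical names, the `:downstream_host` and `:downstream_port` option keys are assumptions, and `proxy_state` is assumed here to be a map carrying an `:observer` pid; Chroxy's real `Chroxy.ProxyServer.Hook` and its option names may differ.

```elixir
defmodule HookSketch do
  # Hypothetical local copy of the two optional callbacks described above.
  @type proxy_opts :: keyword()
  @callback up(identifier(), proxy_opts()) :: proxy_opts()
  @callback down(identifier(), term()) :: :ok
  @optional_callbacks up: 2, down: 2
end

defmodule NotifyingHook do
  # Hypothetical hook: fills in downstream defaults before connecting and
  # notifies an observer process when the proxy is about to terminate.
  @behaviour HookSketch

  @impl true
  def up(_proxy, proxy_opts) do
    # Keep any values the caller already supplied; only add missing ones.
    proxy_opts
    |> Keyword.put_new(:downstream_host, "127.0.0.1")
    |> Keyword.put_new(:downstream_port, 9222)
  end

  @impl true
  def down(proxy, %{observer: observer}) do
    send(observer, {:proxy_down, proxy})
    :ok
  end
end
```

Calling `NotifyingHook.up(self(), downstream_port: 9223)` keeps port 9223 and adds the default host, which is the kind of option rewriting the up/2 callback allows before the downstream connection is opened.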

Chrome Browser Management

Chrome is the first browser supported, and the following server processes manage the communication and lifetime of the Chrome Browsers and Tabs.

Chroxy.ChromeProxy - Implements ProxyServer.Hook for Chrome resource management

  • Exposes a connection/1 function, which returns the WebSocket address of a browser tab with the proxy host and port substituted, so that the connection is routed via the underlying ProxyServer process.
  • Registers for callbacks from the underlying ProxyServer, implementing the down/2 callback in order to clean up the Chrome resource when connections close.
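The host and port substitution can be illustrated with Elixir's standard `URI` module. `ProxyAddressSketch.via_proxy/3` is a hypothetical helper, not ChromeProxy's actual code; it assumes Chrome hands out a page address on its own debugging port and rewrites only the host and port, leaving the page path intact.

```elixir
defmodule ProxyAddressSketch do
  # Hypothetical helper: rewrites a page address issued by Chrome so that
  # clients connect through the proxy listener instead of Chrome directly.
  def via_proxy(chrome_ws_url, proxy_host, proxy_port) do
    uri = URI.parse(chrome_ws_url)
    URI.to_string(%URI{uri | host: proxy_host, port: proxy_port})
  end
end
```

For example, `ProxyAddressSketch.via_proxy("ws://127.0.0.1:9222/devtools/page/2CD7F0BC05863AB665D1FB95149665AF", "localhost", 1331)` returns `"ws://localhost:1331/devtools/page/2CD7F0BC05863AB665D1FB95149665AF"`, matching the shape of the address the HTTP API hands out.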

Chroxy.ChromeServer - Wraps Chrome Browser OS Process

  • Process which manages the execution and control of a Chrome browser OS process.
  • Provides a basic API wrapper around the browser-level functionality required for page creation, access and closing.
  • Translates browser logging to Elixir logging at the corresponding levels.
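A hypothetical sketch of the log translation step. It assumes Chrome writes lines of the form `[pid:tid:MMDD/HHMMSS.micros:LEVEL:file.cc(line)] message` to its output, maps those levels onto Elixir `Logger` levels, and falls back to `:debug` for lines that do not match. `ChromeLogSketch` is an invented name, and the `:warning` level assumes Elixir 1.11+ (older versions use `:warn`).

```elixir
defmodule ChromeLogSketch do
  # Hypothetical translator from Chrome log lines to Logger levels.
  require Logger

  # Captures the severity and message from "[...:LEVEL:file.cc(n)] message".
  defp line_regex,
    do: ~r/^\[[^\]]*?:(?<level>VERBOSE\d*|INFO|WARNING|ERROR|FATAL):[^\]]*\]\s*(?<msg>.*)$/

  # Returns {logger_level, message}; unrecognised lines are treated as :debug.
  def parse(line) do
    case Regex.named_captures(line_regex(), line) do
      %{"level" => level, "msg" => msg} -> {to_logger_level(level), msg}
      nil -> {:debug, line}
    end
  end

  # Emits the line through Elixir's Logger at the translated level.
  def log(line) do
    {level, msg} = parse(line)
    Logger.log(level, msg)
  end

  defp to_logger_level("INFO"), do: :info
  defp to_logger_level("WARNING"), do: :warning
  defp to_logger_level(level) when level in ["ERROR", "FATAL"], do: :error
  defp to_logger_level(_verbose), do: :debug
end
```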

Chroxy.BrowserPool - Inits & Controls access to pool of browser processes

  • Exposes a connection/0 function, which returns a WebSocket connection to a browser tab from a randomly chosen browser process in the managed pool.

Chroxy.BrowserPool.Chrome - Chrome Process Pool

  • Manages the ChromeServer process pool and is responsible for spawning a browser process for each port in the configured port range.
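The pool behaviour (one browser per configured port, random selection on each connection request) can be sketched in a few lines. `PoolSketch` is hypothetical: each `Agent` merely stands in for a ChromeServer and remembers its port, and the page id is a locally generated placeholder rather than an id assigned by Chrome.

```elixir
defmodule PoolSketch do
  # Hypothetical pool: one Agent per port stands in for a ChromeServer process.

  # Start one stand-in "browser" for each port in the configured range.
  def start(port_range) do
    for port <- port_range do
      {:ok, pid} = Agent.start_link(fn -> port end)
      pid
    end
  end

  # Pick a random browser from the pool and build a (fake) page address on its port.
  def connection(browsers) do
    browser = Enum.random(browsers)
    port = Agent.get(browser, & &1)
    "ws://127.0.0.1:#{port}/devtools/page/" <> page_id()
  end

  # Placeholder for the page id Chrome would assign when creating a tab.
  defp page_id, do: Integer.to_string(System.unique_integer([:positive]), 16)
end
```

Random selection keeps the sketch stateless; a real pool might prefer least-loaded or round-robin selection, but the connection/0 description above only states that a random browser process is chosen.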

HTTP API - Chroxy.Endpoint

GET /api/v1/connection

Returns a ws:// WebSocket URI for a Chrome browser page, routed via the proxy. This is the first port of call for an external client connecting to the service.

Request:

$ curl http://localhost:1330/api/v1/connection

Response:

ws://localhost:1331/devtools/page/2CD7F0BC05863AB665D1FB95149665AF

Kubernetes

The following is an example configuration which can be used to run Chroxy on Kubernetes.

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: crawler
  namespace: default
  labels:
    app: myApp
    tier: crawler

spec:
  replicas: 1
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: myApp
      tier: crawler
  template:
    metadata:
      labels:
        app: myApp
        tier: crawler
    spec:
      containers:
        - image: eu.gcr.io/..../...:latest # your consumer
          name: api
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 30m
              memory: 100Mi
          ports:
            - containerPort: 4000
          env:
          - name: USER_AGENT
            value: ...
          - name: INSTANCE_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name

        # [START chroxy]
        - name: headless-chrome
          image: eu.gcr.io/..../chroxy:latest # chroxy
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 30m
              memory: 100Mi
          env:
            - name: CHROXY_CHROME_PORT_FROM
              value: "9222"
            - name: CHROXY_CHROME_PORT_TO
              value: "9223"
          ports:
            - containerPort: 1331
            - containerPort: 1330
        # [END chroxy]

service.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: crawler-api
  labels:
    app: myApp
    tier: crawler
spec:
  selector:
    app: myApp
    tier: crawler
  ports:
  - port: 4000
    protocol: TCP