vifreefly / rubium

Licence: MIT license
Rubium is a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby

Programming Languages

ruby
36898 projects - #4 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to rubium

node-headless-chrome
⚠️ 🚧 Install precompiled versions of the Chromium/Chrome headless shell using npm or yarn
Stars: ✭ 20 (-69.23%)
Mutual labels:  headless, chromium
chrome-headless-launcher
Run the latest Chrome browser on CLI without head
Stars: ✭ 39 (-40%)
Mutual labels:  headless, chromium
Phpchrometopdf
A slim PHP wrapper around google-chrome to convert url to pdf or to take screenshots , easy to use and clean OOP interface
Stars: ✭ 127 (+95.38%)
Mutual labels:  headless, chromium
Crawlergo
A powerful dynamic crawler for web vulnerability scanners
Stars: ✭ 1,088 (+1573.85%)
Mutual labels:  headless, chromium
LInkedIn-Reverese-Lookup
🔎Search LinkedIn profile by email address📧
Stars: ✭ 20 (-69.23%)
Mutual labels:  scraping, chromium
Wendigo
A proper monster for front-end automated testing
Stars: ✭ 121 (+86.15%)
Mutual labels:  headless, chromium
Pdf Bot
🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs
Stars: ✭ 2,551 (+3824.62%)
Mutual labels:  headless, chromium
Html Pdf Chrome
HTML to PDF converter via Chrome/Chromium
Stars: ✭ 629 (+867.69%)
Mutual labels:  headless, chromium
docker-selenium-lambda
The simplest demo of chrome automation by python and selenium in AWS Lambda
Stars: ✭ 172 (+164.62%)
Mutual labels:  scraping, chromium
pythonista-chromeless
Serverless selenium which dynamically execute any given code.
Stars: ✭ 31 (-52.31%)
Mutual labels:  headless, scraping
Ferrum
Headless Chrome Ruby API
Stars: ✭ 1,009 (+1452.31%)
Mutual labels:  headless, chromium
ubuntu-vnc-xfce-g3
Headless Ubuntu/Xfce containers with VNC/noVNC (Generation 3)
Stars: ✭ 83 (+27.69%)
Mutual labels:  headless, chromium
Axegrinder
Crawl websites for accessibility issues from the command line.
Stars: ✭ 12 (-81.54%)
Mutual labels:  headless, chromium
Chromium Headless Remote
🐳 Dockerized Chromium in headless remote debugging mode
Stars: ✭ 122 (+87.69%)
Mutual labels:  headless, chromium
Cuprite
Headless Chrome/Chromium driver for Capybara
Stars: ✭ 743 (+1043.08%)
Mutual labels:  headless, chromium
Google Meet Scheduler
😴 Attends classes for you.
Stars: ✭ 150 (+130.77%)
Mutual labels:  headless, chromium
Puppetron
Puppeteer (Headless Chrome Node API)-based rendering solution.
Stars: ✭ 429 (+560%)
Mutual labels:  headless, chromium
Dataflowkit
Extract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+601.54%)
Mutual labels:  headless, scraping
capybara-chrome
Chrome driver for Capybara using Chrome's remote debugging protocol
Stars: ✭ 27 (-58.46%)
Mutual labels:  headless, capybara
headless-chrome-alpine
A Docker container running headless Chrome
Stars: ✭ 26 (-60%)
Mutual labels:  headless, chromium

Rubium

Project is archived

Please consider using Ferrum, a high-level API to control Chrome in Ruby, instead.

Description

Rubium has been updated to version 0.2.0! It adds new options such as set_cookies, restart_after, urls_blacklist, disable_images and others. See the readme below:

Rubium is a handy wrapper around the chrome_remote gem. It adds browser instance handling and some Capybara-like methods. It is very lightweight (250 lines of code in the main Rubium::Browser class for now) and doesn't use Selenium or Capybara. Think of Rubium as a very simple and basic implementation of Puppeteer in Ruby.

You can use Rubium as a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby. The API doesn't currently have many methods for automating the browser, but it covers the basic and most frequently used ones.

require 'rubium'

browser = Rubium::Browser.new
browser.visit("https://github.com/vifreefly/rubium")

# Get current page response as string:
browser.body

# Get current page response as Nokogiri object:
browser.current_response

# Click an element (CSS selector):
browser.click("some selector")

# Get current cookies:
browser.cookies

# Set cookies (Array of hashes):
browser.set_cookies([
  { name: "some_cookie_name", value: "some_cookie_value", domain: ".some-cookie-domain.com" },
  { name: "another_cookie_name", value: "another_cookie_value", domain: ".another-cookie-domain.com" }
])

# Fill in some field:
browser.fill_in("some field selector", "Some text")

# Check whether the current response contains the given CSS selector. You can
# pass an optional `wait:` argument (in seconds) to set the max wait time for the selector:
browser.has_css?("some selector", wait: 1)

# Check whether the current response contains the given text. You can
# pass an optional `wait:` argument (in seconds) to set the max wait time for the text:
browser.has_text?("some text")

# Evaluate some JS code on each newly loaded page:
browser.evaluate_on_new_document(File.read("browser_inject.js"))

# Evaluate JS code expression:
browser.execute_script("JS code string")

# Access chrome_remote client (instance of ChromeRemote class) directly:
# See more here: https://github.com/cavalle/chrome_remote#using-the-chromeremote-api
browser.client

# Close browser:
browser.close

# Restart browser:
browser.restart!

You can pass the following options when creating a browser instance:

browser = Rubium::Browser.new(
  debugging_port: 9222,                  # custom debugging port. Default is any available port.
  headless: false,                       # Run browser in normal (not headless) mode. Default is headless.
  window_size: [1600, 900],              # Custom window size. Default is unset.
  user_agent: "Some user agent",         # Custom user-agent.
  proxy_server: "http://1.1.1.1:8080",   # Set proxy.
  extension_code: "Some JS code string", # Inject custom JS code on each page. See above `evaluate_on_new_document`
  cookies: [],                           # Set custom cookies, see above `set_cookies`
  restart_after: 25,                     # Automatically restart browser after N processed requests
  enable_logger: true,                   # Enable logger to log info about processing requests
  max_timeout: 30,                       # How long to wait (in seconds) for the page to load fully. Default is 60 sec.
  urls_blacklist: ["*some-domain.com*"], # Skip all requests which match provided patterns (wildcard allowed).
  disable_images: true                   # Do not download images.
)
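To get a feel for what a wildcard pattern such as "*some-domain.com*" would match, here is a minimal sketch in plain Ruby. It only approximates the behaviour with File.fnmatch? from the standard library; the blacklisted? helper is hypothetical and not part of Rubium, and the matching rules Rubium and Chrome actually apply may differ.

```ruby
# Hypothetical helper (not part of Rubium): approximates wildcard URL
# matching with File.fnmatch?. The real matching rules may differ.
def blacklisted?(url, patterns)
  patterns.any? { |pattern| File.fnmatch?(pattern, url) }
end

patterns = ["*some-domain.com*"]

puts blacklisted?("https://cdn.some-domain.com/tracker.js", patterns) # prints true
puts blacklisted?("https://example.com/index.html", patterns)         # prints false
```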

Note that for the user_agent and proxy_server options you can provide a lambda instead of a string:

USER_AGENTS = ["Safari", "Mozilla", "IE", "Chrome"]
PROXIES = ["http://1.1.1.1:8080", "http://2.2.2.2:8080", "http://3.3.3.3:8080"]

browser = Rubium::Browser.new(
  user_agent:   -> { USER_AGENTS.sample },
  proxy_server: -> { PROXIES.sample },
  restart_after: 25
)

Why this is useful: Chrome doesn't provide an API to change proxies on the fly (after the browser has started). A proxy can only be set via a CLI argument when starting the Chrome instance. Rubium, however, can automatically restart the browser (restart_after option) after N processed requests. On each restart, if the user_agent and/or proxy_server option is a lambda, the lambda is called to fetch a fresh value. This makes it possible to rotate proxies and user agents with little effort.
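The rotation idea can be sketched without launching Chrome. The FakeBrowser class below is a hypothetical stand-in, not Rubium's actual implementation: it assumes a lambda option is re-evaluated on every (re)start, a plain string is used as-is, and the restart happens just before the request that follows the Nth processed one (the exact timing in Rubium may differ).

```ruby
# Illustrative only: FakeBrowser is a hypothetical stand-in for
# Rubium::Browser and does not start a real browser.
class FakeBrowser
  attr_reader :user_agent, :proxy_server, :restarts

  def initialize(user_agent: nil, proxy_server: nil, restart_after: nil)
    @options = { user_agent: user_agent, proxy_server: proxy_server }
    @restart_after = restart_after
    @processed = 0
    @restarts = 0
    start
  end

  # Pretend to process one request; restart first if the limit was reached.
  def visit(_url)
    restart! if @restart_after && @processed >= @restart_after
    @processed += 1
  end

  def restart!
    @restarts += 1
    @processed = 0
    start
  end

  private

  # A callable option (lambda) is re-evaluated on every start;
  # anything else (e.g. a plain string) is used unchanged.
  def resolve(value)
    value.respond_to?(:call) ? value.call : value
  end

  def start
    @user_agent = resolve(@options[:user_agent])
    @proxy_server = resolve(@options[:proxy_server])
  end
end

proxies = ["http://1.1.1.1:8080", "http://2.2.2.2:8080"].cycle
browser = FakeBrowser.new(
  user_agent: "Static UA",
  proxy_server: -> { proxies.next },
  restart_after: 2
)

puts browser.proxy_server # prints http://1.1.1.1:8080
3.times { |i| browser.visit("https://example.com/#{i}") }
puts browser.proxy_server # prints http://2.2.2.2:8080
```

The proxies are cycled here instead of sampled so the output is deterministic; with `PROXIES.sample` the same proxy may be picked twice in a row.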

You can provide custom Chrome binary path this way:

Rubium.configure do |config|
  config.chrome_path = "/path/to/chrome/binary"
end

Installation

Rubium is tested with Ruby 2.3.0 and up.

Rubium is in the alpha stage (and will therefore have breaking updates in the future), so it's recommended to pin the latest gem version in your Gemfile, like: gem 'rubium', '0.2.0'.
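For reference, a minimal Gemfile with the version pinned might look like the sketch below. The source line is the usual RubyGems default and is an assumption here; adjust it if you install gems from elsewhere.

```ruby
# Gemfile
source "https://rubygems.org" # assumed default gem source

# Pin the exact version, since alpha releases may introduce breaking changes.
gem "rubium", "0.2.0"
```

Run bundle install afterwards to install the pinned version.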

Contribution

Sure, feel free to fork and add new functionality.

License

The gem is available as open source under the terms of the MIT License.
