
jairovadillo / Pychromeless

License: Apache-2.0
Python Lambda Chrome Automation (naming pending)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pychromeless

Cdp4j
cdp4j - Chrome DevTools Protocol for Java
Stars: ✭ 232 (+5.94%)
Mutual labels:  automation, selenium, chrome, chromium
Playwright Go
Playwright for Go, a browser automation library to control Chromium, Firefox and WebKit with a single API.
Stars: ✭ 272 (+24.2%)
Mutual labels:  automation, selenium, chromium
Lambdium
headless chrome + selenium webdriver in AWS Lambda using the serverless application model
Stars: ✭ 246 (+12.33%)
Mutual labels:  aws-lambda, selenium, chromium
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+2242.01%)
Mutual labels:  crawler, chrome, chromium
Awesome Java Crawler
This repository collects and organizes crawler-related resources, primarily written in Java.
Stars: ✭ 228 (+4.11%)
Mutual labels:  crawler, selenium, chrome
Serverless Chrome
🌐 Run headless Chrome/Chromium on AWS Lambda
Stars: ✭ 2,625 (+1098.63%)
Mutual labels:  aws-lambda, chrome, chromium
Playwright Sharp
.NET version of the Playwright testing and automation library.
Stars: ✭ 459 (+109.59%)
Mutual labels:  automation, chrome, chromium
Ferrum
Headless Chrome Ruby API
Stars: ✭ 1,009 (+360.73%)
Mutual labels:  automation, chrome, chromium
Edge Selenium Tools
An updated EdgeDriver implementation for Selenium 3 with newly-added support for Microsoft Edge (Chromium).
Stars: ✭ 41 (-81.28%)
Mutual labels:  automation, selenium, chromium
Instagram Profilecrawl
📝 quickly crawl the information (e.g. followers, tags etc...) of an instagram profile.
Stars: ✭ 816 (+272.6%)
Mutual labels:  automation, crawler, selenium
Undetected Chromedriver
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
Stars: ✭ 365 (+66.67%)
Mutual labels:  automation, selenium, chrome
Instagram Profilecrawl
💻 Quickly crawl the information (e.g. followers, tags, etc...) of an instagram profile. No login required!
Stars: ✭ 110 (-49.77%)
Mutual labels:  automation, crawler, selenium
Infospider
INFO-SPIDER is a crawler toolbox 🧰 that integrates many data sources, designed to help users retrieve their own data safely and quickly; the code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments albums, browser history, 12306, Cnblogs, CSDN blog, OSChina blog, and Jianshu.
Stars: ✭ 5,984 (+2632.42%)
Mutual labels:  automation, selenium, chrome
Sillynium
Automate the creation of Python Selenium Scripts by drawing coloured boxes on webpage elements
Stars: ✭ 100 (-54.34%)
Mutual labels:  automation, selenium, chrome
Instagram Bot
An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (-36.99%)
Mutual labels:  automation, crawler, selenium
Phantomas
Headless Chromium-based web performance metrics collector and monitoring tool
Stars: ✭ 2,191 (+900.46%)
Mutual labels:  automation, chromium
Api
API that uncovers the technologies used on websites and generates thumbnail from screenshot of website
Stars: ✭ 189 (-13.7%)
Mutual labels:  automation, chrome
Thirtyfour
Selenium WebDriver client for Rust, for automated testing of websites
Stars: ✭ 191 (-12.79%)
Mutual labels:  automation, selenium
Chrome Aws Lambda Layer
43 MB Google Chrome to fit inside AWS Lambda Layer compressed with Brotli
Stars: ✭ 212 (-3.2%)
Mutual labels:  aws-lambda, chromium
Zhihu fun
A Selenium-based keyword crawler for Zhihu.
Stars: ✭ 185 (-15.53%)
Mutual labels:  crawler, selenium

PyChromeless

Python (selenium) Lambda Chromium Automation

PyChromeless lets you automate actions on any webpage from AWS Lambda. The aim of this project is to provide the scaffolding for future robot implementations.

But... how?

The whole process is explained here. Technologies used are:

Requirements

Install Docker and the project dependencies:

Working locally

To make local development easy, you can use the included docker-compose. Have a look at the example in lambda_function.py: it looks up “21 buttons” on Google and prints the first result.

Run it with: make docker-run
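In outline, a handler like the one in lambda_function.py follows this shape. This is a minimal sketch: the WebDriverWrapper import and its method names are assumptions based on the example's description, not a verified API.

```python
def lambda_handler(event=None, context=None):
    # Import inside the handler so the module itself loads without the bundle.
    from webdriver_wrapper import WebDriverWrapper  # project helper (assumed name)

    driver = WebDriverWrapper()
    # Look up "21 buttons" on Google, as the bundled example does.
    driver.get_url("https://www.google.com/search?q=21+buttons")
    # Hypothetical accessor: grab the first result's title text and print it.
    first_result = driver.get_inner_html_by_css_selector("h3")
    print(first_result)
    driver.close()
    return first_result
```

The Lambda entry point is then just `lambda_handler` in this module; locally, `make docker-run` invokes the same function inside the container.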

Downloading files

If your goal is to use Selenium to download files rather than just scrape content from web pages, you will need to specify a download_location when initializing the WebDriverWrapper. The download location must be a writable Lambda directory such as /tmp. For example, the first line of lambda_handler would become

driver = WebDriverWrapper(download_location='/tmp')

This will cause files to download into the download_location automatically, without a confirmation dialog. Since downloads happen asynchronously, you might need to make the handler wait until the file has finished downloading.

In order to download a file from a link that opens in a new tab (i.e. target='_blank'), call enable_download_in_headless_chrome in your scraping script after navigating to the desired page but before clicking the download link. This replaces every target='_blank' with target='_self'. For example:

# Navigate to download page
driver._driver.find_element_by_xpath('//a[@href="/downloads/"]').click()
# Enable headless chrome file download
driver.enable_download_in_headless_chrome()
# Click the download link
driver._driver.find_element_by_class_name("btn").click()
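Going by the description above, such a helper presumably injects JavaScript that rewrites the anchors in place. A sketch of what that could look like — the function name and the JS are illustrative, not the library's actual implementation:

```python
# JavaScript that flips every target='_blank' anchor to target='_self',
# so the download fires in the current (headless) tab instead of a new one.
TARGET_BLANK_TO_SELF_JS = """
document.querySelectorAll('a[target="_blank"]')
    .forEach(function (a) { a.setAttribute('target', '_self'); });
"""

def retarget_blank_links(driver):
    """Hypothetical helper: run the anchor rewrite in the current page."""
    driver.execute_script(TARGET_BLANK_TO_SELF_JS)
```

Because the rewrite only affects anchors already in the DOM, it has to run after navigation and before the click, which matches the ordering shown above.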

Building and uploading the distributable package

Everything is summarized in a simple Makefile, so:

  • make build-lambda-package
  • Upload the resulting build.zip file to your AWS Lambda function
  • Set the Lambda environment variables (same values as in docker-compose.yml)
    • PYTHONPATH=/var/task/src:/var/task/lib
    • PATH=/var/task/bin
  • Adjust the Lambda function parameters to your needs; for the given example:
    • Timeout: at least 10 seconds
    • Memory: at least 250 MB
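With the AWS CLI, the upload and configuration steps above might look like this. The function name is a placeholder, and the timeout/memory values are just one choice consistent with the guidance above:

```shell
make build-lambda-package

# Push the package to an existing Lambda function (name is a placeholder).
aws lambda update-function-code \
    --function-name my-pychromeless-robot \
    --zip-file fileb://build.zip

# Mirror the environment and resource settings listed above.
aws lambda update-function-configuration \
    --function-name my-pychromeless-robot \
    --timeout 30 \
    --memory-size 512 \
    --environment "Variables={PYTHONPATH=/var/task/src:/var/task/lib,PATH=/var/task/bin}"
```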

Shouts to

Contributors

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].