
jairovadillo / Pychromeless

License: Apache-2.0
Python Lambda Chrome Automation (naming pending)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pychromeless

Cdp4j
cdp4j - Chrome DevTools Protocol for Java
Stars: ✭ 232 (+5.94%)
Mutual labels:  automation, selenium, chrome, chromium
Playwright Go
Playwright for Go, a browser automation library to control Chromium, Firefox and WebKit with a single API.
Stars: ✭ 272 (+24.2%)
Mutual labels:  automation, selenium, chromium
Lambdium
headless chrome + selenium webdriver in AWS Lambda using the serverless application model
Stars: ✭ 246 (+12.33%)
Mutual labels:  aws-lambda, selenium, chromium
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+2242.01%)
Mutual labels:  crawler, chrome, chromium
Awesome Java Crawler
This repository collects and organizes crawler-related resources, primarily written in Java.
Stars: ✭ 228 (+4.11%)
Mutual labels:  crawler, selenium, chrome
Serverless Chrome
🌐 Run headless Chrome/Chromium on AWS Lambda
Stars: ✭ 2,625 (+1098.63%)
Mutual labels:  aws-lambda, chrome, chromium
Playwright Sharp
.NET version of the Playwright testing and automation library.
Stars: ✭ 459 (+109.59%)
Mutual labels:  automation, chrome, chromium
Ferrum
Headless Chrome Ruby API
Stars: ✭ 1,009 (+360.73%)
Mutual labels:  automation, chrome, chromium
Edge Selenium Tools
An updated EdgeDriver implementation for Selenium 3 with newly-added support for Microsoft Edge (Chromium).
Stars: ✭ 41 (-81.28%)
Mutual labels:  automation, selenium, chromium
Instagram Profilecrawl
📝 quickly crawl the information (e.g. followers, tags etc...) of an instagram profile.
Stars: ✭ 816 (+272.6%)
Mutual labels:  automation, crawler, selenium
Undetected Chromedriver
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
Stars: ✭ 365 (+66.67%)
Mutual labels:  automation, selenium, chrome
Instagram Profilecrawl
💻 Quickly crawl the information (e.g. followers, tags, etc...) of an instagram profile. No login required!
Stars: ✭ 110 (-49.77%)
Mutual labels:  automation, crawler, selenium
Infospider
INFO-SPIDER is a crawler toolbox 🧰 that integrates many data sources, designed to help users retrieve their own data safely and quickly; the code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments albums, browser history, 12306, Cnblogs, CSDN blog, OSChina blog, and Jianshu.
Stars: ✭ 5,984 (+2632.42%)
Mutual labels:  automation, selenium, chrome
Sillynium
Automate the creation of Python Selenium Scripts by drawing coloured boxes on webpage elements
Stars: ✭ 100 (-54.34%)
Mutual labels:  automation, selenium, chrome
Instagram Bot
An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (-36.99%)
Mutual labels:  automation, crawler, selenium
Phantomas
Headless Chromium-based web performance metrics collector and monitoring tool
Stars: ✭ 2,191 (+900.46%)
Mutual labels:  automation, chromium
Api
API that uncovers the technologies used on websites and generates thumbnail from screenshot of website
Stars: ✭ 189 (-13.7%)
Mutual labels:  automation, chrome
Thirtyfour
Selenium WebDriver client for Rust, for automated testing of websites
Stars: ✭ 191 (-12.79%)
Mutual labels:  automation, selenium
Chrome Aws Lambda Layer
43 MB Google Chrome to fit inside AWS Lambda Layer compressed with Brotli
Stars: ✭ 212 (-3.2%)
Mutual labels:  aws-lambda, chromium
Zhihu fun
A Selenium-based keyword crawler for Zhihu.
Stars: ✭ 185 (-15.53%)
Mutual labels:  crawler, selenium

PyChromeless

Python (selenium) Lambda Chromium Automation

PyChromeless lets you automate actions on any webpage from AWS Lambda. The aim of this project is to provide the scaffolding for future robot implementations.

But... how?

The whole process is explained here. Technologies used are:

Requirements

Install Docker and the project dependencies:

Working locally

To make local development easy, you can use the included docker-compose. Have a look at the example in lambda_function.py: it looks up “21 buttons” on Google and prints the first result.

Run it with: make docker-run
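In outline, a handler like the one in lambda_function.py follows this shape. This is a minimal sketch: the WebDriverWrapper import and its method names are assumptions based on the example's description, not a verified API.

```python
def lambda_handler(event=None, context=None):
    # Import inside the handler so the module itself loads without the bundle.
    from webdriver_wrapper import WebDriverWrapper  # project helper (assumed name)

    driver = WebDriverWrapper()
    # Look up "21 buttons" on Google, as the bundled example does.
    driver.get_url("https://www.google.com/search?q=21+buttons")
    # Hypothetical accessor: grab the first result's title text and print it.
    first_result = driver.get_inner_html_by_css_selector("h3")
    print(first_result)
    driver.close()
    return first_result
```

The Lambda entry point is then just `lambda_handler` in this module; locally, `make docker-run` invokes the same function inside the container.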

Downloading files

If your goal is to use Selenium to download files rather than just scrape content from web pages, you will need to specify a download_location when initializing the WebDriverWrapper. The download location must be a writable Lambda directory such as /tmp. For example, the first line of lambda_handler would become

driver = WebDriverWrapper(download_location='/tmp')

This will cause files to download into the download_location automatically, without a confirmation dialog. Since downloads happen asynchronously, you might need to make the handler wait until the file has finished downloading.

In order to download a file from a link that opens in a new tab (i.e. target='_blank'), call enable_download_in_headless_chrome in your scraping script after navigating to the desired page but before clicking the download link. This replaces every target='_blank' with target='_self'. For example:

# Navigate to download page
driver._driver.find_element_by_xpath('//a[@href="/downloads/"]').click()
# Enable headless chrome file download
driver.enable_download_in_headless_chrome()
# Click the download link
driver._driver.find_element_by_class_name("btn").click()
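Going by the description above, such a helper presumably injects JavaScript that rewrites the anchors in place. A sketch of what that could look like — the function name and the JS are illustrative, not the library's actual implementation:

```python
# JavaScript that flips every target='_blank' anchor to target='_self',
# so the download fires in the current (headless) tab instead of a new one.
TARGET_BLANK_TO_SELF_JS = """
document.querySelectorAll('a[target="_blank"]')
    .forEach(function (a) { a.setAttribute('target', '_self'); });
"""

def retarget_blank_links(driver):
    """Hypothetical helper: run the anchor rewrite in the current page."""
    driver.execute_script(TARGET_BLANK_TO_SELF_JS)
```

Because the rewrite only affects anchors already in the DOM, it has to run after navigation and before the click, which matches the ordering shown above.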

Building and uploading the distributable package

Everything is summarized in a simple Makefile, so:

  • make build-lambda-package
  • Upload the resulting build.zip file to your AWS Lambda function
  • Set the Lambda environment variables (same values as in docker-compose.yml)
    • PYTHONPATH=/var/task/src:/var/task/lib
    • PATH=/var/task/bin
  • Adjust the Lambda function parameters to your needs; for the given example:
    • Timeout: at least 10 seconds
    • Memory: at least 250 MB
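With the AWS CLI, the upload and configuration steps above might look like this. The function name is a placeholder, and the timeout/memory values are just one choice consistent with the guidance above:

```shell
make build-lambda-package

# Push the package to an existing Lambda function (name is a placeholder).
aws lambda update-function-code \
    --function-name my-pychromeless-robot \
    --zip-file fileb://build.zip

# Mirror the environment and resource settings listed above.
aws lambda update-function-configuration \
    --function-name my-pychromeless-robot \
    --timeout 30 \
    --memory-size 512 \
    --environment "Variables={PYTHONPATH=/var/task/src:/var/task/lib,PATH=/var/task/bin}"
```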

Shouts to

Contributors

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].