Apify SDK

The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (-38.51%)

Mutual labels: scraping, crawling, puppeteer, headless-chrome

Awesome Puppeteer

A curated list of awesome puppeteer resources.

Stars: ✭ 1,728 (-66.31%)

Mutual labels: scraping, crawling, puppeteer, headless-chrome

Colly

Elegant Scraper and Crawler Framework for Golang

Stars: ✭ 15,535 (+202.89%)

Mutual labels: crawler, scraper, scraping, crawling

Phantomas

Headless Chromium-based web performance metrics collector and monitoring tool

Stars: ✭ 2,191 (-57.28%)

Mutual labels: puppeteer, chromium, headless-chrome, jquery

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (-84.62%)

Mutual labels: crawler, scraper, scraping, crawling

Crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Stars: ✭ 440 (-91.42%)

Mutual labels: crawler, scraper, scraping, crawling

Puppeteer Lambda Starter Kit

Starter Kit for running Headless-Chrome by Puppeteer on AWS Lambda.

Stars: ✭ 563 (-89.02%)

Mutual labels: chrome, puppeteer, headless-chrome

Cuprite

Headless Chrome/Chromium driver for Capybara

Stars: ✭ 743 (-85.51%)

Mutual labels: chrome, chromium, headless-chrome

Puppeteer Deep

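Puppeteer and headless Chrome: scraping the book "ES6 Standard Primer" (《es6标准入门》), automatically posting articles to Juejin, and site performance analysis; advanced crawling, automated UI testing, and performance analysis.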

Stars: ✭ 1,033 (-79.86%)

Mutual labels: chrome, puppeteer, headless-chrome

Url To Pdf Api

Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.

Stars: ✭ 6,544 (+27.59%)

Mutual labels: chrome, puppeteer, headless-chrome

Puppeteer Docs Zh Cn

Chinese version of the Google Puppeteer documentation, targeting version 1.9.0; translation in progress...

Stars: ✭ 61 (-98.81%)

Mutual labels: chrome, puppeteer, chromium

Sasila

A flexible, friendly crawler framework

Stars: ✭ 286 (-94.42%)

Mutual labels: crawler, scraping, crawling

Flaresolverr

Proxy server to bypass Cloudflare protection

Stars: ✭ 241 (-95.3%)

Mutual labels: chrome, puppeteer, chromium

Cdp4j

cdp4j - Chrome DevTools Protocol for Java

Stars: ✭ 232 (-95.48%)

Mutual labels: crawling, chrome, chromium

Secret Agent

The web browser that's built for scraping.

Stars: ✭ 151 (-97.06%)

Mutual labels: scraping, puppeteer, chromium

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (-20.51%)

Mutual labels: crawler, scraper, scraping

Phpchrometopdf

A slim PHP wrapper around google-chrome to convert a URL to PDF or take screenshots, with an easy-to-use, clean OOP interface

Stars: ✭ 127 (-97.52%)

Mutual labels: chrome, chromium, headless-chrome

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (-88.63%)

Mutual labels: crawler, scraping, crawling

Crawlergo

A powerful dynamic crawler for web vulnerability scanners

Stars: ✭ 1,088 (-78.79%)

Mutual labels: crawler, chromium, headless-chrome

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (-75.71%)

Mutual labels: crawler, scraper, scraping

Newspaper

News, full-text, and article metadata extraction in Python 3.

Stars: ✭ 11,545 (+125.09%)

Mutual labels: crawler, scraper, crawling

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (-96.14%)

Mutual labels: crawler, scraping, crawling

Html Pdf Chrome

HTML to PDF converter via Chrome/Chromium

Stars: ✭ 629 (-87.74%)

Mutual labels: chrome, chromium, headless-chrome

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (-98.97%)

Mutual labels: scraper, scraping, crawling

Ferrum

Headless Chrome Ruby API

Stars: ✭ 1,009 (-80.33%)

Mutual labels: chrome, chromium, headless-chrome

Puppeteer Sharp Extra

Plugin framework for PuppeteerSharp

Stars: ✭ 39 (-99.24%)

Mutual labels: chrome, puppeteer, headless-chrome

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (-94.6%)

Mutual labels: crawler, scraping, crawling

LInkedIn-Reverese-Lookup

🔎Search LinkedIn profile by email address📧

Stars: ✭ 20 (-99.61%)

Mutual labels: scraping, chromium, puppeteer

Serverless Chrome

🌐 Run headless Chrome/Chromium on AWS Lambda

Stars: ✭ 2,625 (-48.82%)

Mutual labels: chrome, chromium, headless-chrome

Puppeteer Extra

💯 Teach puppeteer new tricks through plugins.

Stars: ✭ 3,397 (-33.77%)

Mutual labels: chrome, puppeteer, headless-chrome

Puppetron

Puppeteer (Headless Chrome Node API)-based rendering solution.

Stars: ✭ 429 (-91.64%)

Mutual labels: chrome, puppeteer, chromium

Puppeteer Sharp

Headless Chrome .NET API

Stars: ✭ 2,122 (-58.63%)

Mutual labels: chrome, puppeteer, chromium

Goose Parser

Universal scraping tool that lets you extract data using multiple environments

Stars: ✭ 211 (-95.89%)

Mutual labels: crawler, scraper, scraping

Ruiji.net

crawler framework, distributed crawler extractor

Stars: ✭ 220 (-95.71%)

Mutual labels: crawler, scraper, headless-chrome

puppet-master

Puppeteer as a service hosted on Saasify.

Stars: ✭ 25 (-99.51%)

Mutual labels: crawling, headless-chrome, puppeteer

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (-87.58%)

Mutual labels: crawler, scraper, crawling

Puppeteer Walker

a puppeteer walker 🕷 🕸

Stars: ✭ 78 (-98.48%)

Mutual labels: crawler, chrome, puppeteer

throughout

🎪 End-to-end testing made simple (using Jest and Puppeteer)

Stars: ✭ 16 (-99.69%)

Mutual labels: chromium, headless-chrome, puppeteer

Dataflowkit

Extract structured data from websites. Website scraping.

Stars: ✭ 456 (-91.11%)

Mutual labels: scraper, scraping, crawling

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

Stars: ✭ 51 (-99.01%)

Mutual labels: scraper, scraping, crawling

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (-98.99%)

Mutual labels: scraper, scraping, crawling

Chromium for spider

dynamic crawler for web vulnerability scanner

Stars: ✭ 220 (-95.71%)

Mutual labels: crawler, puppeteer, chromium

Pychromeless

Python Lambda Chrome Automation (naming pending)

Stars: ✭ 219 (-95.73%)

Mutual labels: crawler, chrome, chromium

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Stars: ✭ 42,343 (+725.56%)

Mutual labels: crawler, scraping, crawling

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-98.05%)

Mutual labels: crawler, scraping, crawling

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (-97.6%)

Mutual labels: scraping, crawling, puppeteer

whatsapp-tracking

Scraping the status of WhatsApp contacts

Stars: ✭ 49 (-99.04%)

Mutual labels: scraper, scraping, puppeteer

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-99.71%)

Mutual labels: crawler, scraper, scraping

Spidy

The simple, easy to use command line web crawler.

Stars: ✭ 257 (-94.99%)

Mutual labels: crawler, crawling

Weibo terminator workflow

Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!

Stars: ✭ 259 (-94.95%)

Mutual labels: crawler, scraper

Playwright Go

Playwright for Go, a browser automation library to control Chromium, Firefox and WebKit with a single API.