
zhuyingda / Webster

Licence: gpl-3.0
a reliable high-level web crawling & scraping framework for Node.js.

Projects that are alternatives to or similar to Webster

Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+1309.07%)
Mutual labels:  crawler, crawling, puppeteer, chromium, headless-chrome
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-53.02%)
Mutual labels:  crawler, spider, crawling, puppeteer
Chromium for spider
dynamic crawler for web vulnerability scanner
Stars: ✭ 220 (-39.56%)
Mutual labels:  crawler, spider, puppeteer, chromium
Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (-65.66%)
Mutual labels:  crawler, crawling, puppeteer, headless-chrome
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+20.88%)
Mutual labels:  crawler, spider, crawling
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-23.9%)
Mutual labels:  crawler, spider, crawling
Crawlergo
A powerful dynamic crawler for web vulnerability scanners
Stars: ✭ 1,088 (+198.9%)
Mutual labels:  crawler, chromium, headless-chrome
Skycaiji
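Skycaiji (Blue Sky Collector) is free crawler software for collecting and publishing data, built with PHP and MySQL. It can be deployed on a cloud server, can collect almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without requiring login, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for large-scale web data collection.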
Stars: ✭ 1,514 (+315.93%)
Mutual labels:  crawler, spider, crawling
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-81.32%)
Mutual labels:  crawler, spider, crawling
Puppeteer Walker
a puppeteer walker 🕷 🕸
Stars: ✭ 78 (-78.57%)
Mutual labels:  crawler, spider, puppeteer
Ppspider
Web spider built on puppeteer; supports task queues and task scheduling via decorators, nedb / mongodb storage, and data visualization. A puppeteer-based web crawler framework that provides flexible task queue management and scheduling, convenient data storage (nedb/mongodb), and data visualization with user interaction.
Stars: ✭ 237 (-34.89%)
Mutual labels:  crawler, spider, puppeteer
puppet-master
Puppeteer as a service hosted on Saasify.
Stars: ✭ 25 (-93.13%)
Mutual labels:  crawling, headless-chrome, puppeteer
Apify Js
Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+766.48%)
Mutual labels:  crawling, puppeteer, headless-chrome
Awesome Puppeteer
A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+374.73%)
Mutual labels:  crawling, puppeteer, headless-chrome
Phantomas
Headless Chromium-based web performance metrics collector and monitoring tool
Stars: ✭ 2,191 (+501.92%)
Mutual labels:  puppeteer, chromium, headless-chrome
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-86.81%)
Mutual labels:  crawler, spider, crawling
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+4167.86%)
Mutual labels:  crawler, spider, crawling
throughout
🎪 End-to-end testing made simple (using Jest and Puppeteer)
Stars: ✭ 16 (-95.6%)
Mutual labels:  chromium, headless-chrome, puppeteer
bots-zoo
No description or website provided.
Stars: ✭ 59 (-83.79%)
Mutual labels:  crawler, crawling, puppeteer
Playwright Go
Playwright for Go a browser automation library to control Chromium, Firefox and WebKit with a single API.
Stars: ✭ 272 (-25.27%)
Mutual labels:  chromium, headless-chrome

Webster

Overview

Webster is a powerful and extensible web crawling framework for Node.js applications. You can use Webster to crawl websites and extract structured data from their pages.

What sets Webster apart from other crawling frameworks is that it can scrape content rendered in the browser by client-side JavaScript and Ajax requests.
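To illustrate why browser-side rendering matters, here is a minimal sketch that does not use Webster's API. It relies only on Node's built-in vm module, and the page markup, the readTitle helper, and the toy document object are all hypothetical stand-ins for a real page and a real browser. The same page is read twice: once as delivered by the server, and once after its inline script has run.

```javascript
const vm = require('vm');

// Hypothetical page: the heading is empty in the HTML source and is
// filled in by an inline script, as on many client-rendered sites.
const html = `
<h1 id="title"></h1>
<script>
  document.getElementById('title').textContent = 'Rendered by JavaScript';
</script>`;

// Read the heading text straight out of an HTML string.
function readTitle(markup) {
  const match = markup.match(/<h1 id="title">([\s\S]*?)<\/h1>/);
  return match ? match[1] : null;
}

// 1. Static scrape: only the markup sent by the server is inspected.
const staticTitle = readTitle(html);

// 2. Rendered scrape: run the inline script against a toy stand-in for
//    the DOM (an assumption; a real crawler would drive a browser).
const titleElement = { textContent: '' };
const fakeDocument = {
  getElementById: (id) => (id === 'title' ? titleElement : null),
};
const inlineScript = html.match(/<script>([\s\S]*?)<\/script>/)[1];
vm.runInNewContext(inlineScript, { document: fakeDocument });
const renderedTitle = titleElement.textContent;

console.log({ staticTitle, renderedTitle });
// prints { staticTitle: '', renderedTitle: 'Rendered by JavaScript' }
```

The static read comes back empty because the text only exists after the script has executed, which is the situation a browser-based crawler is meant to handle.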

Docker quick start

Pull and run the example Docker image:

docker pull zhuyingda/webster-demo
docker run -it zhuyingda/webster-demo

Here is a simple demo that crawls a sample site (one that was also used as a demo by the Scrapy framework):

node demo_producer.js
env MOD=debug node demo_consumer.js
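The demo is split into a producer script and a consumer script. The sketch below shows the general producer/consumer pattern those names suggest; it is not Webster's API and not the content of the demo files. It assumes the producer enqueues crawl tasks and the consumer takes them off a queue, and it uses an in-memory array where a real setup would use an external store. The function names and URLs are hypothetical.

```javascript
// In-memory array standing in for a shared task store (an assumption;
// separate producer and consumer processes would need an external store).
const queue = [];

// Producer side: describe each page to crawl as a plain task object.
function produce(urls) {
  for (const url of urls) {
    queue.push({ url });
  }
  return queue.length;
}

// Consumer side: take tasks off the queue in order and pass each one
// to a handler, collecting whatever the handler returns.
function consume(handler) {
  const results = [];
  while (queue.length > 0) {
    const task = queue.shift();
    results.push(handler(task));
  }
  return results;
}

// Hypothetical URLs; the handler only records the visit.
const queued = produce([
  'http://example.com/page/1',
  'http://example.com/page/2',
]);
const results = consume((task) => `visited ${task.url}`);

console.log(queued, results);
```

Separating the two roles lets tasks be created at one rate and processed at another, which is why the demo runs the producer first and the consumer second.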

Requirements

  • Node.js 10.x or later, and Redis
  • Works on Linux and macOS

Alternatively, you can deploy with Docker.

Install

npm install webster

Usage on Raspbian Platform

sudo apt install chromium-browser chromium-codecs-ffmpeg
env MOD=debug EXE_PATH=/usr/bin/chromium-browser node demo_consumer.js

Architecture overview

Documentation

More details are available in the project documentation.

Contributors

Code Contributors

This project exists thanks to all the people who contribute. [Contribute].

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]

License

GPL-V3

Copyright (c) 2017-present, Yingda (Sugar) Zhu
