A web crawler framework built on puppeteer. It provides flexible task-queue management and scheduling via decorators, convenient data storage (nedb / mongodb), and support for data visualization and user interaction.

Stars: ✭ 237 (-59.76%)

Mutual labels: crawler, spider

crawler

A simple and flexible web crawler framework for Java.

Stars: ✭ 20 (-96.6%)

Mutual labels: crawler, spider

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-91.85%)

Mutual labels: crawler, spider

bots-zoo

No description or website provided.

Stars: ✭ 59 (-89.98%)

Mutual labels: crawler, scraping

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.

Stars: ✭ 38 (-93.55%)

Mutual labels: spider, scraping

Gospider

A crawler framework implemented in Go. Users only need to care about page rules, and a web management interface is provided. Built on colly.

Stars: ✭ 285 (-51.61%)

Mutual labels: crawler, spider

Sasila

A flexible, friendly crawler framework.

Stars: ✭ 286 (-51.44%)

Mutual labels: crawler, scraping

Toapi

Every web site provides APIs.

Stars: ✭ 3,209 (+444.82%)

Mutual labels: crawler, spider

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+713.75%)

Mutual labels: crawler, spider

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+592.19%)

Mutual labels: crawler, scraping

Ttbot

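A Toutiao (今日头条) bot that supports user login, following, unfollowing, fetching followings and followers, publishing articles, posting Wukong Q&A, liking, commenting, and collecting various types of news. Implemented with the Toutiao web API.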

Stars: ✭ 338 (-42.61%)

Mutual labels: crawler, spider

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (-31.92%)

Mutual labels: crawler, spider

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (-21.22%)

Mutual labels: crawler, scraping

Chromium for spider

dynamic crawler for web vulnerability scanner

Stars: ✭ 220 (-62.65%)

Mutual labels: crawler, spider

Xxl Crawler

A distributed web crawler framework (XXL-CRAWLER).

Stars: ✭ 561 (-4.75%)

Mutual labels: crawler, spider

Jd mask robot

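A JD.com (京东) face mask stock-monitoring crawler (not Selenium-based): QR-code login, price checking, adding to cart, order placement, and flash-sale purchasing.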

Stars: ✭ 216 (-63.33%)

Mutual labels: crawler, spider

Magic google

A Google search results crawler: get the Google search results that you need

Stars: ✭ 247 (-58.06%)

Mutual labels: crawler, spider

Fast Lianjia Crawler

A blazing-fast crawler that fetches data directly through the Lianjia API, the fastest in the universe~~ 🚀

Stars: ✭ 247 (-58.06%)

Mutual labels: crawler, spider

Haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Stars: ✭ 4,993 (+747.71%)

Mutual labels: crawler, spider

Webvideobot

Web crawler.

Stars: ✭ 214 (-63.67%)

Mutual labels: crawler, spider

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-97.45%)

Mutual labels: crawler, scraping

Html2article

Main-content extraction from HTML web pages.

Stars: ✭ 441 (-25.13%)

Mutual labels: crawler, spider

scrapy facebooker

Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.

Stars: ✭ 22 (-96.26%)

Mutual labels: spider, scraping

WebCrawler

A lightweight, fast, multi-threaded, multi-pipeline, flexibly configurable web crawler.