zhk0603 / WebCrawler

Licence: MIT License

A lightweight, fast, multithreaded, multi-pipeline, flexibly configurable web crawler.

Projects that are alternatives of or similar to WebCrawler

arachnod

High performance crawler for Nodejs

Stars: ✭ 17 (-56.41%)

Mutual labels: crawler, spider

Chromium for spider

dynamic crawler for web vulnerability scanner

Stars: ✭ 220 (+464.1%)

Mutual labels: crawler, spider

Colly

Elegant Scraper and Crawler Framework for Golang

Stars: ✭ 15,535 (+39733.33%)

Mutual labels: crawler, spider

Zhihuspider

A multithreaded Zhihu user crawler, based on Python 3

Stars: ✭ 201 (+415.38%)

Mutual labels: crawler, spider

Magic google

Google search results crawler, get google search results that you need

Stars: ✭ 247 (+533.33%)

Mutual labels: crawler, spider

Querylist

🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

Stars: ✭ 2,392 (+6033.33%)

Mutual labels: crawler, spider

Jd mask robot

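JD.com face-mask stock monitoring crawler (not Selenium-based): QR-code login, price checks, add to cart, order placement, and flash-sale purchasing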

Stars: ✭ 216 (+453.85%)

Mutual labels: crawler, spider

Marmot

💐Marmot | Web Crawler/HTTP protocol Download Package 🐭

Stars: ✭ 186 (+376.92%)

Mutual labels: crawler, spider

Fast Lianjia Crawler

An extremely fast crawler that fetches data directly through the Lianjia API, the fastest in the universe~~ 🚀

Stars: ✭ 247 (+533.33%)

Mutual labels: crawler, spider

Ppspider

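A web spider built on puppeteer that supports task queues and task scheduling via decorators, nedb / mongodb storage, and data visualization. A puppeteer-based web crawler framework offering flexible task-queue management and scheduling, convenient data persistence (nedb/mongodb), and implementations for data visualization and user interaction.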

Stars: ✭ 237 (+507.69%)

Mutual labels: crawler, spider

Ok ip proxy pool

🍿 Proxy IP pool for crawlers (proxy pool), Python 🍟 a pretty OK IP proxy pool

Stars: ✭ 196 (+402.56%)

Mutual labels: crawler, spider

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (+23.08%)

Mutual labels: crawler, spider

Fooproxy

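A robust, efficient, score-based, target-specific IP proxy pool plus API service. You can plug in your own collectors to crawl proxy IPs, and it builds a separate database of valid proxy IPs for each of your crawler's target websites. Supports MongoDB 4.0; uses Python 3.7. (Scored IP proxy pool; custom proxy data crawlers can be added at any time.)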

Stars: ✭ 195 (+400%)

Mutual labels: crawler, spider

Jssoup

JavaScript + BeautifulSoup = JSSoup

Stars: ✭ 203 (+420.51%)

Mutual labels: crawler, spider

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

Stars: ✭ 190 (+387.18%)

Mutual labels: crawler, spider

Webvideobot

Web crawler.

Stars: ✭ 214 (+448.72%)

Mutual labels: crawler, spider

Zhihu Crawler People

A simple distributed crawler for zhihu && data analysis

Stars: ✭ 182 (+366.67%)

Mutual labels: crawler, spider

Lianjia Beike Spider

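A housing-price crawler for Lianjia and Beike that collects housing-price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports CSV, MySQL, MongoDB, Excel, and JSON storage, supports Python 2 and 3, displays data in charts, and is richly commented. Stars are appreciated. For learning and reference only; do not use it for commercial purposes, and use it at your own risk.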

Stars: ✭ 2,257 (+5687.18%)

Mutual labels: crawler, spider

Laravel Crawler Detect

A Laravel wrapper for CrawlerDetect - the web crawler detection library

Stars: ✭ 227 (+482.05%)

Mutual labels: crawler, spider

crawler

A simple and flexible web crawler framework for java.

Stars: ✭ 20 (-48.72%)

Mutual labels: crawler, spider

Web Crawler

This is a lightweight, fast, multithreaded, multi-pipeline, flexibly configurable web crawler.

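Architecture Design

WebCrawler uses a multi-pipeline, multi-scheduler design and processing model. Everything is handled through pipelines. A set of commonly used pipelines is provided by default, and developers can freely extend them and assemble pipelines into a powerful crawler.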

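Multiple Pipelines

A crawler usually performs several actions.
For example, crawling article data from a website typically involves the following operations: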

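From a large number of URLs, analyze and determine which article URLs need to be crawled;
Parse the article page data and extract the required information;
Persist the data, saving it to a database or exporting it to Excel.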

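To make maintenance easier and keep the code structure simple, we can write an independent pipeline for each operation (each pipeline should have as narrow a responsibility as possible and very low coupling). Multiple pipelines then work together to complete the crawl of a page. When writing an actual crawler, developers only need to focus on the business logic; the framework handles everything else internally.
In WebCrawler, a Pipeline can run in one of two modes: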

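Pipeline chain mode:

Chain mode is similar to building with blocks: multiple pipelines are joined together, one connected to the next, forming a closed processing chain. We recommend this mode when writing crawlers whose tasks are sequential.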
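
The chain idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not WebCrawler's actual API: `PageContext`, `run_chain`, and the three pipeline functions are hypothetical names, and the page HTML is passed in directly instead of being downloaded so the sketch stays self-contained. Each pipeline does one job and hands the context to the next; returning `None` stops the chain for that page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PageContext:
    """Hypothetical per-page state handed from one pipeline to the next."""
    url: str
    html: str = ""
    fields: Dict[str, str] = field(default_factory=dict)


# A pipeline takes a context and returns it, or None to stop the chain.
Pipeline = Callable[[PageContext], Optional[PageContext]]


def filter_article_urls(ctx: PageContext) -> Optional[PageContext]:
    # Step 1: keep only URLs that look like article pages (assumed URL shape).
    return ctx if "/article/" in ctx.url else None


def extract_title(ctx: PageContext) -> Optional[PageContext]:
    # Step 2: pull the needed information out of the page.
    start, end = ctx.html.find("<title>"), ctx.html.find("</title>")
    if start != -1 and end != -1:
        ctx.fields["title"] = ctx.html[start + len("<title>"):end].strip()
    return ctx


def make_store(sink: List[dict]) -> Pipeline:
    # Step 3: persist the result; a list stands in for a database or Excel.
    def store(ctx: PageContext) -> Optional[PageContext]:
        sink.append({"url": ctx.url, **ctx.fields})
        return ctx
    return store


def run_chain(pipelines: List[Pipeline], ctx: PageContext) -> Optional[PageContext]:
    # Run the pipelines in order; stop as soon as one returns None.
    current: Optional[PageContext] = ctx
    for pipeline in pipelines:
        current = pipeline(current)
        if current is None:
            return None
    return current


saved: List[dict] = []
chain = [filter_article_urls, extract_title, make_store(saved)]
run_chain(chain, PageContext("https://example.com/article/1", "<title>Hello</title>"))
run_chain(chain, PageContext("https://example.com/about", "<title>About</title>"))
print(saved)  # only the article page reaches the storage pipeline
```

Because each stage only depends on the context it receives, a stage can be swapped or reordered without touching the others, which is the low-coupling property the text describes.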

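Parallel mode:

In parallel mode, as the name suggests, N pipelines run at the same time with no chain relationship between them; they cooperate through the scheduler.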
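
A minimal sketch of the parallel idea, again in Python and again under stated assumptions: the `Scheduler` class, its topic names, and the two worker functions are hypothetical and are not taken from WebCrawler. Each pipeline runs in its own thread and never calls another pipeline directly; it only pops work from and pushes work to the scheduler.

```python
import queue
import threading
from typing import Dict, List

STOP = object()  # sentinel that tells a worker to exit


class Scheduler:
    """Hypothetical scheduler: one thread-safe queue per named topic."""

    def __init__(self, topics: List[str]) -> None:
        self._queues: Dict[str, queue.Queue] = {t: queue.Queue() for t in topics}

    def push(self, topic: str, item: object) -> None:
        self._queues[topic].put(item)

    def pop(self, topic: str) -> object:
        return self._queues[topic].get()  # blocks until an item is available

    def done(self, topic: str) -> None:
        self._queues[topic].task_done()

    def join(self) -> None:
        # Topics are joined in creation order, which here matches the data flow.
        for q in self._queues.values():
            q.join()


def link_filter(sched: Scheduler) -> None:
    # Pipeline A: decide which URLs are articles and hand them back to the scheduler.
    while True:
        url = sched.pop("urls")
        try:
            if url is STOP:
                return
            if "/article/" in url:
                sched.push("articles", url)
        finally:
            sched.done("urls")


def saver(sched: Scheduler, sink: List[str]) -> None:
    # Pipeline B: persist whatever appears on the "articles" topic.
    while True:
        url = sched.pop("articles")
        try:
            if url is STOP:
                return
            sink.append(url)
        finally:
            sched.done("articles")


sched = Scheduler(["urls", "articles"])
saved: List[str] = []
workers = [
    threading.Thread(target=link_filter, args=(sched,)),
    threading.Thread(target=saver, args=(sched, saved)),
]
for w in workers:
    w.start()

for u in ["https://example.com/article/1", "https://example.com/about",
          "https://example.com/article/2"]:
    sched.push("urls", u)

sched.join()                       # wait until every queued item is processed
for topic in ("urls", "articles"):
    sched.push(topic, STOP)        # then ask each worker to exit
for w in workers:
    w.join()

print(sorted(saved))
```

Adding more threads per topic would let several instances of the same pipeline share the load, since the queues are the only point of coordination.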

Examples

See the Crawler.Simple project, which contains good examples ranging from simple to complex.
