A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+279.19%)

Mutual labels: crawler, spider

Redlock Php

Redis distributed locks in PHP

Stars: ✭ 651 (+276.3%)

Mutual labels: redis, distributed

Arq

Fast job queuing and RPC in Python with asyncio and Redis.

Stars: ✭ 695 (+301.73%)

Mutual labels: redis, distributed

Fictiondown

Stars: ✭ 362 (+109.25%)

Mutual labels: crawler, spider

Gospider

Gospider - Fast web spider written in Go

Stars: ✭ 785 (+353.76%)

Mutual labels: crawler, spider

Funpyspidersearchengine

Word2vec personalized search (results tailored to each user) + Scrapy 2.3.0 (data crawling) + Elasticsearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 search

Stars: ✭ 782 (+352.02%)

Mutual labels: spider, redis

Scrapy Cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

Stars: ✭ 921 (+432.37%)

Mutual labels: redis, distributed

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (+351.45%)

Mutual labels: crawler, spider

Disec

Distributed Image Search Engine Crawler

Stars: ✭ 11 (-93.64%)

Mutual labels: crawler, distributed

Appcrawler

Web crawler for Android app markets

Stars: ✭ 25 (-85.55%)

Mutual labels: crawler, redis

Nodespider

[DEPRECATED] Simple, flexible, delightful web crawler/spider package

Stars: ✭ 33 (-80.92%)

Mutual labels: crawler, spider

Creeper

🐾 Creeper - The Next Generation Crawler Framework (Go)

Stars: ✭ 762 (+340.46%)

Mutual labels: crawler, spider

Js Reverse

JavaScript reverse-engineering research

Stars: ✭ 159 (-8.09%)

Mutual labels: crawler, spider

Crawlab

Distributed web crawler admin platform for managing spiders regardless of language or framework.

Stars: ✭ 8,392 (+4750.87%)

Mutual labels: crawler, spider

Foundatio

Pluggable foundation blocks for building distributed apps.

Stars: ✭ 1,365 (+689.02%)

Mutual labels: redis, distributed

Baiduspider

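BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao (Q&A) search, video search, news search, Wenku (document library) search, Jingyan (how-to) search, and Baike (encyclopedia) search.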

Stars: ✭ 105 (-39.31%)

Mutual labels: crawler, spider

Bilibili member crawler

Bilibili user crawler. Yay, it's a crawler!

Stars: ✭ 115 (-33.53%)

Mutual labels: crawler, spider

Scrapingoutsourcing

ScrapingOutsourcing focuses on sharing crawler code, aiming to add one new crawler each week.

Stars: ✭ 164 (-5.2%)

Mutual labels: crawler, spider

Yispider

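A distributed crawler platform that helps you manage and develop crawlers. It ships with a built-in set of crawler definition rules (templates): use the templates to define a crawler quickly, or treat the platform as a framework and write crawlers by hand. (A hobby project, updated whenever something annoys the author.)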

Stars: ✭ 158 (-8.67%)

Mutual labels: crawler, spider

Awesome Python Primer

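An index of high-quality Chinese-language resources for self-taught Python beginners, covering books, documentation, and videos, suited to crawling, web development, data analysis, and machine learning.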

Stars: ✭ 57 (-67.05%)

Mutual labels: crawler, spider

Beanbun

Beanbun is a multi-process web crawler framework written in PHP. It is open, highly extensible, and built on Workerman.

Stars: ✭ 1,096 (+533.53%)

Mutual labels: crawler, spider

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (+101.16%)

Mutual labels: crawler, spider

Crawler examples

Some classic web crawler projects.

Stars: ✭ 74 (-57.23%)

Mutual labels: crawler, spider

Memento

Fairly basic redis-like hashmap implementation on top of an epoll TCP server.

Stars: ✭ 74 (-57.23%)

Mutual labels: redis, distributed

Is Google

Verify that a request is from Google crawlers using Google's DNS verification steps

Stars: ✭ 82 (-52.6%)

Mutual labels: crawler, ip

Spider

python crawler spider

Stars: ✭ 70 (-59.54%)

Mutual labels: crawler, spider

Zhihuspider

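Crawler for public Zhihu user profile information that can also crawl users' follow relationships. Built with Python, using proxies and multithreading.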

Stars: ✭ 92 (-46.82%)

Mutual labels: spider, redis

Proxy Pool

Proxy IP pool service for crawlers; other crawler programs can fetch proxies through a REST API.

Stars: ✭ 91 (-47.4%)

Mutual labels: crawler, proxypool

Gopa Abandoned

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

Stars: ✭ 98 (-43.35%)

Mutual labels: crawler, spider

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-60.69%)

Mutual labels: crawler, spider

Jcrandomproxy

Random proxy

Stars: ✭ 105 (-39.31%)

Mutual labels: proxy, proxypool

Bojack

🐴 The unreliable key-value store

Stars: ✭ 101 (-41.62%)

Mutual labels: redis, distributed

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-38.15%)

Mutual labels: crawler, spider

Ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

Stars: ✭ 1,366 (+689.6%)

Mutual labels: crawler, spider

Douban Movie

Go crawler that scrapes the Douban Movie Top 250

Stars: ✭ 114 (-34.1%)

Mutual labels: crawler, spider

Pkulaw spider

Crawls the Pkulaw (北大法宝) site at http://www.pkulaw.cn/Case/

Stars: ✭ 113 (-34.68%)

Mutual labels: crawler, spider

Fun crawler

Crawl some pictures for fun

Stars: ✭ 169 (-2.31%)

Mutual labels: crawler, spider

Proxypool

IP proxy pool implemented in Go

Stars: ✭ 1,134 (+555.49%)

Mutual labels: ip, proxypool

Crawlab Lite

Lite version of Crawlab, a crawler management platform.

Stars: ✭ 122 (-29.48%)

Mutual labels: crawler, spider

Scriptspider

A distributed, general-purpose crawler written in Java, with pluggable components (defaults are provided).

Stars: ✭ 155 (-10.4%)

Mutual labels: spider, redis

Decryptlogin

APIs for logging in to some websites using requests.

Stars: ✭ 1,861 (+975.72%)

Mutual labels: crawler, spider

Pyproxy Async

Proxy pool built on Python asyncio + Redis

Stars: ✭ 123 (-28.9%)

Mutual labels: redis, proxy

Apiproject

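[https://www.sofineday.com], a Go project scaffold that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + zap JSON logging (with log collection to Kafka or Mongo) + Kafka message queue + WeChat Pay and Alipay payments via gopay + API encryption + API reverse proxy + Go modules dependency management + headless crawling with chromedp + makefile + binary compression + livereload hot reloading)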

Stars: ✭ 124 (-28.32%)

Mutual labels: spider, redis

Weibo Topic Spider

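Weibo super-topic crawler with Weibo word-frequency statistics, sentiment analysis, and simple classification; now also includes data crawled from the pneumonia super topic.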

Stars: ✭ 128 (-26.01%)

Mutual labels: crawler, spider

Proxy pool

IP proxy pool

Stars: ✭ 126 (-27.17%)

Mutual labels: proxy, proxypool

Digger

Digger is a powerful and flexible web crawler implemented in pure Go