
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+245.26%)

Mutual labels: crawler, spider, scraper

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (+311.05%)

Mutual labels: crawler, spider, scraper

Scrapingoutsourcing

ScrapingOutsourcing focuses on sharing crawler code, aiming for one update per week.

Stars: ✭ 164 (-13.68%)

Mutual labels: crawler, spider, scrapy

Haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Stars: ✭ 4,993 (+2527.89%)

Mutual labels: crawler, spider, scrapy

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+555.79%)

Mutual labels: crawler, spider, scraper

Marmot

💐Marmot | Web Crawler/HTTP protocol Download Package 🐭

Stars: ✭ 186 (-2.11%)

Mutual labels: crawler, spider, scrapy

Scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

Stars: ✭ 1,322 (+595.79%)

Mutual labels: crawler, scraper, scrapy

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+235.26%)

Mutual labels: crawler, scraper, scrapy

Icrawler

A multi-threaded crawler framework with many built-in image crawlers.

Stars: ✭ 629 (+231.05%)

Mutual labels: crawler, spider, scrapy

Mailinglistscraper

A Python web scraper for public email lists.

Stars: ✭ 19 (-90%)

Mutual labels: spider, scraper, scrapy

Crawlab Lite

Lite version of Crawlab, a lightweight crawler management platform.

Stars: ✭ 122 (-35.79%)

Mutual labels: crawler, spider, scrapy

Avbook

An AV movie management system with crawlers for avmoo, javbus, and javlibrary: an online Japanese adult video library and magnet-link database.

Stars: ✭ 8,133 (+4180.53%)

Mutual labels: crawler, spider, scraper

Crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Stars: ✭ 440 (+131.58%)

Mutual labels: crawler, spider, scraper

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+2422.63%)

Mutual labels: crawler, spider, scraper

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+438.95%)

Mutual labels: spider, scraper, scrapy

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-43.68%)

Mutual labels: crawler, spider, scraper


Goribot

A lightweight, distribution-friendly Golang crawler framework.

Full documentation

!! Warning !!

Goribot has been migrated to Gospider (github.com/zhshch2002/gospider), which fixes some scheduling issues and moves the network request part into a separate repository. This repository will be kept, but new users are advised to use Gospider.

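🚀Features

Elegant API
Tidy documentation
Fast (a single core handles >1K tasks/sec)
Friendly distributed support
Convenient details
- Automatic conversion of relative links
- Automatic character-encoding detection and decoding
- Automatic HTML and JSON parsing
Rich extension support
- Request deduplication (👈 supports distributed mode)
- Limits on requests, rate, and concurrency
- JSON and CSV result storage
- Robots.txt support
- Request error logging
- Random User-Agent and random proxy
- Retry on failure
Lightweight; suited to learning or to quick out-of-the-box setup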
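
Two of the features above, relative link conversion and request deduplication, can be illustrated with a standard-library sketch. This is not Goribot's implementation or API: `resolveLink`, `seenSet`, and `firstVisit` are hypothetical names introduced only for illustration, and the set is in-memory, so it covers a single process only (a distributed setup would presumably need a shared store instead).

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

// resolveLink turns a possibly relative href into an absolute URL,
// using the URL of the page it was found on as the base.
// Hypothetical helper, not part of Goribot.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

// seenSet is a concurrency-safe, in-memory set of URLs already scheduled.
// Hypothetical type, not part of Goribot.
type seenSet struct {
	mu   sync.Mutex
	urls map[string]struct{}
}

func newSeenSet() *seenSet {
	return &seenSet{urls: make(map[string]struct{})}
}

// firstVisit reports whether u has not been seen before, and records it.
func (s *seenSet) firstVisit(u string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.urls[u]; ok {
		return false
	}
	s.urls[u] = struct{}{}
	return true
}

func main() {
	seen := newSeenSet()
	page := "https://example.com/a/b.html"
	for _, href := range []string{"../c.html", "/c.html", "d.html"} {
		abs, err := resolveLink(page, href)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		// Only URLs reported as new would be queued for crawling.
		fmt.Println(abs, "new:", seen.firstVisit(abs))
	}
}
```

Here the first two hrefs resolve to the same absolute URL, so only the first would be queued. Resolving links before deduplicating matters for exactly this reason: the set compares absolute URLs, not the raw href strings.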

Version warning

Goribot supports only Go 1.13 and above.

👜Get Goribot

go get -u github.com/zhshch2002/goribot

Goribot also has an older development version. If you need that version, pull the tag v0.0.1.

⚡Build your first project

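package main

import (
	"fmt"
	"github.com/zhshch2002/goribot"
)

func main() {
	// Create a new spider.
	s := goribot.NewSpider()

	// Queue a GET request together with a callback that handles its response.
	s.AddTask(
		goribot.GetReq("https://httpbin.org/get"),
		func(ctx *goribot.Context) {
			// Print the response body as text.
			fmt.Println(ctx.Resp.Text)
			// Print one field of the JSON body, selected by path.
			fmt.Println(ctx.Resp.Json("headers.User-Agent"))
		},
	)

	// Start the spider.
	s.Run()
}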

🎉Done

You can now use Goribot. For more, see the Getting Started guide.

🙏Thanks

Many thanks to the projects above for their help 🙏.


Stars: ✭ 190
