gocolly / Colly

Licence: apache-2.0

Elegant Scraper and Crawler Framework for Golang

Programming Languages

31211 projects - #10 most used programming language

HTML

75241 projects

Projects that are alternatives of or similar to Colly

Linkedin Profile Scraper

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.

Stars: ✭ 171 (-98.9%)

Mutual labels: crawler, spider, scraper, scraping, crawling

Crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Stars: ✭ 440 (-97.17%)

Mutual labels: crawler, spider, scraper, scraping, crawling

bots-zoo

No description or website provided.

Stars: ✭ 59 (-99.62%)

Mutual labels: crawler, scraper, scraping, crawling

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (-99.67%)

Mutual labels: scraper, spider, scraping, crawling

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (-98.22%)

Mutual labels: crawler, spider, scraping, crawling

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (-94.92%)

Mutual labels: crawler, scraper, scraping, crawling

Sasila

A flexible, friendly crawler framework

Stars: ✭ 286 (-98.16%)

Mutual labels: crawler, scraping, crawling, framework

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (-91.98%)

Mutual labels: crawler, spider, scraper, scraping

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Stars: ✭ 42,343 (+172.57%)

Mutual labels: crawler, scraping, crawling, framework

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (-98.73%)

Mutual labels: crawler, scraping, crawling, framework

Ferret

Declarative web scraping

Stars: ✭ 4,837 (-68.86%)

Mutual labels: crawler, scraper, scraping, crawling

Headless Chrome Crawler

Distributed crawler powered by Headless Chrome

Stars: ✭ 5,129 (-66.98%)

Mutual labels: crawler, scraper, scraping, crawling

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-99.84%)

Mutual labels: crawler, spider, scraper

Avbook

AV movie management system; crawlers for avmoo, javbus and javlibrary; online AV film library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

Stars: ✭ 8,133 (-47.65%)

Mutual labels: crawler, spider, scraper

Awesome Python Primer

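An index of quality Chinese-language resources for self-taught Python beginners, covering books / docs / videos, suited to crawling / Web / data analysis / machine learning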

Stars: ✭ 57 (-99.63%)

Mutual labels: crawler, spider, scraping

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (-94.97%)

Mutual labels: crawler, spider, scraper

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (-93.41%)

Mutual labels: spider, scraper, scraping

Goribot

[Crawler/Scraper for Golang]🕷A lightweight, distributed-friendly Golang crawler framework.

Stars: ✭ 190 (-98.78%)

Mutual labels: crawler, spider, scraper

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. This library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link : https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-99.36%)

Mutual labels: crawler, scraping, crawling

Creeper

🐾 Creeper - The Next Generation Crawler Framework (Go)

Stars: ✭ 762 (-95.09%)

Mutual labels: crawler, spider, framework


Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Features

Clean API
Fast (>1k requests/sec on a single core)
Manages request delays and maximum concurrency per domain
Automatic cookie and session handling
Sync/async/parallel scraping
Caching
Automatic encoding of non-unicode responses
Robots.txt support
Distributed scraping
Configuration via environment variables
Extensions

Example

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}

See examples folder for more detailed examples.

Installation

Add colly to your go.mod file:

module github.com/x/y

go 1.14

require (
        github.com/gocolly/colly/v2 latest
)

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.
altsab/gowap Wappalyzer implementation in Go.
jesuiscamille/goquotes A quotes scraper, making your day a little better!
jivesearch/jivesearch A search engine that doesn't track you.
Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.
lucasepe/go-ps4 Search the PlayStation Store for your favorite PS4 games using the command line.
yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel's web site for lesson metadata.
gamedb/gamedb A database of Steam games.
lawzava/scrape CLI for email scraping from any website.
eureka101v/WeiboSpiderGo A Sina Weibo (Chinese Twitter) scraper.
Go-phie/gophie Search, download and stream movies from your terminal.
imthaghost/goclone Clone websites to your computer within seconds.
superiss/spidy Crawl the web and collect expired domains.
docker-slim/docker-slim Optimize your Docker containers to make them smaller and better.
seversky/gachifinder An agent for asynchronous scraping, parsing and writing to storage backends (Elasticsearch for now).
eval-exec/goodreads Crawls all tags and all pages of quotes from Goodreads.

If you are using Colly in a project please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]

License

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].
