All Projects → gocolly → Colly

gocolly / Colly

Licence: apache-2.0
Elegant Scraper and Crawler Framework for Golang

Programming Languages

go
31211 projects - #10 most used programming language
HTML
75241 projects

Projects that are alternatives of or similar to Colly

Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-98.9%)
Mutual labels:  crawler, spider, scraper, scraping, crawling
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-97.17%)
Mutual labels:  crawler, spider, scraper, scraping, crawling
bots-zoo
No description or website provided.
Stars: ✭ 59 (-99.62%)
Mutual labels:  crawler, scraper, scraping, crawling
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-99.67%)
Mutual labels:  scraper, spider, scraping, crawling
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-98.22%)
Mutual labels:  crawler, spider, scraping, crawling
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (-94.92%)
Mutual labels:  crawler, scraper, scraping, crawling
Sasila
一个灵活、友好的爬虫框架
Stars: ✭ 286 (-98.16%)
Mutual labels:  crawler, scraping, crawling, framework
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (-91.98%)
Mutual labels:  crawler, spider, scraper, scraping
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+172.57%)
Mutual labels:  crawler, scraping, crawling, framework
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (-98.73%)
Mutual labels:  crawler, scraping, crawling, framework
Ferret
Declarative web scraping
Stars: ✭ 4,837 (-68.86%)
Mutual labels:  crawler, scraper, scraping, crawling
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (-66.98%)
Mutual labels:  crawler, scraper, scraping, crawling
Scrapit
Scraping scripts for various websites.
Stars: ✭ 25 (-99.84%)
Mutual labels:  crawler, spider, scraper
Avbook
AV 电影管理系统, avmoo , javbus , javlibrary 爬虫,线上 AV 影片图书馆,AV 磁力链接数据库,Japanese Adult Video Library,Adult Video Magnet Links - Japanese Adult Video Database
Stars: ✭ 8,133 (-47.65%)
Mutual labels:  crawler, spider, scraper
Awesome Python Primer
自学入门 Python 优质中文资源索引,包含 书籍 / 文档 / 视频,适用于 爬虫 / Web / 数据分析 / 机器学习 方向
Stars: ✭ 57 (-99.63%)
Mutual labels:  crawler, spider, scraping
Crawler
A high performance web crawler in Elixir.
Stars: ✭ 781 (-94.97%)
Mutual labels:  crawler, spider, scraper
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (-93.41%)
Mutual labels:  spider, scraper, scraping
Goribot
[Crawler/Scraper for Golang]🕷A lightweight distributed friendly Golang crawler framework.一个轻量的分布式友好的 Golang 爬虫框架。
Stars: ✭ 190 (-98.78%)
Mutual labels:  crawler, spider, scraper
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scrapying library for Entity Framework Core output based on dotnet core. This library designed like other strong crawler libraries like WebMagic and Scrapy but for enabling extandable your custom requirements. Medium link : https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-99.36%)
Mutual labels:  crawler, scraping, crawling
Creeper
🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (-95.09%)
Mutual labels:  crawler, spider, framework

Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

GoDoc Backers on Open Collective Sponsors on Open Collective build status report card view examples Code Coverage FOSSA Status Twitter URL

Features

  • Clean API
  • Fast (>1k request/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensions

Example

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}

See examples folder for more detailed examples.

Installation

Add colly to your go.mod file:

module github.com/x/y

go 1.14

require (
        github.com/gocolly/colly/v2 latest
)

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

If you are using Colly in a project please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

License

FOSSA Status

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].