trandoshan-io / Crawler

Licence: gpl-3.0
Go process used to crawl websites

Programming Languages

go
31211 projects - #10 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Crawler

Newspaper
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+7753.74%)
Mutual labels:  crawler, crawling
Instagram Bot
An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (-6.12%)
Mutual labels:  crawler, crawling
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+3190.48%)
Mutual labels:  crawler, crawling
Sasila
A flexible, friendly crawler framework
Stars: ✭ 286 (+94.56%)
Mutual labels:  crawler, crawling
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on dotnet core. The library is designed after strong crawler libraries such as WebMagic and Scrapy, but can be extended to your custom requirements. Medium link : https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-31.97%)
Mutual labels:  crawler, crawling
Webster
a reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+147.62%)
Mutual labels:  crawler, crawling
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+296.6%)
Mutual labels:  crawler, crawling
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-67.35%)
Mutual labels:  crawler, crawling
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-53.74%)
Mutual labels:  crawler, crawling
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+436.73%)
Mutual labels:  crawler, crawling
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+88.44%)
Mutual labels:  crawler, crawling
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+28704.76%)
Mutual labels:  crawler, crawling
Spidy
The simple, easy to use command line web crawler.
Stars: ✭ 257 (+74.83%)
Mutual labels:  crawler, crawling
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+199.32%)
Mutual labels:  crawler, crawling
bots-zoo
No description or website provided.
Stars: ✭ 59 (-59.86%)
Mutual labels:  crawler, crawling
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+3389.12%)
Mutual labels:  crawler, crawling
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+10468.03%)
Mutual labels:  crawler, crawling
img-cli
An interactive command-line interface built in NodeJS for downloading single or multiple images to disk from a URL
Stars: ✭ 15 (-89.8%)
Mutual labels:  crawler, crawling
Scrapyrt
HTTP API for Scrapy spiders
Stars: ✭ 637 (+333.33%)
Mutual labels:  crawler, crawling
Skycaiji
SkyCaiji is a free data-collection and publishing crawler, built with PHP + MySQL and deployable on cloud servers. It can scrape almost every type of web page, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system among big-data web scraping tools.
Stars: ✭ 1,514 (+929.93%)
Mutual labels:  crawler, crawling

crawler

Crawler is a program written in Go, designed to crawl websites

features

  • uses a Tor SOCKS proxy to crawl hidden services
  • fast, built on valyala/fasthttp (up to 10x faster than net/http)
  • extracts both absolute and relative URLs
  • uses a scalable messaging protocol (NATS)
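The URL-extraction feature — resolving both absolute and relative links against the page they were found on — can be sketched with the standard library's net/url. This is a minimal illustration, not the crawler's actual code; the function name resolveLinks and the sample URLs are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLinks turns raw href values (absolute or relative) into
// absolute URLs, resolved against the page they were found on.
func resolveLinks(pageURL string, hrefs []string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, h := range hrefs {
		ref, err := url.Parse(h)
		if err != nil {
			continue // skip malformed links
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	links, _ := resolveLinks("http://example.onion/dir/page.html",
		[]string{"/root.html", "sibling.html", "http://other.onion/x"})
	for _, l := range links {
		fmt.Println(l)
	}
	// "/root.html" resolves against the host, "sibling.html" against
	// the page's directory, and absolute URLs pass through unchanged.
}
```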

how it works

  • The Crawler process connects to a NATS server (specified by the env variable NATS_URI) and sets up a subscriber for messages with subject todoSubject
  • When a URL is received, the crawler starts crawling it
  • When crawling is done, the crawler publishes the page content to the NATS server with subject contentSubject and the found URLs with subject doneSubject
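The steps above can be sketched end to end. The real program talks to a NATS server; the sketch below stands in Go channels for the message bus and a fake crawl function, so only the subject names come from this README — the message type and helper names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Subject names used by the crawler (from the README above).
const (
	todoSubject    = "todoSubject"
	contentSubject = "contentSubject"
	doneSubject    = "doneSubject"
)

// message pairs a subject with a payload, standing in for a NATS message.
type message struct {
	subject string
	payload string
}

// crawl is a stand-in for the real fetch: it fabricates a page body
// and one discovered URL so the flow can be shown end to end.
func crawl(url string) (body string, found []string) {
	return "<html>body of " + url + "</html>", []string{url + "/child"}
}

// handleTodo mirrors the crawler's loop: receive a URL on todoSubject,
// crawl it, then publish the page content on contentSubject and the
// discovered URLs on doneSubject.
func handleTodo(in <-chan message, out chan<- message) {
	for msg := range in {
		if msg.subject != todoSubject {
			continue
		}
		body, found := crawl(msg.payload)
		out <- message{contentSubject, body}
		out <- message{doneSubject, strings.Join(found, " ")}
	}
	close(out)
}

func main() {
	in := make(chan message, 1)
	out := make(chan message, 2)
	in <- message{todoSubject, "http://example.onion"}
	close(in)
	handleTodo(in, out)
	for msg := range out {
		fmt.Println(msg.subject+":", msg.payload)
	}
}
```

With a real deployment, the channels would be replaced by subscriptions and publishes against the server named in NATS_URI, which lets several crawler processes share one todoSubject queue.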