
zcrawl / zcrawl

Licence: other
An open source web crawling platform

Programming Languages

go
31211 projects - #10 most used programming language
shell
77523 projects

Projects that are alternatives to or similar to zcrawl

ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+223.81%)
Mutual labels:  scraping, crawling, webcrawling
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+147.62%)
Mutual labels:  scraping, crawling, crawlers
ioweb
Web Scraping Framework
Stars: ✭ 31 (+47.62%)
Mutual labels:  scraping, web-crawling, webcrawling
Apify Js
Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+14919.05%)
Mutual labels:  scraping, crawling, web-crawling
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but aims to be extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+376.19%)
Mutual labels:  scraping, crawling
Grawler
Grawler is a tool written in PHP which comes with a web interface that automates the task of using Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (+366.67%)
Mutual labels:  scraping, crawling
crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (+4.76%)
Mutual labels:  scraping, crawling
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+714.29%)
Mutual labels:  scraping, crawling
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+22933.33%)
Mutual labels:  scraping, crawling
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+201533.33%)
Mutual labels:  scraping, crawling
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+842.86%)
Mutual labels:  scraping, crawling
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+3657.14%)
Mutual labels:  scraping, crawling
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+2676.19%)
Mutual labels:  scraping, crawling
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+24323.81%)
Mutual labels:  scraping, crawling
scrape-github-trending
Tutorial for web scraping / crawling with Node.js.
Stars: ✭ 42 (+100%)
Mutual labels:  scraping, crawling
Awesome Puppeteer
A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+8128.57%)
Mutual labels:  scraping, crawling
Memorious
Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+1080.95%)
Mutual labels:  scraping, crawling
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+485.71%)
Mutual labels:  scraping, crawling
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+152.38%)
Mutual labels:  scraping, crawling
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+1995.24%)
Mutual labels:  scraping, crawling

zcrawl


zcrawl is an open source software platform for deploying and orchestrating web crawlers and crawling tasks in general. It is written in Go, and one of its goals is to be flexible enough to integrate with different languages and third-party services.

To avoid language lock-in, zcrawl will provide enough tooling to support creating and deploying a web crawler in your favorite language, so it is not Go-specific.
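Since zcrawl is language-agnostic, a crawler it orchestrates could be any program that fetches pages and extracts links. As an illustration only (not tied to any zcrawl API, which is still unspecified), here is a minimal link-extraction sketch in Go using just the standard library:

```go
package main

import (
	"fmt"
	"regexp"
)

// extractLinks pulls href values out of raw HTML with a simple regular
// expression. A production crawler would use a real HTML parser, but this
// is enough to show the core fetch-and-extract step a crawler performs.
func extractLinks(html string) []string {
	re := regexp.MustCompile(`href="([^"]+)"`)
	var links []string
	for _, m := range re.FindAllStringSubmatch(html, -1) {
		links = append(links, m[1])
	}
	return links
}

func main() {
	page := `<a href="https://example.org/a">a</a> <a href="https://example.org/b">b</a>`
	fmt.Println(extractLinks(page))
}
```

In a real deployment the HTML would come from an HTTP fetch, and the discovered URLs would be fed back into a work queue; scheduling and distributing that loop across workers is the kind of orchestration zcrawl is aiming at.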

We're still in the planning phase and the roadmap is subject to change. A prototype is in progress, developed so we can test some of our ideas in a minimal way.

How to use it?

No instructions are provided at this time; if you're interested, feel free to pull the code, build it, and see what happens :)

Is it for me?

The project is targeted at users who want an easy way to deploy web crawlers without messing with crontab (when you need to schedule recurring crawling jobs), plain CSV files (when you run crawls straight from the command line), multi-worker environments (when you need to orchestrate a distributed crawling task), or more complex pipelines that combine all of these.

Think of this as a Heroku-like solution where you can deploy text crawlers and orchestrate them to re-train your machine learning models with fresh data, all on your own infrastructure. This is the type of scenario we're interested in.

Roadmap

TBA

Contact

hello AT zcrawl DOT org
