DotnetCrawler is a straightforward, lightweight web crawling/scraping library built on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-49.49%)

Mutual labels: crawler, scraping, crawling

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (+298.48%)

Mutual labels: crawler, scraping, crawling

Webmagic

A scalable web crawler framework for Java.

Stars: ✭ 10,186 (+5044.44%)

Mutual labels: crawler, scraping, framework

Linkedin Profile Scraper

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.

Stars: ✭ 171 (-13.64%)

Mutual labels: crawler, scraping, crawling

CrawlBox

An easy way to brute-force web directories.

Stars: ✭ 118 (-40.4%)

Mutual labels: crawler, web-crawler

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-92.42%)

Mutual labels: crawler, scraping

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (-65.66%)

Mutual labels: scraping, crawling

Scrapy Crawlera

Crawlera middleware for Scrapy

Stars: ✭ 281 (+41.92%)

Mutual labels: crawler, scraping

Apify Js

Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+1492.93%)

Mutual labels: scraping, crawling

Supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Stars: ✭ 306 (+54.55%)

Mutual labels: crawler, web-crawler

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+1959.09%)

Mutual labels: crawler, scraping

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (+134.34%)

Mutual labels: crawler, scraping

Spider Flow

A next-generation crawler platform that defines crawler flows graphically, so crawlers can be built without writing code.

Stars: ✭ 365 (+84.34%)

Mutual labels: crawler, web-crawler

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+2320.71%)

Mutual labels: crawler, web-crawler

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+231.31%)

Mutual labels: crawler, web-crawler

feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

Stars: ✭ 23 (-88.38%)

Mutual labels: scraping, crawling

Webster

A reliable high-level web crawling & scraping framework for Node.js.

Stars: ✭ 364 (+83.84%)

Mutual labels: crawler, crawling

Creeper

🐾 Creeper - The Next Generation Crawler Framework (Go)

Stars: ✭ 762 (+284.85%)

Mutual labels: crawler, framework

Maman

Rust Web Crawler saving pages on Redis

Stars: ✭ 39 (-80.3%)

Mutual labels: crawler, web-crawler

img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

Stars: ✭ 15 (-92.42%)

Mutual labels: crawler, crawling

Nutch

Apache Nutch is an extensible and scalable web crawler

Stars: ✭ 2,277 (+1050%)

Mutual labels: crawling, web-crawler

Spidermon

Scrapy extension for monitoring spider execution.

Stars: ✭ 309 (+56.06%)

Mutual labels: scraping, crawling

pomp

Screen scraping and web crawling framework

Stars: ✭ 61 (-69.19%)

Mutual labels: scraping, crawling

Dataflowkit

Extract structured data from websites. Website scraping.

Stars: ✭ 456 (+130.3%)

Mutual labels: scraping, crawling

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+529.29%)

Mutual labels: crawler, scraping

Pastepwn

Python framework to scrape Pastebin pastes and analyze them

Stars: ✭ 87 (-56.06%)

Mutual labels: scraping, framework

D4n155

OWASP D4N155 - Intelligent and dynamic wordlist using OSINT

Stars: ✭ 105 (-46.97%)

Mutual labels: crawler, scraping

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+221.72%)

Mutual labels: crawler, crawling

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (+197.47%)

Mutual labels: crawler, scraping

Awesome Python Primer

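An index of high-quality Chinese-language resources for self-taught Python beginners, including books, documentation, and videos, covering crawlers, web development, data analysis, and machine learning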

Stars: ✭ 57 (-71.21%)

Mutual labels: crawler, scraping

Infinitycrawler

A simple but powerful web crawler library for .NET

Stars: ✭ 97 (-51.01%)

Mutual labels: crawler, web-crawler

Zhihu Crawler People

A simple distributed crawler for Zhihu, with data analysis

Stars: ✭ 182 (-8.08%)

Mutual labels: crawler, web-crawler

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-65.66%)

Mutual labels: crawler, crawling

Abotx

Cross Platform C# Web crawler framework, headless browser, parallel crawler. Please star this project! +1.

Stars: ✭ 63 (-68.18%)

Mutual labels: web-crawler, framework

Web Bee

🐝 Web vertical crawler framework for fun

Stars: ✭ 184 (-7.07%)

Mutual labels: crawler, framework

Terpene Profile Parser For Cannabis Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Stars: ✭ 63 (-68.18%)

Mutual labels: crawler, web-crawler

Grawler

Grawler is a tool written in PHP with a web interface that automates running Google dorks, scrapes the results, and stores them in a file.

Stars: ✭ 98 (-50.51%)

Mutual labels: scraping, crawling

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stars: ✭ 125 (-36.87%)

Mutual labels: crawler, crawling

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+5730.81%)

Mutual labels: crawler, crawling

Crawler

Go process used to crawl websites