DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core that writes its output through Entity Framework Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-93.39%)

Mutual labels: crawler, crawling

Haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Stars: ✭ 4,993 (+229.79%)

Mutual labels: crawler, spider

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (+216.58%)

Mutual labels: crawler, spider

Puppeteer Walker

a puppeteer walker 🕷 🕸

Stars: ✭ 78 (-94.85%)

Mutual labels: crawler, spider

Ferret

Declarative web scraping

Stars: ✭ 4,837 (+219.48%)

Mutual labels: crawler, crawling

Netdiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

Stars: ✭ 573 (-62.15%)

Mutual labels: crawler, spider

Xxl Crawler

A distributed web crawler framework (XXL-CRAWLER).

Stars: ✭ 561 (-62.95%)

Mutual labels: crawler, spider

Douyinsdk

Douyin SDK for data collection; crawling and scraping are no longer just a dream

Stars: ✭ 99 (-93.46%)

Mutual labels: crawler, spider

Weixin Spider

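A WeChat official account crawler covering historical articles, article comments, and article read and "Wow" (在看) counts, with a visual web page; deployable on Windows servers. Implemented in Python 3 with flask/mysql/redis/mitmproxy/pywin32 and more. An efficient WeChat crawler for official accounts, historical articles, article comments, and data updates.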

Stars: ✭ 287 (-81.04%)

Mutual labels: crawler, spider

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (-56.67%)

Mutual labels: crawler, spider

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (-57.93%)

Mutual labels: crawler, crawling

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (-48.41%)

Mutual labels: crawler, spider

Icrawler

A multi-threaded crawler framework with many built-in image crawlers.

Stars: ✭ 629 (-58.45%)

Mutual labels: crawler, spider

Torbot

Dark Web OSINT Tool

Stars: ✭ 821 (-45.77%)

Mutual labels: crawler, spider

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (-47.89%)

Mutual labels: crawler, crawling

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-98.35%)

Mutual labels: crawler, spider

Baiduimagespider

A super lightweight Baidu image crawler

Stars: ✭ 591 (-60.96%)

Mutual labels: crawler, spider

Lizard

💐 Full Amazon Automatic Download

Stars: ✭ 41 (-97.29%)

Mutual labels: crawler, spider

Photon

Incredibly fast crawler designed for OSINT.

Stars: ✭ 8,332 (+450.33%)

Mutual labels: crawler, spider

Maman

Rust Web Crawler saving pages on Redis

Stars: ✭ 39 (-97.42%)

Mutual labels: crawler, spider

Avbook

AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

Stars: ✭ 8,133 (+437.19%)

Mutual labels: crawler, spider

Awesome Python Primer

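An index of high-quality Chinese-language resources for self-taught Python beginners, including books / documentation / videos, covering crawlers / Web / data analysis / machine learning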

Stars: ✭ 57 (-96.24%)

Mutual labels: crawler, spider

Beanbun

Beanbun is a multi-process web crawler framework written in PHP, with good openness and high extensibility, based on Workerman.

Stars: ✭ 1,096 (-27.61%)

Mutual labels: crawler, spider

Gospider

A crawler framework implemented in Go; users only need to care about page rules, and a web management interface is provided. Built on colly.