Scrapy Cluster: This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Stars: ✭ 921 (+1946.67%)
Scrapy Redis: Redis-based components for Scrapy.
Stars: ✭ 4,998 (+11006.67%)
NScrapy: NScrapy is a .NET Core cross-platform distributed spider framework that provides an easy way to write your own spiders.
Stars: ✭ 88 (+95.56%)
Haipproxy: 💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+10995.56%)
Gerapy: Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Stars: ✭ 2,601 (+5680%)
go-cita: A Go implementation of CITA. https://docs.nervos.org/cita
Stars: ✭ 25 (-44.44%)
meesee: Task queue with long-lived workers for work-based parallelization, using processes and Redis as the back end. For distributed computing.
Stars: ✭ 14 (-68.89%)
intelli-swift-core: Distributed, Column-oriented storage, Realtime analysis, High performance Database
Stars: ✭ 17 (-62.22%)
pooljs: Browser computing unleashed!
Stars: ✭ 17 (-62.22%)
itemadapter: Common interface for data container classes
Stars: ✭ 47 (+4.44%)
goimpulse: A highly available, high-performance distributed ID generation service
Stars: ✭ 17 (-62.22%)
dist-framework: A prototype for distributed training/validation/evaluation/extraction with PyTorch.
Stars: ✭ 14 (-68.89%)
GraviT: GraviT is a distributed ray tracing framework that enables applications to leverage hardware-optimized ray tracers within a single environment across many nodes for large-scale rendering tasks.
Stars: ✭ 18 (-60%)
sprawl: Alpha implementation of the Sprawl distributed marketplace protocol.
Stars: ✭ 27 (-40%)
Web-Iota: Iota is a web scraper that can find all of the images and links/sub-URLs on a webpage
Stars: ✭ 60 (+33.33%)
blockchain-hackathon: An electronic health record (EHR) system built on Hyperledger Composer blockchain
Stars: ✭ 67 (+48.89%)
p2p-project: A peer-to-peer networking framework to work across languages
Stars: ✭ 68 (+51.11%)
simplx: C++ development framework for building reliable cache-friendly distributed and concurrent multicore software
Stars: ✭ 61 (+35.56%)
arche: Analyze scraped data
Stars: ✭ 49 (+8.89%)
tool-db: A peer-to-peer decentralized database
Stars: ✭ 15 (-66.67%)
dask-sql: Distributed SQL Engine in Python using Dask
Stars: ✭ 271 (+502.22%)
scrapy-LBC: LeBonCoin spider built with Scrapy and ElasticSearch
Stars: ✭ 14 (-68.89%)
majordodo: Distributed Operations and Data Organizer built on Apache BookKeeper
Stars: ✭ 25 (-44.44%)
FedScale: FedScale is a scalable and extensible open-source federated learning (FL) platform.
Stars: ✭ 274 (+508.89%)
heat: Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
Stars: ✭ 127 (+182.22%)
asyncpy: A lightweight asynchronous coroutine web crawler framework built with asyncio and aiohttp
Stars: ✭ 86 (+91.11%)
fernando-pessoa: Classifier of Fernando Pessoa's poems according to his heteronyms
Stars: ✭ 31 (-31.11%)
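xmutca-rpc: Xmutca-rpc is a distributed service framework built on Netty that provides stable, high-performance RPC remote service invocation, supports features such as a service registry, service governance, and load balancing, and works out of the box.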
Stars: ✭ 18 (-60%)
ArticleSpider: Crawls zhihu, jobbole, and lagou with Scrapy and uses Elasticsearch + Django to build a search engine website. See README_zh.md (covering the implementation roadmap, distributed crawling, and coping with anti-crawling strategies).
Stars: ✭ 34 (-24.44%)
elfo: Your next actor system
Stars: ✭ 38 (-15.56%)
Credits: Credits (CRDS) - An Evolving Currency For An Evolving Society
Stars: ✭ 14 (-68.89%)
scrapy helper: Dynamically configurable crawler
Stars: ✭ 84 (+86.67%)
toy-rpc: A distributed RPC framework in Java, implemented with Netty, Protostuff, and ZooKeeper
Stars: ✭ 55 (+22.22%)
soundstorm: The Federated Social Audio Platform
Stars: ✭ 26 (-42.22%)
Galaxy: Galaxy is an asynchronous parallel visualization ray tracer for performant rendering in distributed computing environments. Galaxy builds upon Intel OSPRay and Intel Embree, including ray queueing and sending logic inspired by TACC GraviT.
Stars: ✭ 18 (-60%)
rockgo: A game server framework under development, based on Entity Component System (ECS).
Stars: ✭ 617 (+1271.11%)
Inventus: Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.
Stars: ✭ 80 (+77.78%)
zlimiter: A toolkit for rate limiting, supporting in-memory and Redis storage
Stars: ✭ 17 (-62.22%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+173.33%)
erl dist: Rust implementation of the Erlang Distribution Protocol
Stars: ✭ 110 (+144.44%)
WeIdentity: A blockchain-based distributed identity solution compliant with the W3C DID and Verifiable Credential specifications
Stars: ✭ 1,063 (+2262.22%)
lgcrawl: Crawls job listings across the entire Lagou site using python + scrapy + splash
Stars: ✭ 22 (-51.11%)
scrapy-html-storage: Scrapy downloader middleware that stores response HTMLs to disk.
Stars: ✭ 17 (-62.22%)
pagser: Pagser is a simple, extensible, configurable library for parsing and deserializing HTML pages into structs, based on goquery and struct tags, for Go crawlers
Stars: ✭ 82 (+82.22%)
tips: TiKV-based Pub/Sub server
Stars: ✭ 31 (-31.11%)
domains: World’s single largest Internet domains dataset
Stars: ✭ 461 (+924.44%)
osilo: Personal data silos with secure sharing
Stars: ✭ 15 (-66.67%)
vietnam-ecommerce-crawler: Crawls data from lazada, websosanh, compare.vn, cdiscount, and cungmua with flexible configs
Stars: ✭ 28 (-37.78%)
spicedb: Open Source, Google Zanzibar-inspired fine-grained permissions database
Stars: ✭ 3,358 (+7362.22%)
webhunger: WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, aiming to let users focus on web page parsing without worrying about the crawling process.
Stars: ✭ 17 (-62.22%)
FastNN: FastNN provides distributed training examples that use EPL.
Stars: ✭ 79 (+75.56%)