splink: Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (-18.1%)
Spark Lucenerdd: Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-48.42%)
visualize-data-with-python: A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (-72.85%)
Crypto Dht: Blockchain over DHT in Go
Stars: ✭ 38 (-82.81%)
http bench: Golang HTTP stress test tool that supports single-machine and distributed modes
Stars: ✭ 142 (-35.75%)
Spark.jl: Julia binding for Apache Spark
Stars: ✭ 153 (-30.77%)
iris: Distributed streaming key-value storage
Stars: ✭ 55 (-75.11%)
litchi: A distributed Java game server framework
Stars: ✭ 97 (-56.11%)
dtm: A distributed transaction framework that supports multiple languages, supports saga, tcc, xa, 2-phase message, outbox patterns.
Stars: ✭ 6,110 (+2664.71%)
Dkeras: Distributed Keras Engine, Make Keras faster with only one line of code.
Stars: ✭ 181 (-18.1%)
Clearly: Clearly see and debug your celery cluster in real time!
Stars: ✭ 287 (+29.86%)
ddrt: An Elixir implementation of Rtree, optimized for fast updates.
Stars: ✭ 38 (-82.81%)
Weidentity: A blockchain-based distributed identity solution compliant with the W3C DID and Verifiable Credential specifications
Stars: ✭ 972 (+339.82%)
nova: Web framework for Erlang.
Stars: ✭ 175 (-20.81%)
Python Bigdata: Data science and Big Data with Python
Stars: ✭ 112 (-49.32%)
tensorpeers: Peer-to-peer (p2p) training of TensorFlow models
Stars: ✭ 57 (-74.21%)
Xxl Job Dotnet: xxl-job is a lightweight distributed task scheduling framework, and this package provides a dotnet executor client for it
Stars: ✭ 31 (-85.97%)
money: Dapper Style Distributed Tracing Instrumentation Libraries
Stars: ✭ 65 (-70.59%)
Gym Fx: Forex trading simulator environment for OpenAI Gym. Observations contain the order status, performance and timeseries loaded from a CSV file containing rates and indicators. Work in progress.
Stars: ✭ 151 (-31.67%)
agent: hashtopolis.org
Stars: ✭ 19 (-91.4%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+331.67%)
PeARS-orchard: This is the decentralised version of PeARS, the people's search engine, to be taken as Phase 1 of the fully distributed system.
Stars: ✭ 34 (-84.62%)
Archivespark: An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Stars: ✭ 111 (-49.77%)
Gringofts: Gringofts makes it easy to build a replicated, fault-tolerant, high throughput and distributed event-sourced system.
Stars: ✭ 84 (-61.99%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-86.88%)
pyrsia: Decentralized Package Network
Stars: ✭ 103 (-53.39%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-2.26%)
oceanbase: OceanBase is an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.
Stars: ✭ 4,466 (+1920.81%)
Lethean Vpn: Lethean Virtual Private Network (VPN)
Stars: ✭ 29 (-86.88%)
Cherry-Node: Cherry Network's node implemented in Rust
Stars: ✭ 72 (-67.42%)
Memo: The memo elastic and resilient key-value store.
Stars: ✭ 111 (-49.77%)
semagrow: A SPARQL query federator of heterogeneous data sources
Stars: ✭ 27 (-87.78%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+14206.79%)
flowgraph: Flowgraph package for scalable asynchronous system development
Stars: ✭ 51 (-76.92%)
Spark Tsne: Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-31.67%)
dnr-editor: Distributed Data-Flow Coordination Platform Based on Node-RED
Stars: ✭ 72 (-67.42%)
Waterdrop: Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+739.82%)
monitor-merlin: Module for Effortless Redundancy and Loadbalancing In Naemon
Stars: ✭ 21 (-90.5%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (+297.29%)
NScrapy: NScrapy is a .NET Core cross-platform distributed spider framework that provides an easy way to write your own spider
Stars: ✭ 88 (-60.18%)
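Sparkstreaming: 💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.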
Stars: ✭ 179 (-19%)
docs: Documentation repo of nebula orchestration system
Stars: ✭ 16 (-92.76%)
Tedsds: Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-93.67%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-50.23%)
Urhox: Urho3D extension library
Stars: ✭ 13 (-94.12%)
Spark Excel: A Spark plugin for reading Excel files via Apache POI
Stars: ✭ 216 (-2.26%)
Gateway: 🚀 Building distributed instant messaging, push notification systems.
Stars: ✭ 188 (-14.93%)
9volt: A modern, distributed monitoring system written in Go
Stars: ✭ 160 (-27.6%)
Openuba: A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-42.53%)
Meissa: Cross-platform Distributed Test Runner. Executes tests in parallel, time balanced on multiple machines.
Stars: ✭ 66 (-70.14%)
Phoenix: Peace of mind from prototype to production
Stars: ✭ 17,476 (+7807.69%)