TezApache Tez
Stars: ✭ 313 (-15.41%)
duckdbDuckDB is an in-process SQL OLAP Database Management System
Stars: ✭ 4,707 (+1172.16%)
ArcusARCUS is the NAVER memcached with lists, sets, maps and b+trees. http://naver.github.io/arcus
Stars: ✭ 273 (-26.22%)
pyparEfficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.
Stars: ✭ 66 (-82.16%)
pytorch kmeansImplementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-89.73%)
DatahubThe Metadata Platform for the Modern Data Stack
Stars: ✭ 4,232 (+1043.78%)
pyspark-cheatsheetPySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (-68.92%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+954.86%)
hyper-enginePython library for Bayesian hyper-parameters optimization
Stars: ✭ 80 (-78.38%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-30.54%)
agentStore sensitive data such as API tokens
Stars: ✭ 19 (-94.86%)
SylphStream computing platform for bigdata
Stars: ✭ 362 (-2.16%)
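v6.dooring.publicA visualization dashboard solution that provides a visual editing engine, helping individuals and businesses easily build their own large-screen visualization applications.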
Stars: ✭ 323 (-12.7%)
python-lsm-dbPython bindings for the SQLite4 LSM database.
Stars: ✭ 115 (-68.92%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+1138.11%)
FluidFluid, elastic data abstraction and acceleration for BigData/AI applications in cloud
Stars: ✭ 265 (-28.38%)
skytableSkytable is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and TLS
Stars: ✭ 696 (+88.11%)
DBMSologyThe Paper List on Design and Implementation of System Software
Stars: ✭ 67 (-81.89%)
wranglerWrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-82.97%)
predictionioPredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+3281.08%)
MorpheusMorpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (-18.11%)
check-engineData validation library for PySpark 3.0.0
Stars: ✭ 29 (-92.16%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-96.22%)
classifai🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (-73.24%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-2.16%)
storm-mlAn online learning algorithm library for Storm
Stars: ✭ 18 (-95.14%)
pipelineOONI data processing pipeline
Stars: ✭ 36 (-90.27%)
MLBDMaterials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-94.59%)
Big-Data-DemoA data visualization demo project based on Vue, three.js, and echarts, with features including 3D model import and interaction and 3D model annotation
Stars: ✭ 146 (-60.54%)
Grouparoo🦘 The Grouparoo Monorepo - open source customer data sync framework
Stars: ✭ 334 (-9.73%)
dockageembedded document/json store
Stars: ✭ 20 (-94.59%)
curiumBluzelle Decentralized Database Service
Stars: ✭ 61 (-83.51%)
meetups-archivosSlides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to share, decentralize, and spread knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (-83.78%)
SmooksAn extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (-20.81%)
ibmpairsopen source tools for interaction with IBM PAIRS:
Stars: ✭ 23 (-93.78%)
arrow-datafusionApache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+537.84%)
VespaThe open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+912.7%)
SGDLibraryMATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (-55.41%)
lensMirror of Apache Lens
Stars: ✭ 57 (-84.59%)
cloudberryBig Data Visualization
Stars: ✭ 89 (-75.95%)
FlinkApache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.
Stars: ✭ 17,781 (+4705.68%)
cachegrandcachegrand is an open-source fast, scalable and secure Key-Value store, also fully compatible with Redis protocol, designed from the ground up to take advantage of modern hardware vertical scalability, able to provide better performance and a larger cache at lower cost, without losing focus on distributed systems.
Stars: ✭ 87 (-76.49%)
gino-kevaA simple Git Notes Key Value store
Stars: ✭ 23 (-93.78%)
nebulaA distributed block-based data storage and compute engine
Stars: ✭ 127 (-65.68%)
Unqlite PythonPython bindings for the UnQLite embedded NoSQL database
Stars: ✭ 321 (-13.24%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-2.43%)
PebblesdbThe PebblesDB write-optimized key-value store (SOSP 17)
Stars: ✭ 362 (-2.16%)
Devops RoadmapDevOps methodology & roadmap for a DevOps developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (-5.68%)
GokvSimple key-value store abstraction and implementations for Go (Redis, Consul, etcd, bbolt, BadgerDB, LevelDB, Memcached, DynamoDB, S3, PostgreSQL, MongoDB, CockroachDB and many more)
Stars: ✭ 314 (-15.14%)