A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.

Stars: ✭ 301 (-18.65%)

Mutual labels: big-data

SwiftLvDB

A fast key-value storage library: LevelDB for Swift

Stars: ✭ 14 (-96.22%)

Mutual labels: key-value-store

predictionio-template-java-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)

Stars: ✭ 36 (-90.27%)

Mutual labels: big-data

Immortaldb

🔩 A relentless key-value store for the browser.

Stars: ✭ 2,962 (+700.54%)

Mutual labels: key-value-store

Uproot3

ROOT I/O in pure Python and NumPy.

Stars: ✭ 312 (-15.68%)

Mutual labels: big-data

Tkvdb

Trie key-value database

Stars: ✭ 265 (-28.38%)

Mutual labels: key-value-store

Stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Stars: ✭ 344 (-7.03%)

Mutual labels: big-data

bandar-log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 20 (-94.59%)

Mutual labels: big-data

Helix

Mirror of Apache Helix

Stars: ✭ 304 (-17.84%)

Mutual labels: big-data

bigtable

TypeScript Bigtable Client with 🔋🔋 included.

Stars: ✭ 13 (-96.49%)

Mutual labels: big-data

Bigtop

Mirror of Apache Bigtop

Stars: ✭ 356 (-3.78%)

Mutual labels: big-data

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-70%)

Mutual labels: big-data

Baize

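Baize automated operations and maintenance system: configuration management, network probing, asset management, business management, CMDB, CD, DevOps, job orchestration, task orchestration, and other features. Monitoring, alerting, log analysis, and big data analysis are planned for the future.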

Stars: ✭ 296 (-20%)

Mutual labels: big-data

Beeva Best Practices

Best Practices and Style Guides in BEEVA

Stars: ✭ 335 (-9.46%)

Mutual labels: big-data

leaflet heatmap

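A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation. After computing the data in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I have not designed the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .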

Stars: ✭ 13 (-96.49%)

Mutual labels: big-data

Crate

CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.

Stars: ✭ 3,254 (+779.46%)

Mutual labels: big-data

mimir

Generates minimal embedded database from structs in golang; moved to gitlab.com/microo8/mimir

Stars: ✭ 40 (-89.19%)

Mutual labels: embedded-database

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (-75.41%)

Mutual labels: big-data

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (-25.41%)

Mutual labels: big-data

Tez

Apache Tez

Stars: ✭ 313 (-15.41%)

Mutual labels: big-data

Arcus

ARCUS is the NAVER memcached with lists, sets, maps and b+trees. http://naver.github.io/arcus

Stars: ✭ 273 (-26.22%)

Mutual labels: key-value-store

Attic Apex Core

Mirror of Apache Apex core

Stars: ✭ 346 (-6.49%)

Mutual labels: big-data

Datahub

The Metadata Platform for the Modern Data Stack

Stars: ✭ 4,232 (+1043.78%)

Mutual labels: big-data

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+954.86%)

Mutual labels: big-data

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-30.54%)

Mutual labels: big-data

Sylph

Stream computing platform for bigdata

Stars: ✭ 362 (-2.16%)

Mutual labels: big-data

python-lsm-db

Python bindings for the SQLite4 LSM database.

Stars: ✭ 115 (-68.92%)

Mutual labels: embedded-database

Fluid

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud

Stars: ✭ 265 (-28.38%)

Mutual labels: big-data

skytable

Skytable is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and TLS

Stars: ✭ 696 (+88.11%)

Mutual labels: key-value-store

Parquet Cpp

Apache Parquet

Stars: ✭ 339 (-8.38%)

Mutual labels: big-data

foundationdb-dotnet-client

C#/.NET Binding for FoundationDB Client API

Stars: ✭ 118 (-68.11%)

Mutual labels: key-value-store

Morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Stars: ✭ 303 (-18.11%)

Mutual labels: big-data

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-96.22%)

Mutual labels: big-data

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (-2.16%)

Mutual labels: big-data

pipeline

OONI data processing pipeline

Stars: ✭ 36 (-90.27%)

Mutual labels: big-data

Couchdb Fauxton

Apache CouchDB

Stars: ✭ 295 (-20.27%)

Mutual labels: big-data

NiFi-Rule-engine-processor

Drools processor for Apache NiFi

Stars: ✭ 34 (-90.81%)

Mutual labels: big-data

Grouparoo

🦘 The Grouparoo Monorepo - open source customer data sync framework

Stars: ✭ 334 (-9.73%)

Mutual labels: big-data

curium

Bluzelle Decentralized Database Service

Stars: ✭ 61 (-83.51%)

Mutual labels: key-value-store

Smooks

An extensible Java framework for building XML and non-XML streaming applications

Stars: ✭ 293 (-20.81%)

Mutual labels: big-data

ibmpairs

Open source tools for interaction with IBM PAIRS

Stars: ✭ 23 (-93.78%)

Mutual labels: big-data

Vespa

The open big data serving engine. https://vespa.ai

Stars: ✭ 3,747 (+912.7%)

Mutual labels: big-data

lens

Mirror of Apache Lens

Stars: ✭ 57 (-84.59%)

Mutual labels: big-data

Flink

Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.

Stars: ✭ 17,781 (+4705.68%)

Mutual labels: big-data

Unqlite Python

Python bindings for the UnQLite embedded NoSQL database

Stars: ✭ 321 (-13.24%)

Mutual labels: embedded-database

gino-keva

A simple Git Notes Key Value store

Stars: ✭ 23 (-93.78%)

Mutual labels: key-value-store

Oie Resources

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.