Amazon S3 Find and Forget is a solution for handling data erasure requests against data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).

Stars: ✭ 115 (+98.28%)

Mutual labels: big-data

optic

An Erlang/OTP library for reading and updating deeply nested immutable data.

Stars: ✭ 34 (-41.38%)

Mutual labels: lens

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+2505.17%)

Mutual labels: big-data

Qcportal

A client interface to the QCArchive Project (read-only image of QCFractal)

Stars: ✭ 29 (-50%)

Mutual labels: big-data

Ambari

Mirror of Apache Ambari

Stars: ✭ 1,576 (+2617.24%)

Mutual labels: big-data

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-65.52%)

Mutual labels: big-data

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (+89.66%)

Mutual labels: big-data

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+522.41%)

Mutual labels: big-data

Attic Predictionio Sdk Java

PredictionIO Java SDK

Stars: ✭ 107 (+84.48%)

Mutual labels: big-data

Big-Data-Demo

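A data visualization showcase project based on Vue, three.js, and echarts, including 3D model import and interaction, 3D model annotation, and other features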

Stars: ✭ 146 (+151.72%)

Mutual labels: big-data

Mysql perf analyzer

MySQL performance monitoring and analysis.

Stars: ✭ 1,423 (+2353.45%)

Mutual labels: big-data

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+11258.62%)

Mutual labels: big-data

Vizuka

Explore high-dimensional datasets and how your algo handles specific regions.

Stars: ✭ 100 (+72.41%)

Mutual labels: big-data

talaria

TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Stars: ✭ 148 (+155.17%)

Mutual labels: big-data

Samza Hello Samza

Mirror of Apache Samza

Stars: ✭ 99 (+70.69%)

Mutual labels: big-data

Sylph

Stream computing platform for big data

Stars: ✭ 362 (+524.14%)

Mutual labels: big-data

Kudu

Mirror of Apache Kudu

Stars: ✭ 1,360 (+2244.83%)

Mutual labels: big-data

xcast

A High-Performance Data Science Toolkit for the Earth Sciences

Stars: ✭ 28 (-51.72%)

Mutual labels: big-data

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (+67.24%)

Mutual labels: big-data

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+1667.24%)

Mutual labels: big-data

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+2206.9%)

Mutual labels: big-data

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+3968.97%)

Mutual labels: big-data

Reef

Mirror of Apache REEF

Stars: ✭ 92 (+58.62%)

Mutual labels: big-data

Vespa

The open big data serving engine. https://vespa.ai

Stars: ✭ 3,747 (+6360.34%)

Mutual labels: big-data

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets on Bitcoin

Stars: ✭ 91 (+56.9%)

Mutual labels: big-data

putting-lenses-to-work

A presentation for BayHac 2017 on how I use lenses at work

Stars: ✭ 73 (+25.86%)

Mutual labels: lens

Parquet Mr

Apache Parquet

Stars: ✭ 1,278 (+2103.45%)

Mutual labels: big-data

Samza

Mirror of Apache Samza

Stars: ✭ 676 (+1065.52%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-62.07%)

Mutual labels: big-data

Ymcache

YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.

Stars: ✭ 58 (+0%)

Mutual labels: big-data

Kibble 1

Apache Kibble - a tool to collect, aggregate and visualize data about any software project

Stars: ✭ 54 (-6.9%)

Mutual labels: big-data

Datumbox Framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Stars: ✭ 1,063 (+1732.76%)

Mutual labels: big-data

Esper Tv

Esper instance for TV news analysis

Stars: ✭ 37 (-36.21%)

Mutual labels: big-data

Partial.lenses

Partial lenses is a comprehensive, high-performance optics library for JavaScript

Stars: ✭ 846 (+1358.62%)

Mutual labels: lens

Fit Sne

Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)