Amazon S3 Find and Forget is a solution for handling data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).

Stars: ✭ 115 (+0%)

Mutual labels: big-data

Orc

An ORC file format reader and writer for Go.

Stars: ✭ 97 (-15.65%)

Mutual labels: big-data

dislib

The distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (-66.09%)

Mutual labels: big-data

siembol

An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.

Stars: ✭ 153 (+33.04%)

Mutual labels: big-data

Asakusafw

Asakusa Framework

Stars: ✭ 114 (-0.87%)

Mutual labels: big-data

Treeviz

Tree diagrams with JavaScript 🌲 📈

Stars: ✭ 95 (-17.39%)

Mutual labels: big-data

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (-71.3%)

Mutual labels: big-data

learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+26.96%)

Mutual labels: apache-spark

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+1213.91%)

Mutual labels: big-data

Smart Array To Tree

Quickly convert large data arrays to trees

Stars: ✭ 91 (-20.87%)

Mutual labels: big-data

streamsx.kafka

Repository for integration with Apache Kafka

Stars: ✭ 13 (-88.7%)

Mutual labels: apache-spark

Dataengineeringproject

Example end-to-end data engineering project.

Stars: ✭ 82 (-28.7%)

Mutual labels: big-data

airavata-php-gateway

Mirror of Apache Airavata PHP Gateway

Stars: ✭ 15 (-86.96%)

Mutual labels: big-data

Uproot4

ROOT I/O in pure Python and NumPy.

Stars: ✭ 80 (-30.43%)

Mutual labels: big-data

Pythondata

Repo for code published on pythondata.com

Stars: ✭ 113 (-1.74%)

Mutual labels: big-data

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-31.3%)

Mutual labels: big-data

net.jgp.books.spark.ch01

Spark in Action, 2nd edition - chapter 1 - Introduction

Stars: ✭ 72 (-37.39%)

Mutual labels: apache-spark

Spark Website

Apache Spark Website

Stars: ✭ 75 (-34.78%)

Mutual labels: big-data

Location-based-Restaurants-Recommendation-System

Big Data Management and Analysis Final Project

Stars: ✭ 44 (-61.74%)

Mutual labels: apache-spark

Bookkeeper

Apache BookKeeper

Stars: ✭ 1,178 (+924.35%)

Mutual labels: big-data

azure-big-data-starter

A boilerplate project for Azure Big Data PaaS services

Stars: ✭ 13 (-88.7%)

Mutual labels: big-data

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-38.26%)

Mutual labels: big-data

soda-spark

Soda Spark is a PySpark library that helps you test your data in Spark DataFrames

Stars: ✭ 58 (-49.57%)

Mutual labels: pyspark

Countly Sdk Cordova

Countly Product Analytics SDK for Cordova, Icenium and PhoneGap

Stars: ✭ 69 (-40%)

Mutual labels: big-data

spark-utils

Basic framework utilities to quickly start writing production-ready Apache Spark applications

Stars: ✭ 25 (-78.26%)

Mutual labels: apache-spark

Hazelcast Cpp Client

Hazelcast IMDG C++ Client

Stars: ✭ 67 (-41.74%)

Mutual labels: big-data

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (+10.43%)

Mutual labels: big-data

proxima-platform

The Proxima platform.

Stars: ✭ 17 (-85.22%)

Mutual labels: apache-spark

Ambari

Mirror of Apache Ambari

Stars: ✭ 1,576 (+1270.43%)

Mutual labels: big-data

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-43.48%)

Mutual labels: big-data

beam-site

Apache Beam Site

Stars: ✭ 28 (-75.65%)

Mutual labels: big-data

Spark Doc Zh

Chinese translation of the official Apache Spark documentation

Stars: ✭ 1,126 (+879.13%)

Mutual labels: big-data

Clustering4Ever

C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.

Stars: ✭ 126 (+9.57%)

Mutual labels: big-data

DataEngineering

This repo contains commands that data engineers use in day-to-day work.

Stars: ✭ 47 (-59.13%)

Mutual labels: pyspark

Attic Lens

Mirror of Apache Lens

Stars: ✭ 58 (-49.57%)

Mutual labels: big-data

predictionio-sdk-python

PredictionIO Python SDK

Stars: ✭ 199 (+73.04%)

Mutual labels: big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-50.43%)

Mutual labels: big-data

ceja

PySpark phonetic and string matching algorithms

Stars: ✭ 24 (-79.13%)

Mutual labels: pyspark

Lifion Kinesis

A native Node.js producer and consumer library for Amazon Kinesis Data Streams

Stars: ✭ 54 (-53.04%)

Mutual labels: big-data

fink-broker

Astronomy Broker based on Apache Spark

Stars: ✭ 18 (-84.35%)

Mutual labels: apache-spark

Oodt

Mirror of Apache OODT

Stars: ✭ 52 (-54.78%)

Mutual labels: big-data

Trck

Query engine for TrailDB

Stars: ✭ 48 (-58.26%)

Mutual labels: big-data

bullet-core

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Stars: ✭ 36 (-68.7%)

Mutual labels: big-data

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+791.3%)

Mutual labels: big-data

scarf

Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.

Stars: ✭ 54 (-53.04%)

Mutual labels: big-data

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (+1242.61%)

Mutual labels: big-data

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (-77.39%)

Mutual labels: big-data

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (-4.35%)

Mutual labels: big-data

spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. Note: this repository is being archived, as all new development for the Kubernetes scheduler back-end is now at https://github.com/apache/spark/

Stars: ✭ 609 (+429.57%)

Mutual labels: apache-spark

beekeeper

Service for automatically managing and cleaning up unreferenced data

Stars: ✭ 43 (-62.61%)

Mutual labels: big-data

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-5.22%)

Mutual labels: big-data

Attic Predictionio Sdk Java

PredictionIO Java SDK