Amazon S3 Find and Forget is a solution for handling data erasure requests against data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).

Stars: ✭ 115 (+167.44%)

Mutual labels: big-data, s3

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (-34.88%)

Mutual labels: hive, s3

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+225.58%)

Mutual labels: big-data, hive

Docker Registry Pruner

Tool to apply retention logic to Docker images in a Docker Registry

Stars: ✭ 122 (+183.72%)

Mutual labels: maintenance, cleanup

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop

Stars: ✭ 330 (+667.44%)

Mutual labels: big-data, s3

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+30032.56%)

Mutual labels: big-data, hive

Docker Registry Manifest Cleanup

Cleans up docker registry by removing untagged manifests from the registry

Stars: ✭ 127 (+195.35%)

Mutual labels: s3, cleanup

apiary

Apiary provides modules which can be combined to create a federated cloud data lake

Stars: ✭ 30 (-30.23%)

Mutual labels: hive, hive-metastore

Dataengineeringproject

Example end-to-end data engineering project.

Stars: ✭ 82 (+90.7%)

Mutual labels: big-data, s3

Helicalinsight

Helical Insight is the world's first open source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.

Stars: ✭ 214 (+397.67%)

Mutual labels: big-data, hive

nifi

Deploy a secured, clustered, auto-scaling NiFi service in AWS.

Stars: ✭ 37 (-13.95%)

Mutual labels: big-data, s3

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (+55.81%)

Mutual labels: big-data

terraform-aws-sftp

This Terraform module is used to create SFTP on AWS for S3.

Stars: ✭ 20 (-53.49%)

Mutual labels: s3

common-datax

A general-purpose data synchronization microservice based on DataX; a single RESTful interface handles all general data synchronization.

Stars: ✭ 51 (+18.6%)

Mutual labels: hive

hiveql-parser

HiveQL Parser. Parses HiveQL code and prints the AST in JSON format on success; otherwise prints a well-formed syntax error message.

Stars: ✭ 25 (-41.86%)

Mutual labels: hive

azure-big-data-starter

A boilerplate project for Azure Big Data PaaS services

Stars: ✭ 13 (-69.77%)

Mutual labels: big-data

go-localstack

Go Wrapper for using localstack

Stars: ✭ 56 (+30.23%)

Mutual labels: s3

silly-android

Android plugins for Java, making core Android APIs easy to use

Stars: ✭ 40 (-6.98%)

Mutual labels: cleanup

commentator

A simple commenting system for your blog.

Stars: ✭ 29 (-32.56%)

Mutual labels: s3

CS Book

🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"

Stars: ✭ 40 (-6.98%)

Mutual labels: big-data

scarf

Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.

Stars: ✭ 54 (+25.58%)

Mutual labels: big-data

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-51.16%)

Mutual labels: hive

databricks-dbapi

DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters

Stars: ✭ 21 (-51.16%)

Mutual labels: hive

mining-camp

Easy automated configuration and deployment of Minecraft servers on AWS spot instances, featuring automatic backups and restoration using S3.

Stars: ✭ 43 (+0%)

Mutual labels: s3

simple-ddl-parser

Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) DDL files to JSON/Python dict with full information about columns (types, defaults, primary keys, etc.) and table properties, types, domains, etc.

Stars: ✭ 76 (+76.74%)

Mutual labels: hive

mlflow-docker

Ready-to-run docker-compose configuration for MLflow with MySQL and MinIO S3

Stars: ✭ 146 (+239.53%)

Mutual labels: s3

RemoteShuffleService

Celeborn provides an elastic and high-performance service for shuffle and spilled data.

Stars: ✭ 262 (+509.3%)

Mutual labels: big-data

terraform-modules

Terraform Modules by Peak

Stars: ✭ 16 (-62.79%)

Mutual labels: s3

IoT-system-PLC-data-to-InfluxDB

This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack used is completely open source. InfluxDB serves as the data storage, so the application follows the Big Data paradigm.

Stars: ✭ 26 (-39.53%)

Mutual labels: big-data

awesome-hive

A curated list of awesome Hive resources.

Stars: ✭ 20 (-53.49%)

Mutual labels: hive

react-relay-appsync

AppSync for Relay

Stars: ✭ 19 (-55.81%)

Mutual labels: s3

datajoint-python

Relational data pipelines for the science lab