Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-31.43%)

Mutual labels: hadoop

flokkr

Documentation placeholder and utilities for all the other containers.

Stars: ✭ 30 (-14.29%)

Mutual labels: hadoop

implyr

SQL backend to dplyr for Impala

Stars: ✭ 74 (+111.43%)

Mutual labels: hadoop

aaocp

A big data project that analyzes user behavior logs

Stars: ✭ 53 (+51.43%)

Mutual labels: hadoop

MLHadoop

This repository contains machine-learning MapReduce code for Hadoop, written from scratch (without using any package or library), e.g. prediction (linear and logistic regression), clustering (K-Means), classification (KNN), etc.

Stars: ✭ 50 (+42.86%)

Mutual labels: hadoop

presto

Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data

Stars: ✭ 91 (+160%)

Mutual labels: hadoop

docker-hadoop-3

Docker file for Hadoop 3

Stars: ✭ 19 (-45.71%)

Mutual labels: hadoop

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (+34.29%)

Mutual labels: hadoop

fsbrowser

Fast desktop client for Hadoop Distributed File System

Stars: ✭ 27 (-22.86%)

Mutual labels: hadoop

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-11.43%)

Mutual labels: hadoop

darwin

Avro Schema Evolution made easy

Stars: ✭ 26 (-25.71%)

Mutual labels: hadoop

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-60%)

Mutual labels: hadoop

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (+8.57%)

Mutual labels: hadoop

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-25.71%)

Mutual labels: hadoop

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-45.71%)

Mutual labels: hadoop

clickhouse hadoop

Import data from clickhouse to hadoop with pure SQL

Stars: ✭ 26 (-25.71%)

Mutual labels: hadoop

ros hadoop

Hadoop splittable InputFormat for ROS. Process rosbag files with Hadoop, Spark, and other HDFS-compatible systems.

Stars: ✭ 92 (+162.86%)

Mutual labels: hadoop

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

Stars: ✭ 16 (-54.29%)

Mutual labels: hadoop

hadoop-deployment-bash

Code for the deployment of Hadoop clusters, written in Bourne or Bourne Again shell.

Stars: ✭ 31 (-11.43%)

Mutual labels: hadoop


Platform Stack: `modern-data-platform` - v1.14.0

This Platform Stack defines the set of services for a Modern Data Platform, such as

Kafka
Spark
Hadoop Ecosystem
StreamSets & NiFi
Zeppelin & Jupyter
NoSQL

and many others.

Which services can I use?

The following services are provisioned as part of the Modern Data Platform:

To request a new service, please either create a GitHub issue or open a Pull Request.

Changes

See What's new? for a detailed list of changes.

Documentation

Getting Started with platys and modern-data-platform stack
Configuration - all settings configurable in the config.yml
Cookbooks - various recipes showing how to use specific features of platys
Port Mapping
Frequently Asked Questions
Troubleshooting
Adding additional services not supported by a platform stack
How to use a platys-generated stack without Internet
Upgrade to a new platform stack version


TrivadisPF / platys-modern-data-platform
