Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Stars: ✭ 29 (-17.14%)

Mutual labels: hadoop

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-31.43%)

Mutual labels: hadoop

learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+317.14%)

Mutual labels: hadoop

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+60%)

Mutual labels: hadoop

xxhadoop

Data analysis using Hadoop/Spark/Storm/Elasticsearch/machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!

Stars: ✭ 37 (+5.71%)

Mutual labels: hadoop

teraslice

Scalable data processing pipelines in JavaScript

Stars: ✭ 48 (+37.14%)

Mutual labels: hadoop

implyr

SQL backend to dplyr for Impala

Stars: ✭ 74 (+111.43%)

Mutual labels: hadoop

pyspark-ML-in-Colab

PySpark in Google Colab: a simple machine learning (linear regression) model

Stars: ✭ 32 (-8.57%)

Mutual labels: hadoop

MLHadoop

This repository contains machine learning MapReduce code for Hadoop, written from scratch (without using any package or library), e.g. prediction (linear and logistic regression), clustering (K-Means), classification (KNN), etc.

Stars: ✭ 50 (+42.86%)

Mutual labels: hadoop

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (+11.43%)

Mutual labels: hadoop

presto

Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data

Stars: ✭ 91 (+160%)

Mutual labels: hadoop

the-apache-ignite-book

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above.

Stars: ✭ 65 (+85.71%)

Mutual labels: hadoop

flokkr

Documentation placeholder and utilities for all the other containers.

Stars: ✭ 30 (-14.29%)

Mutual labels: hadoop

HDFS-Netdisc

A distributed cloud storage system based on Hadoop 🌴

Stars: ✭ 56 (+60%)

Mutual labels: hadoop

memex-gate

General Architecture for Text Engineering

Stars: ✭ 47 (+34.29%)

Mutual labels: hadoop

openPDC

Open Source Phasor Data Concentrator

Stars: ✭ 109 (+211.43%)

Mutual labels: hadoop

aaocp

A big data project that analyzes user behavior logs

Stars: ✭ 53 (+51.43%)

Mutual labels: hadoop

yarn-prometheus-exporter

Export Hadoop YARN (resource-manager) metrics in Prometheus format

Stars: ✭ 44 (+25.71%)

Mutual labels: hadoop

learning-spark

Tidy up Spark and Hadoop tutorials.

Stars: ✭ 28 (-20%)

Mutual labels: hadoop

skein

A tool and library for easily deploying applications on Apache YARN

Stars: ✭ 128 (+265.71%)

Mutual labels: hadoop

JavaFramework

A simple Java framework designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.

Stars: ✭ 16 (-54.29%)

Mutual labels: hadoop

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-11.43%)

Mutual labels: hadoop

disq

A library for manipulating bioinformatics sequencing formats in Apache Spark

Stars: ✭ 29 (-17.14%)

Mutual labels: hadoop

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-60%)

Mutual labels: hadoop

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-40%)

Mutual labels: hadoop

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (+8.57%)

Mutual labels: hadoop

disk

A distributed network disk system built on Hadoop + HBase + Spring Boot

Stars: ✭ 53 (+51.43%)

Mutual labels: hadoop

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-25.71%)

Mutual labels: hadoop

LogAnalyzeHelper

Data cleaning program for a forum log analysis system (includes an IP rule library, UDF development, MapReduce programs, and log data)

Stars: ✭ 33 (-5.71%)

Mutual labels: hadoop

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-45.71%)

Mutual labels: hadoop

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-48.57%)

Mutual labels: hadoop

clickhouse hadoop

Import data from ClickHouse to Hadoop with pure SQL

Stars: ✭ 26 (-25.71%)

Mutual labels: hadoop

Data-pipeline-project

Data pipeline project