WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture that mainly leverages Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-38.71%)

Mutual labels: hadoop

teraslice

Scalable data processing pipelines in JavaScript

Stars: ✭ 48 (+54.84%)

Mutual labels: hadoop

disq

A library for manipulating bioinformatics sequencing formats in Apache Spark

Stars: ✭ 29 (-6.45%)

Mutual labels: hadoop

beanszoo

Distributed Java micro-services using ZooKeeper

Stars: ✭ 12 (-61.29%)

Mutual labels: hadoop

hadoop-ansible

Install a Hadoop cluster with Ansible

Stars: ✭ 35 (+12.9%)

Mutual labels: hadoop

big-data-exploration

[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product

Stars: ✭ 43 (+38.71%)

Mutual labels: hadoop

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (-25.81%)

Mutual labels: hadoop

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+25.81%)

Mutual labels: hadoop

UBA

UEBA Solution for Insider Security. This repo is archived. Thanks!

Stars: ✭ 36 (+16.13%)

Mutual labels: hadoop

dockerfiles

Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Stars: ✭ 29 (-6.45%)

Mutual labels: hadoop

memex-gate

General Architecture for Text Engineering

Stars: ✭ 47 (+51.61%)

Mutual labels: hadoop

the-apache-ignite-book

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Covers Apache Ignite 2.6 or above

Stars: ✭ 65 (+109.68%)

Mutual labels: hadoop

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-22.58%)

Mutual labels: hadoop

apache-airflow-cloudera-parcel

Parcel for Apache Airflow

Stars: ✭ 16 (-48.39%)

Mutual labels: cloudera

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+80.65%)

Mutual labels: hadoop

hive to es

A small tool for syncing Hive data warehouse data to Elasticsearch

Stars: ✭ 21 (-32.26%)

Mutual labels: hadoop

implyr

SQL backend to dplyr for Impala

Stars: ✭ 74 (+138.71%)

Mutual labels: hadoop

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (+154.84%)

Mutual labels: hadoop

learning-spark

Tidied-up Spark and Hadoop tutorials.

Stars: ✭ 28 (-9.68%)

Mutual labels: hadoop

MLHadoop

This repository contains machine learning MapReduce code for Hadoop, written from scratch (without using any package or library), e.g. prediction (linear and logistic regression), clustering (K-Means), classification (KNN), etc.

Stars: ✭ 50 (+61.29%)

Mutual labels: hadoop

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-32.26%)

Mutual labels: hadoop

ambari-hdp-docker

Dockerfiles and Docker Compose for HDP 2.6 with Blueprints

Stars: ✭ 23 (-25.81%)

Mutual labels: hadoop

webhdfs

Node.js WebHDFS REST API client

Stars: ✭ 88 (+183.87%)

Mutual labels: hadoop

jmx exporter-cloudera-hadoop

Prometheus jmx_exporter configurations for Cloudera Hadoop

Stars: ✭ 33 (+6.45%)

Mutual labels: hadoop

TonY

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 687 (+2116.13%)

Mutual labels: hadoop

datasqueeze

Hadoop utility to compact small files

Stars: ✭ 18 (-41.94%)

Mutual labels: hadoop

gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

Stars: ✭ 39 (+25.81%)

Mutual labels: hadoop

xxhadoop

Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes, code, and demos. Don't fork, just star!

Stars: ✭ 37 (+19.35%)

Mutual labels: hadoop

JavaFramework

Simple Java framework designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.

Stars: ✭ 16 (-48.39%)

Mutual labels: hadoop

aaocp

A big data project that analyzes user behavior logs

Stars: ✭ 53 (+70.97%)

Mutual labels: hadoop

orion

Management and automation platform for Stateful Distributed Systems

Stars: ✭ 77 (+148.39%)

Mutual labels: hadoop

corc

An ORC File Scheme for the Cascading data processing platform.

Stars: ✭ 14 (-54.84%)

Mutual labels: hadoop

RecommendationEngine

Source code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"

Stars: ✭ 43 (+38.71%)

Mutual labels: hadoop

presto

Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data

Stars: ✭ 91 (+193.55%)

Mutual labels: hadoop

kafka-connect-fs

Kafka Connect FileSystem Connector

Stars: ✭ 107 (+245.16%)

Mutual labels: hadoop

pyspark-ML-in-Colab

PySpark in Google Colab: a simple machine learning (linear regression) model