WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+238.18%)

Mutual labels: hadoop, hive

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+257.27%)

Mutual labels: hadoop, avro

Docs4dev

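Documentation for frameworks commonly used in backend development, with Chinese translations. It covers the Spring family (Spring, Spring Boot, Spring Cloud, Spring Security, Spring Session), big data (Apache Hive, HBase, Apache Flume), logging (Log4j2, Logback), HTTP servers (NGINX, Apache), Python, and databases (OpenTSDB, MySQL, PostgreSQL), providing the latest official documentation alongside the corresponding Chinese translations.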

Stars: ✭ 974 (+785.45%)

Mutual labels: hive

Camus

Mirror of LinkedIn's Camus

Stars: ✭ 81 (-26.36%)

Mutual labels: hadoop

Pyetl

Python ETL framework

Stars: ✭ 33 (-70%)

Mutual labels: hive

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (-11.82%)

Mutual labels: avro

Learn machine learning

Road to Machine Learning

Stars: ✭ 81 (-26.36%)

Mutual labels: hadoop

Akkeeper

An easy way to deploy your Akka services to a distributed environment.

Stars: ✭ 30 (-72.73%)

Mutual labels: hadoop

Storm Camel Example

Real-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.

Stars: ✭ 28 (-74.55%)

Mutual labels: hadoop

Docker Spark

🚢 Docker image for Apache Spark

Stars: ✭ 78 (-29.09%)

Mutual labels: hadoop

Interview Questions Collection

Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more

Stars: ✭ 21 (-80.91%)

Mutual labels: hadoop

Avrocado

Avrocado is a convenience library to handle Avro in Golang

Stars: ✭ 21 (-80.91%)

Mutual labels: avro

Dampr

Python data processing library

Stars: ✭ 102 (-7.27%)

Mutual labels: mapreduce

Bitalarm

An app to keep track of different cryptocurrencies, written in Dart and Flutter

Stars: ✭ 94 (-14.55%)

Mutual labels: hive

Chukwa

Mirror of Apache Chukwa

Stars: ✭ 77 (-30%)

Mutual labels: hadoop

Cdc Kafka Hadoop

MySQL to NoSQL real-time dataflow

Stars: ✭ 13 (-88.18%)

Mutual labels: hadoop

Tf Yarn

Train TensorFlow models on YARN in just a few lines of code!

Stars: ✭ 76 (-30.91%)

Mutual labels: hadoop

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-90%)

Mutual labels: mapreduce

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+670%)

Mutual labels: hadoop

Hadoop Pot

A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.

Stars: ✭ 8 (-92.73%)

Mutual labels: hadoop

Databook

A Facebook for data

Stars: ✭ 26 (-76.36%)

Mutual labels: hive

Docker Hadoop

Apache Hadoop docker image

Stars: ✭ 1,190 (+981.82%)

Mutual labels: hadoop

Avsc

Avro for JavaScript ⚡️

Stars: ✭ 930 (+745.45%)

Mutual labels: avro

Pyhive

Python interface to Hive and Presto. 🐝

Stars: ✭ 1,378 (+1152.73%)

Mutual labels: hive

Stormtweetssentimentd3viz

Computes and visualizes the sentiment analysis of tweets of US States in real-time using Storm.

Stars: ✭ 25 (-77.27%)

Mutual labels: hadoop

Coursera Uw Machine Learning Clustering Retrieval

Stars: ✭ 25 (-77.27%)

Mutual labels: mapreduce

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+744.55%)

Mutual labels: mapreduce

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+732.73%)

Mutual labels: hadoop

Floating Elephants

Docker containers for Hadoop.

Stars: ✭ 19 (-82.73%)

Mutual labels: hadoop

Magnolify

A collection of Magnolia add-on modules

Stars: ✭ 81 (-26.36%)

Mutual labels: avro

Luigi Warehouse

A Luigi-powered analytics/warehouse stack

Stars: ✭ 72 (-34.55%)

Mutual labels: hive

Aptos

☀️ Avro, Protobuf, Thrift on Swagger

Stars: ✭ 17 (-84.55%)

Mutual labels: avro

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-35.45%)

Mutual labels: mapreduce

Yandex Big Data Engineering

Stars: ✭ 17 (-84.55%)

Mutual labels: mapreduce

Maha

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.

Stars: ✭ 101 (-8.18%)

Mutual labels: hive

Hadoop Yarn Api Python Client

Python client for Hadoop® YARN API

Stars: ✭ 91 (-17.27%)

Mutual labels: hadoop

Atsd

Axibase Time Series Database Documentation

Stars: ✭ 68 (-38.18%)

Mutual labels: hadoop

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-95.45%)

Mutual labels: hadoop

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+561.82%)

Mutual labels: avro

Eyerissf

A simulator for the Eyeriss chip (a CNN accelerator researched at MIT) and a new DNN framework, "Hive"

Stars: ✭ 68 (-38.18%)

Mutual labels: hive

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDFs, functions, resource management, and intelligent diagnosis.

Stars: ✭ 696 (+532.73%)

Mutual labels: hive

Mapreduce

MapReduce by examples

Stars: ✭ 91 (-17.27%)

Mutual labels: mapreduce

Pmacct

pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].

Stars: ✭ 677 (+515.45%)

Mutual labels: avro

Winutils

winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows

Stars: ✭ 657 (+497.27%)

Mutual labels: hadoop

Distributed Computing

distributed_computing includes MapReduce, a KV store, etc.

Stars: ✭ 654 (+494.55%)

Mutual labels: mapreduce

Jumbune

Jumbune is an open-source BigData APM and Data Quality Management Platform for Data Clouds. The enterprise feature offering is available at http://jumbune.com. More details of the open-source offering are at,

Stars: ✭ 64 (-41.82%)

Mutual labels: hadoop

Corral

🐎 A serverless MapReduce framework written for AWS Lambda

Stars: ✭ 648 (+489.09%)

Mutual labels: mapreduce

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+486.36%)

Mutual labels: hadoop

Schema Registry

Confluent Schema Registry for Kafka