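The best big data project. "Titan Data Operations System": this project is a full-stack, closed-loop system. It has a web system for data visualization, uses flume-kafka-flume to read logs, designs the data warehouse in Hive, uses Spark code to transform data between warehouse tables and to migrate ADS-layer tables to MySQL, and uses Azkaban to schedule timed jobs. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, ECharts, etc.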

Stars: ✭ 62 (-60%)

Mutual labels: hadoop

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+446.45%)

Mutual labels: hadoop

my-awesome-projects

Learn by doing projects

Stars: ✭ 48 (-69.03%)

Mutual labels: hadoop

Wifi

A big data query and analysis system based on information captured over Wi-Fi

Stars: ✭ 93 (-40%)

Mutual labels: hadoop

cloud

Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper

Stars: ✭ 48 (-69.03%)

Mutual labels: hadoop

Stormtweetssentimentd3viz

Computes and visualizes the sentiment analysis of tweets of US States in real-time using Storm.

Stars: ✭ 25 (-83.87%)

Mutual labels: hadoop

cmux

A set of commands for managing CDH clusters using Cloudera Manager REST API.

Stars: ✭ 34 (-78.06%)

Mutual labels: hadoop

Hbaseclient

HBase client data management software

Stars: ✭ 135 (-12.9%)

Mutual labels: hadoop

platys-modern-data-platform

Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....

Stars: ✭ 35 (-77.42%)

Mutual labels: hadoop

Floating Elephants

Docker containers for Hadoop.

Stars: ✭ 19 (-87.74%)

Mutual labels: hadoop

docker-hadoop-3

Docker file for Hadoop 3

Stars: ✭ 19 (-87.74%)

Mutual labels: hadoop

Hadoop Mapreduce

Mirror of Apache Hadoop MapReduce

Stars: ✭ 88 (-43.23%)

Mutual labels: hadoop

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-83.23%)

Mutual labels: hadoop

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-96.77%)

Mutual labels: hadoop

fsbrowser

Fast desktop client for Hadoop Distributed File System

Stars: ✭ 27 (-82.58%)

Mutual labels: hadoop

Hdfs Shell

HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS

Stars: ✭ 117 (-24.52%)

Mutual labels: hadoop

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-90.97%)

Mutual labels: hadoop

Winutils

winutils.exe, hadoop.dll and hdfs.dll binaries for Hadoop on Windows

Stars: ✭ 657 (+323.87%)

Mutual labels: hadoop

clickhouse hadoop

Import data from clickhouse to hadoop with pure SQL

Stars: ✭ 26 (-83.23%)

Mutual labels: hadoop

Docker Hadoop Cluster

Multiple node cluster on Docker for self development.

Stars: ✭ 82 (-47.1%)

Mutual labels: hadoop

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-69.68%)

Mutual labels: hadoop

Tony

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 626 (+303.87%)

Mutual labels: hadoop

darwin

Avro Schema Evolution made easy

Stars: ✭ 26 (-83.23%)

Mutual labels: hadoop

Hadoop

Apache Hadoop

Stars: ✭ 12,177 (+7756.13%)

Mutual labels: hadoop

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-80%)

Mutual labels: hadoop

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+3549.03%)

Mutual labels: hadoop

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (-75.48%)

Mutual labels: hadoop

Learn machine learning

Road to Machine Learning

Stars: ✭ 81 (-47.74%)

Mutual labels: hadoop

wasp

WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-87.74%)

Mutual labels: hadoop

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+3370.32%)

Mutual labels: hadoop

hadoop-ecosystem

Visualizations of the Hadoop Ecosystem

Stars: ✭ 20 (-87.1%)

Mutual labels: hadoop

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+951.61%)

Mutual labels: hadoop

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

Stars: ✭ 17 (-89.03%)

Mutual labels: hadoop

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+214.84%)

Mutual labels: hadoop

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (-79.35%)

Mutual labels: hadoop

Chukwa

Mirror of Apache Chukwa

Stars: ✭ 77 (-50.32%)

Mutual labels: hadoop

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-90.32%)

Mutual labels: hadoop

School Of Sre

At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.

Stars: ✭ 5,141 (+3216.77%)

Mutual labels: hadoop

oci-cloudera

Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)

Stars: ✭ 20 (-87.1%)

Mutual labels: hadoop

Calcite Avatica

Mirror of Apache Calcite - Avatica

Stars: ✭ 130 (-16.13%)

Mutual labels: hadoop

skein

A tool and library for easily deploying applications on Apache YARN

Stars: ✭ 128 (-17.42%)

Mutual labels: hadoop

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+14124.52%)

Mutual labels: hadoop

disq

A library for manipulating bioinformatics sequencing formats in Apache Spark

Stars: ✭ 29 (-81.29%)

Mutual labels: hadoop

Dataspherestudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+670.97%)

Mutual labels: hadoop

Marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

Stars: ✭ 414 (+167.1%)

Mutual labels: hadoop

Movie recommend

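A Spark-based movie recommendation system, including a web crawler project, a website, a back-end administration system, and the Spark recommendation system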

Stars: ✭ 2,092 (+1249.68%)

Mutual labels: hadoop

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples