flokkr / docker-hadoop

License: Apache-2.0
Docker image for main Apache Hadoop components (Yarn/Hdfs)

Programming Languages

shell, Dockerfile, go

Projects that are alternatives to or similar to docker-hadoop

fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-66.1%)
Mutual labels:  yarn, hadoop, hdfs
Bigdata Interview
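🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered from the web, together with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.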
Stars: ✭ 857 (+1352.54%)
Mutual labels:  yarn, hadoop, hdfs
wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-67.8%)
Mutual labels:  yarn, hadoop, hdfs
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+18528.81%)
Mutual labels:  yarn, hadoop, hdfs
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+172.88%)
Mutual labels:  hadoop, hdfs
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+154.24%)
Mutual labels:  hadoop, hdfs
beanszoo
Distributed Java micro-services using ZooKeeper
Stars: ✭ 12 (-79.66%)
Mutual labels:  yarn, hadoop
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+55.93%)
Mutual labels:  hadoop, hdfs
yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in prometheus format
Stars: ✭ 44 (-25.42%)
Mutual labels:  yarn, hadoop
knit
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Stars: ✭ 53 (-10.17%)
Mutual labels:  yarn, hadoop
Tf Yarn
Train TensorFlow models on YARN in just a few lines of code!
Stars: ✭ 76 (+28.81%)
Mutual labels:  yarn, hadoop
Dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (+106.78%)
Mutual labels:  hadoop, hdfs
Hdfs Shell
HDFS Shell is a HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (+98.31%)
Mutual labels:  hadoop, hdfs
Ibis
A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+2662.71%)
Mutual labels:  hadoop, hdfs
Jumbune
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (+8.47%)
Mutual labels:  yarn, hadoop
Hadoop Yarn Api Python Client
Python client for Hadoop® YARN API
Stars: ✭ 91 (+54.24%)
Mutual labels:  yarn, hadoop
Camus
Mirror of Linkedin's Camus
Stars: ✭ 81 (+37.29%)
Mutual labels:  hadoop, hdfs
Wifi
A big data query and analysis system based on information captured over Wi-Fi
Stars: ✭ 93 (+57.63%)
Mutual labels:  hadoop, hdfs
Akkeeper
An easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-49.15%)
Mutual labels:  yarn, hadoop
Tensorflowonyarn
Support TensorFlow on YARN
Stars: ✭ 114 (+93.22%)
Mutual labels:  yarn, hadoop

Apache Hadoop docker images

These images are part of the Bigdata docker image series. All of the images use the same base docker image, which contains plugin scripts to launch different projects in containerized environments.

For more detailed instructions about the available environment variables, see the README in the flokkr/docker-baseimage repository.

The Docker images are tested with Kubernetes.

Getting started with Kubernetes

The easiest way to start is to run kubectl apply -f . from one of the ./examples directories. (Note: these examples use ephemeral storage!)

For more specific use cases, using Flekszible is recommended. The resource definitions can be found in this repository (./hadoop,./hdfs,./yarn...)

Getting started with Flekszible

Install Flekszible (download the binary and put it on your PATH).

  1. Create a working directory:
cd /tmp
mkdir cluster
cd cluster
  2. Add this repository as a source:
flekszible source add github.com/flokkr/docker-hadoop
  3. Choose and add the required services:
flekszible app add hdfs
  4. Generate the Kubernetes resource files:
flekszible generate
  5. Launch the rockets:
kubectl apply -f .
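
The five steps above can be collected into one small script. This is only a sketch, not something shipped with this repository: the DRY_RUN, WORKDIR and APP variables and the run/deploy helper names are hypothetical, and the script uses mkdir -p so it can be re-run. By default it only prints the commands; set DRY_RUN=0 to actually execute them (this requires flekszible and kubectl on the PATH).

```shell
#!/bin/sh
# Sketch: the getting-started steps as one script.
# Prints the commands by default; set DRY_RUN=0 to execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"             # hypothetical switch, not a flekszible option
WORKDIR="${WORKDIR:-/tmp/cluster}"  # working directory from step 1
APP="${APP:-hdfs}"                  # app added in step 3

# Print the command in dry-run mode, otherwise run it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

deploy() {
  run mkdir -p "$WORKDIR"
  run cd "$WORKDIR"
  run flekszible source add github.com/flokkr/docker-hadoop
  run flekszible app add "$APP"
  run flekszible generate
  run kubectl apply -f .
}

deploy
```

To deploy a different app from the list printed by flekszible app search, set APP accordingly, for example APP=hdfs-ha.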

Additional Flekszible options

You can list available apps (after source import):

flekszible app search
+---------+-------------------------------+
| path    | description                   |
+---------+-------------------------------+
| hdfs    | Apache Hadoop HDFS base setup |
| hdfs-ha | Apache Hadoop HDFS, HA setup  |
...

The base setup can be modified with additional transformations:

flekszible definitions search | grep hdfs
...
| hdfs/persistence    | Add real PVC based persistence                                                             |
| hdfs/onenode        | remove scheduling rules to make it possible to run multiple datanode on the same k8s node. |
...

You can apply transformations by modifying the Flekszible descriptor file:

Original version:

source:
- url: github.com/flokkr/docker-hadoop
import:
- path: hdfs

Modified:

source:
- url: github.com/flokkr/docker-hadoop
import:
- path: hdfs
  transformations:
  - type: hdfs/onenode
  - type: image
    image: flokkr/hadoop:3.2.0
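
Other transformations from the definitions list can be applied the same way. The sketch below writes a descriptor that uses hdfs/persistence instead of hdfs/onenode. It rests on a few assumptions: that the descriptor is a file named Flekszible in the working directory, and that hdfs/persistence needs no extra parameters (check the output of flekszible definitions search to confirm). The DESCRIPTOR and TARGET_DIR variables and the write_descriptor helper are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch: write a descriptor that applies the hdfs/persistence transformation.
set -eu

# Assumption: the descriptor is a file named "Flekszible" in the working dir.
DESCRIPTOR="${DESCRIPTOR:-Flekszible}"
# Hypothetical target directory; a temp dir keeps the example free of side effects.
TARGET_DIR="${TARGET_DIR:-$(mktemp -d)}"

write_descriptor() {
  cat > "$TARGET_DIR/$DESCRIPTOR" <<'EOF'
source:
- url: github.com/flokkr/docker-hadoop
import:
- path: hdfs
  transformations:
  - type: hdfs/persistence
EOF
}

write_descriptor
echo "Wrote $TARGET_DIR/$DESCRIPTOR"
# Next step (not run here): cd "$TARGET_DIR" && flekszible generate
```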