All Projects → Lewuathe → Docker Hadoop Cluster

Lewuathe / Docker Hadoop Cluster

Licence: apache-2.0
Multiple node cluster on Docker for self development.

Programming Languages

shell
77523 projects

Projects that are alternatives of or similar to Docker Hadoop Cluster

Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+1150%)
Mutual labels:  hadoop
Src
A light-weight distributed stream computing framework for Golang
Stars: ✭ 67 (-18.29%)
Mutual labels:  hadoop
Tf Yarn
Train TensorFlow models on YARN in just a few lines of code!
Stars: ✭ 76 (-7.32%)
Mutual labels:  hadoop
Hadoop Solr
Code to index HDFS to Solr using MapReduce
Stars: ✭ 51 (-37.8%)
Mutual labels:  hadoop
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-26.83%)
Mutual labels:  hadoop
Hive Funnel Udf
Hive UDFs for funnel analysis
Stars: ✭ 72 (-12.2%)
Mutual labels:  hadoop
Weblogsanalysissystem
A big data platform for analyzing web access logs
Stars: ✭ 37 (-54.88%)
Mutual labels:  hadoop
Learn machine learning
Road to Machine Learning
Stars: ✭ 81 (-1.22%)
Mutual labels:  hadoop
Jumbune
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-21.95%)
Mutual labels:  hadoop
Dataspherestudio
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1357.32%)
Mutual labels:  hadoop
Docker Hadoop
A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-34.15%)
Mutual labels:  hadoop
Likelike
An implementation of locality sensitive hashing with Hadoop
Stars: ✭ 58 (-29.27%)
Mutual labels:  hadoop
Apache Spark Hands On
Educational notes,Hands on problems w/ solutions for hadoop ecosystem
Stars: ✭ 74 (-9.76%)
Mutual labels:  hadoop
Base
https://www.researchgate.net/profile/Rajah_Iyer
Stars: ✭ 48 (-41.46%)
Mutual labels:  hadoop
Chukwa
Mirror of Apache Chukwa
Stars: ✭ 77 (-6.1%)
Mutual labels:  hadoop
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+1119.51%)
Mutual labels:  hadoop
Atsd
Axibase Time Series Database Documentation
Stars: ✭ 68 (-17.07%)
Mutual labels:  hadoop
Camus
Mirror of Linkedin's Camus
Stars: ✭ 81 (-1.22%)
Mutual labels:  hadoop
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (-4.88%)
Mutual labels:  hadoop
Docker Hadoop
Apache Hadoop docker image
Stars: ✭ 1,190 (+1351.22%)
Mutual labels:  hadoop

docker-hadoop-cluster Build Status

Multiple node cluster on Docker for self development. docker-hadoop-cluster is suitable for testing your patch for Hadoop which has multiple nodes.

Build images from your Hadoop source code

Base image of hadoop service. This image includes JDK, hadoop package configurations etc. This image can include your self-build hadoop package. In order to bind, tar.gz package assumed be put under hadoop-base directory.

$ cd docker-hadoop-cluster
$ cp /path/to/hadoop-3.0.0-alpha3-SNAPSHOT.tar.gz hadoop-base
$ make

Once you build hadoop-base image, you can launch hadoop cluster by using docker-compose.

$ docker-compose up -d

or

$ make run

See http://localhost:9870 for NameNode or http://localhost:8088 for ResourceManager.

$ make down

shutdowns your cluster.

Build images from the latest trunk

docker-hadoop-cluster also uploads the latest image which refers HEAD of trunk. They are deployed on Docker Hub. If you want to try the trunk (though it can be unstable), docker-compose.yml like below is needed. It will launch 3 slave Hadoop cluster.

version: '2'

services:
  master:
    image: lewuathe/hadoop-master
    ports:
      - "9870:9870"
      - "8088:8088"
      - "19888:19888"
      - "8188:8188"
    container_name: "master"
  slave1:
    image: lewuathe/hadoop-slave
    container_name: "slave1"
    depends_on:
      - master
    ports:
      - "9901:9864"
      - "8041:8042"
  slave2:
    image: lewuathe/hadoop-slave
    container_name: "slave2"
    depends_on:
      - master
    ports:
      - "9902:9864"
      - "8042:8042"
  slave3:
    image: lewuathe/hadoop-slave
    container_name: "slave3"
    depends_on:
      - master
    ports:
      - "9903:9864"
      - "8043:8042"

Login cluster

$ docker exec -it master bash
bash-4.1# cd /usr/local/hadoop
bash-4.1# bin/hadoop version
Hadoop 3.0.0-SNAPSHOT
Source code repository git://git.apache.org/hadoop.git -r 0c7d3f480548745e9e9ccad1d318371c020c3003
Compiled by lewuathe on 2015-09-13T01:12Z
Compiled with protoc 2.5.0
From source with checksum 9174a352ac823cdfa576f525665e99
This command was run using /usr/local/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/hadoop-common-3.0.0-SNAPSHOT.jar

Deploy on EC2

We can run docker-hadoop-cluster on EC2 instance with ec2/ script.

$ python ec2/ec2.py -k <Key Name> -s <Security Group ID> -n <Subnet ID> launch

This script launch EC2 instance and prepare prerequisites to launch docker-hadoop-cluster.

Docker Hub

Image name Pulls Stars
lewuathe/hadoop-base hadoop-base hadoop-base
lewuathe/hadoop-master hadoop-master hadoop-master
lewuathe/hadoop-slave hadoop-slave hadoop-slave

License

Apache License Version2.0 This images are modified version of sequenceiq/hadoop-docker.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].