
clusterdock / clusterdock

License: Apache-2.0
clusterdock is a framework for creating Docker-based container clusters

Programming languages: Python, Makefile

Projects that are alternatives of or similar to clusterdock

Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+6215.38%)
Mutual labels:  big-data, hadoop
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+49734.62%)
Mutual labels:  big-data, hadoop
Calcite Avatica
Mirror of Apache Calcite - Avatica
Stars: ✭ 130 (+400%)
Mutual labels:  big-data, hadoop
Hdfs Shell
HDFS Shell is a HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (+350%)
Mutual labels:  big-data, hadoop
iis
Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-38.46%)
Mutual labels:  big-data, hadoop
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (+392.31%)
Mutual labels:  big-data, hadoop
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+476.92%)
Mutual labels:  big-data, hadoop
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (+119.23%)
Mutual labels:  big-data, hadoop
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+726.92%)
Mutual labels:  big-data, hadoop
Calcite
Apache Calcite
Stars: ✭ 2,816 (+10730.77%)
Mutual labels:  big-data, hadoop
Drill
Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+6126.92%)
Mutual labels:  big-data, hadoop
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-42.31%)
Mutual labels:  big-data, hadoop
Asakusafw
Asakusa Framework
Stars: ✭ 114 (+338.46%)
Mutual labels:  big-data, hadoop
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (+80.77%)
Mutual labels:  big-data, hadoop
Bigdata Notes
大数据入门指南 ⭐
Stars: ✭ 10,991 (+42173.08%)
Mutual labels:  big-data, hadoop
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+438.46%)
Mutual labels:  big-data, hadoop
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-80.77%)
Mutual labels:  big-data, hadoop
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+3842.31%)
Mutual labels:  big-data, hadoop
Bigdata Playground
A complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+580.77%)
Mutual labels:  big-data, hadoop
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+50%)
Mutual labels:  big-data, hadoop

clusterdock


clusterdock is a Python 3 project that enables users to build, start, and manage Docker container-based clusters. It uses a pluggable system for defining new types of clusters using folders called topologies and is a swell project, if I may say so myself.


"I hate reading, make this quick."

Before doing anything, install a recent version of Docker on your machine, then install clusterdock:

$ pip3 install clusterdock
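
If you'd like to confirm both prerequisites are in place before going further, a small check along these lines may help. This is only a sketch: the helper name `missing_tools` is made up for illustration, and it only tests whether the `docker` and `clusterdock` executables are on your `PATH`, not whether the Docker daemon is actually running.

```python
import shutil


def missing_tools(tools=("docker", "clusterdock")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("docker and clusterdock are both on PATH.")
```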

Next, clone a clusterdock topology to your machine. For this example, we'll use the nodebase topology. Then start a 2-node cluster:

$ git clone https://github.com/clusterdock/topology_nodebase.git
$ clusterdock start topology_nodebase
2017-08-03 10:04:18 PM clusterdock.models   INFO     Starting cluster on network (cluster) ...
2017-08-03 10:04:18 PM clusterdock.models   INFO     Starting node node-1.cluster ...
2017-08-03 10:04:19 PM clusterdock.models   INFO     Starting node node-2.cluster ...
2017-08-03 10:04:20 PM clusterdock.models   INFO     Cluster started successfully (total time: 00:00:01.621).

To list cluster nodes:

$ clusterdock ps

For cluster `famous_hyades` on network cluster the node(s) are:
CONTAINER ID     HOST NAME            PORTS              STATUS        CONTAINER NAME          VERSION    IMAGE
a205d88beb       node-2.cluster                          running       nervous_sinoussi        1.3.3      clusterdock/topology_nodebase:centos6.6
6f2825c596       node-1.cluster       8080->80/tcp       running       priceless_franklin      1.3.3      clusterdock/topology_nodebase:centos6.6
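
If you want to consume that listing from a script, one option is to slice each row at the offsets where the header names begin. The sketch below assumes the table is always laid out in fixed-width columns exactly like the sample above (that layout is inferred from this one example, not from any documented format), and `parse_ps_table` is a hypothetical helper, not part of clusterdock.

```python
import re

# Sample copied from the `clusterdock ps` output shown above.
SAMPLE = """\
For cluster `famous_hyades` on network cluster the node(s) are:
CONTAINER ID     HOST NAME            PORTS              STATUS        CONTAINER NAME          VERSION    IMAGE
a205d88beb       node-2.cluster                          running       nervous_sinoussi        1.3.3      clusterdock/topology_nodebase:centos6.6
6f2825c596       node-1.cluster       8080->80/tcp       running       priceless_franklin      1.3.3      clusterdock/topology_nodebase:centos6.6
"""


def parse_ps_table(text):
    """Parse the fixed-width node table into a list of dicts keyed by header name."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Locate the header row; every line after it is treated as a node row.
    header_index = next(
        i for i, line in enumerate(lines) if line.startswith("CONTAINER ID")
    )
    header = lines[header_index]
    # Header names may contain single spaces; columns are separated by 2+ spaces.
    columns = [(m.group(0), m.start()) for m in re.finditer(r"\S+(?: \S+)*", header)]
    starts = [start for _, start in columns]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    rows = []
    for line in lines[header_index + 1:]:
        rows.append(
            {name: line[start:end].strip() for (name, start), end in zip(columns, ends)}
        )
    return rows


if __name__ == "__main__":
    for node in parse_ps_table(SAMPLE):
        print(node["HOST NAME"], node["STATUS"], node["PORTS"] or "(no ports)")
```

Slicing by offset rather than splitting on whitespace is what lets an empty PORTS cell (as on node-2 above) stay empty instead of shifting the remaining columns left.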

To SSH into a node and look around:

$ clusterdock ssh node-1.cluster
[root@node-1 ~]# ls -l / | head
total 64
dr-xr-xr-x   1 root root 4096 May 19 20:48 bin
drwxr-xr-x   5 root root  360 Aug  4 05:04 dev
drwxr-xr-x   1 root root 4096 Aug  4 05:04 etc
drwxr-xr-x   2 root root 4096 Sep 23  2011 home
dr-xr-xr-x   7 root root 4096 Mar  4  2015 lib
dr-xr-xr-x   1 root root 4096 May 19 20:48 lib64
drwx------   2 root root 4096 Mar  4  2015 lost+found
drwxr-xr-x   2 root root 4096 Sep 23  2011 media
drwxr-xr-x   2 root root 4096 Sep 23  2011 mnt
[root@node-1 ~]# exit

To see full usage instructions for the start action, use -h/--help:

$ clusterdock start topology_nodebase -h
usage: clusterdock start [-h] [--node-disks map] [--always-pull]
                         [--namespace ns] [--network nw] [-o sys] [-r url]
                         [--nodes node [node ...]]
                         topology

Start a nodebase cluster

positional arguments:
  topology              A clusterdock topology directory

optional arguments:
  -h, --help            show this help message and exit
  --always-pull         Pull latest images, even if they're available locally
                        (default: False)
  --namespace ns        Namespace to use when looking for images (default:
                        clusterdock)
  --network nw          Docker network to use (default: cluster)
  -o sys, --operating-system sys
                        Operating system to use for cluster nodes (default:
                        centos6.6)
  -r url, --registry url
                        Docker Registry from which to pull images (default:
                        None)

nodebase arguments:
  --node-disks map      Map of node names to block devices (default: None)

Node groups:
  --nodes node [node ...]
                        Nodes of the nodes group (default: ['node-1',
                        'node-2'])
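
To drive these options from Python, you could assemble the command line as a list and hand it to `subprocess`. The following is a minimal sketch that uses only flags listed in the help output above; `build_start_command` and its parameter names are invented for illustration. `--nodes` is placed last because it accepts one or more values.

```python
import shlex
import subprocess


def build_start_command(topology, nodes=None, network=None,
                        operating_system=None, always_pull=False):
    """Build an argv list for `clusterdock start` from the options shown in its help."""
    command = ["clusterdock", "start", topology]
    if always_pull:
        command.append("--always-pull")
    if network is not None:
        command += ["--network", network]
    if operating_system is not None:
        command += ["--operating-system", operating_system]
    if nodes:
        # --nodes takes one or more names, so keep it at the end of the command.
        command += ["--nodes", *nodes]
    return command


def start_cluster(topology, **options):
    """Run `clusterdock start`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_start_command(topology, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call start_cluster(...) to actually run it.
    command = build_start_command("topology_nodebase",
                                  nodes=["node-1", "node-2", "node-3"])
    print(shlex.join(command))
```

Run as a script, this prints `clusterdock start topology_nodebase --nodes node-1 node-2 node-3`, which you can also type directly into a shell to get a 3-node cluster.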

When you're done and want to clean up:

$ clusterdock manage nuke
2017-08-03 10:06:28 PM clusterdock.actions.manage INFO     Stopping and removing clusterdock containers ...
2017-08-03 10:06:30 PM clusterdock.actions.manage INFO     Removed user-defined networks ...
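
If you start clusters from a script (a test run, say), you may want the cleanup to happen even when something fails in between. One way to sketch that is a context manager wrapping the two commands shown in this README. `temporary_cluster` is a hypothetical name, and the `run` parameter exists only so the commands can be swapped for a dry run. Judging by the log message above, `manage nuke` removes all clusterdock containers, not just the ones this script started, so use it with that in mind.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def temporary_cluster(topology, run=subprocess.check_call):
    """Start `topology`, yield to the caller, then always run `clusterdock manage nuke`."""
    try:
        run(["clusterdock", "start", topology])
        yield
    finally:
        # Runs on success, on an exception in the with-body, and on a failed start.
        run(["clusterdock", "manage", "nuke"])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    with temporary_cluster("topology_nodebase", run=print):
        print("cluster is up; do your work here")
```

Dropping the `run=print` argument makes the same block execute the real commands.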

To see full usage instructions for the build action, use -h/--help:

$ clusterdock build topology_nodebase -h
usage: clusterdock build [--network nw] [-o sys] [--repository repo] [-h]
                         topology

Build images for the nodebase topology

positional arguments:
  topology              A clusterdock topology directory

optional arguments:
  --network nw          Docker network to use (default: cluster)
  -o sys, --operating-system sys
                        Operating system to use for cluster nodes (default:
                        None)
  --repository repo     Docker repository to use for committing images
                        (default: docker.io/clusterdock)
  -h, --help            show this help message and exit