
sergevs / ansible-cloudera-hadoop

License: MIT
An Ansible playbook to deploy Cloudera Hadoop components to a cluster


Ansible Playbook: cloudera-hadoop

An Ansible playbook that deploys Cloudera Hadoop components to a cluster

Overview

The playbook follows the official Cloudera guides and was written primarily with production deployment in mind. High availability for HDFS and YARN is configured automatically when a sufficient number of hosts is provided. Alternatively, all of the components can also be deployed on a single host.

Description

The playbook can set up the services required for the following components:

  • Hadoop HDFS
  • Hadoop YARN (MapReduce)
  • ZooKeeper
  • Hive
  • HBase
  • Impala
  • Solr
  • Spark
  • Oozie
  • Kafka
  • Hue
  • PostgreSQL

Configuration is simple: place the hostname(s) into the appropriate groups in the hosts file, and the corresponding services will be set up.

The playbook keeps all configuration files in the role directories. If you need to add or change a parameter, edit the corresponding configuration file under the roles/service/[files|templates] directory.

The playbook runs configuration check tasks at the start and stops with a descriptive error message if the configuration is not supported.

Besides the cluster (or single-host) setup, the playbook also generates a cluster manager configuration file located at workdir/services.xml. Please visit the clinit manager home page and see the manual. The RPM package can be downloaded as clinit-1.0-ssv1.el6.noarch.rpm. Once the clinit package is installed, you will be able to stop, start, and see the status of services on any node.

Configuration

Service configuration is performed using the hosts file. An empty hosts file is supplied with the playbook. You must not remove any existing group; leave a group empty if you don't need the services it configures. The same hostname can be placed in any number of groups. For instance, to set up everything on one host, put the same hostname into each group.
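As an illustration, a minimal single-host layout of the hosts file might look like this (the hostname is hypothetical; only a few of the groups are shown):

```ini
# Hypothetical single-host layout: the same hostname is placed into
# every group, so all services land on one machine.
[namenodes]
node1.example.com

[datanodes]
node1.example.com

[zookeepernodes]
node1.example.com

[postgresql]
node1.example.com

[hivemetastore]
node1.example.com

# ...repeat for the remaining groups; groups you don't need stay empty.
```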

Hosts file groups description:

  • [namenodes] : configures namenode services. At least 1 host is required; 2 hosts are allowed. HA HDFS with automatic namenode failover is configured when 2 hosts are given.
  • [datanodes] : configures datanode services. At least 1 host is required.
  • [yarnresourcemanager] : configures the YARN resource manager. At least 1 host is required. HA with automatic resource manager failover is configured when more than 1 host is provided. The job history server is configured on the 1st host in the group; node manager services are configured on the [datanodes] hosts.
  • [zookeepernodes] : configures ZooKeeper services. 3 or 5 hosts are required for HA when 2 [namenodes] hosts are configured.
  • [journalnodes] : configures journalnode services required for an HA configuration. At least one host is required when 2 [namenodes] hosts are configured.
  • [postgresql] : configures the PostgreSQL server, which provides the database storage required by other services (see below). 1 host is allowed.
  • [hivemetastore] : configures the Hive metastore and hiveserver2 services. 1 host is allowed. A [postgresql] host is required for metadata storage.
  • [impala-store-catalog] : configures the impala-catalog and impala-state-store services. 1 host is allowed. impala-server is configured on each [datanodes] host. A [hivemetastore] host is required for metadata storage.
  • [hbasemaster] : configures hbase-master services. 1 host is allowed. hbase-regionserver is configured on each [datanodes] host. At least 1 [zookeepernodes] host is required.
  • [solr] : configures the Solr service. At least 1 [zookeepernodes] host is required.
  • [spark] : configures hosts to submit Spark jobs. The Spark history server is configured on the first host in the group.
  • [oozie] : configures the Oozie service. A [postgresql] host is required for data storage.
  • [kafka] : configures the kafka-server service. At least 1 [zookeepernodes] host is required.
  • [hue] : configures Hue services. An [oozie] host is required to submit jobs; [postgresql] is required for data storage.
  • [dashboard] : places a simple static dashboard with links to all other services on the listed hosts. See Dashboard below.
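Following these rules, a sketch of a small highly available layout could look like the fragment below (hostnames are hypothetical; two namenodes trigger HA HDFS, and three ZooKeeper/journalnode hosts provide the quorum):

```ini
# Hypothetical 3-node HA layout.
[namenodes]
master1.example.com
master2.example.com

[journalnodes]
master1.example.com
master2.example.com
worker1.example.com

[zookeepernodes]
master1.example.com
master2.example.com
worker1.example.com

[yarnresourcemanager]
master1.example.com
master2.example.com

[datanodes]
worker1.example.com
```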

Variable parameters:

Please see group_vars

Usage

To start deployment run:

ansible-playbook -i hosts site.yaml

If you have installed clinit, you can also run:

clinit -S workdir/services.xml status
clinit -S workdir/services.xml tree

To deploy the configuration to an existing cluster:

ansible-playbook -i hosts --skip-tags=init,postgresql site.yaml

Tags used in the playbook:

  • package : install RPM packages
  • init : clean up and initialize data
  • config : deploy configuration files; useful if you just want to change the configuration on hosts
  • test : run test actions
  • check : check the hosts configuration

Most host groups also have a tag with a similar name.
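For example, the tags can be combined with the commands from the Usage section (using the hosts inventory and site.yaml shipped with the playbook):

```shell
# Re-deploy only the configuration files, skipping package installs
# and data initialization:
ansible-playbook -i hosts --tags=config site.yaml

# Run only the configuration checks without changing anything:
ansible-playbook -i hosts --tags=check site.yaml
```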

Monitoring

The playbook optionally provides syslog-ng and SNMP subagent configuration.

To use syslog-ng:

  • set the variable enable_syslog to true;
  • set the variable syslog_ng_destination to an existing syslog-ng destination (the default value is d_logcollector_throttled).
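In group_vars terms this could look like the following snippet (a sketch; where exactly you set it depends on how your group_vars files are laid out):

```yaml
# Hypothetical group_vars snippet enabling the syslog-ng configuration.
enable_syslog: true
# Must name a destination that already exists in your syslog-ng setup;
# the playbook's default is d_logcollector_throttled.
syslog_ng_destination: d_logcollector_throttled
```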

To use SNMP, set enable_snmp to true and put the following packages into the repository:

Dashboard

[dashboard screenshot]

The dashboard consists of 5 static files and is placed into /var/html/dashboard by default. Most services should be available in the frame on the right, but Solr and some other services set the X-Frame-Options header, which prevents embedding them in an iframe. If a service opens as a white page, try opening it in a new tab (e.g. with the middle mouse button).

Requirements

Ansible >= 2.2.0.0 is required. Please read the official documentation to install it.

OS version: Redhat/CentOS 6, 7

Cloudera Hadoop version: 5.4 - 5.9

The repositories required for Cloudera Hadoop have to be properly configured on the target hosts. See also the official documentation.

Java package(s) have to be available in the repository. You can download jdk-8u65-linux-x64.rpm from the official Oracle site.

Passwordless SSH key authentication must be configured for the root account on all target hosts.

SELinux must be disabled.

remote_user = root must be configured for Ansible.
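The last requirement can be met in ansible.cfg, for example (a minimal sketch; your existing ansible.cfg may contain more sections):

```ini
# ansible.cfg in the playbook directory (or ~/.ansible.cfg)
[defaults]
remote_user = root
# Pointing at the playbook's inventory here is optional; it just saves
# passing -i hosts on every run.
inventory = hosts
```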

License

MIT
