cdapio / Hadoop_cookbook

Licence: apache-2.0

Cookbook to install Hadoop 2.0+ using Chef

Labels

spark hadoop zookeeper chef chef-cookbook hive hbase

hadoop cookbook

Requirements

This cookbook may work on earlier versions, but these are the minimum tested versions:

Chef 11.4.0+
CentOS 6.4+
Debian 6.0+
Ubuntu 12.04+

This cookbook assumes that you have a working Java installation. It has been tested using version 1.21.2 of the java cookbook, using Oracle JDK 7. If you plan to use Hive with a database other than the embedded Derby, you must provide that database and set it up before starting the Hive Metastore service.

Usage

This cookbook is designed to be used with a wrapper cookbook or a role that supplies the settings for configuring Hadoop. The services should work out of the box on a single host, but little validation is done to confirm that you have made a working Hadoop configuration. The cookbook is attribute-driven and is suitable for use with either chef-client or chef-solo, since it does not use any server-based functionality. The cookbook defines a service for each Hadoop service, but does not enable or start them by default.

For more information, read the Wrapping this cookbook wiki entry.
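
As a rough illustration of the wrapper approach, the sketch below shows what the attributes file of a hypothetical wrapper cookbook might contain. The hostnames are placeholders, and the `default` method is only a plain-Ruby stand-in for the attribute object that Chef supplies, included so the snippet can be run outside Chef.

```ruby
# Stand-in for Chef's `default` attribute method so this sketch runs as plain
# Ruby. Inside a real attributes file Chef provides it, so drop this method.
def default
  @default ||= Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
end

# Distribution to install (one of the values listed in this README).
default['hadoop']['distribution'] = 'hdp'

# Point clients at the NameNode and ResourceManager.
# Both hostnames are hypothetical placeholders.
default['hadoop']['core_site']['fs.defaultFS'] = 'hdfs://namenode.example.com'
default['hadoop']['yarn_site']['yarn.resourcemanager.hostname'] = 'resourcemanager.example.com'
```

The wrapper's recipe would then include whichever recipes from this cookbook the node needs, and enable or start the services itself.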

Attributes

Attributes for this cookbook define the configuration files for Hadoop and its various services. Hadoop configuration files are XML files containing name/value property pairs. The attribute name determines both the file in which the property is placed and the property name; the attribute value is the property value. For example, the attribute hadoop['core_site']['fs.defaultFS'] configures a property named fs.defaultFS in core-site.xml in hadoop['conf_dir']. All attribute values are taken as-is, and only minimal configuration checking is done on them. It is up to you to provide a valid configuration for your cluster.

Attribute Tree	File	Location
flume['flume_conf']	flume.conf	`flume['conf_dir']`
hadoop['capacity_scheduler']	capacity-scheduler.xml	`hadoop['conf_dir']`
hadoop['container_executor']	container-executor.cfg	`hadoop['conf_dir']`
hadoop['core_site']	core-site.xml	`hadoop['conf_dir']`
hadoop['fair_scheduler']	fair-scheduler.xml	`hadoop['conf_dir']`
hadoop['hadoop_env']	hadoop-env.sh	`hadoop['conf_dir']`
hadoop['hadoop_metrics']	hadoop-metrics.properties	`hadoop['conf_dir']`
hadoop['hadoop_policy']	hadoop-policy.xml	`hadoop['conf_dir']`
hadoop['hdfs_site']	hdfs-site.xml	`hadoop['conf_dir']`
hadoop['log4j']	log4j.properties	`hadoop['conf_dir']`
hadoop['mapred_env']	mapred-env.sh	`hadoop['conf_dir']`
hadoop['mapred_site']	mapred-site.xml	`hadoop['conf_dir']`
hadoop['yarn_env']	yarn-env.sh	`hadoop['conf_dir']`
hadoop['yarn_site']	yarn-site.xml	`hadoop['conf_dir']`
hbase['hadoop_metrics']	hadoop-metrics.properties	`hbase['conf_dir']`
hbase['hbase_env']	hbase-env.sh	`hbase['conf_dir']`
hbase['hbase_policy']	hbase-policy.xml	`hbase['conf_dir']`
hbase['hbase_site']	hbase-site.xml	`hbase['conf_dir']`
hbase['jaas']	jaas.conf	`hbase['conf_dir']`
hbase['log4j']	log4j.properties	`hbase['conf_dir']`
hive['hive_env']	hive-env.sh	`hive['conf_dir']`
hive['hive_site']	hive-site.xml	`hive['conf_dir']`
hive['jaas']	jaas.conf	`hive['conf_dir']`
hive2['hive_env']	hive-env.sh	`hive2['conf_dir']`
hive2['hive_site']	hive-site.xml	`hive2['conf_dir']`
hive2['jaas']	jaas.conf	`hive2['conf_dir']`
oozie['oozie_env']	oozie-env.sh	`oozie['conf_dir']`
oozie['oozie_site']	oozie-site.xml	`oozie['conf_dir']`
spark['log4j']	log4j.properties	`spark['conf_dir']`
spark['metrics']	metrics.properties	`spark['conf_dir']`
spark['spark_env']	spark-env.sh	`spark['conf_dir']`
storm['storm_env']	storm-env.sh	`storm['conf_dir']`
storm['storm_env']	storm_env.ini	`storm['conf_dir']`
storm['storm_conf']	storm.yaml	`storm['conf_dir']`
tez['tez_env']	tez-env.sh	`tez['conf_dir']`
tez['tez_site']	tez-site.xml	`tez['conf_dir']`
zookeeper['jaas']	jaas.conf	`zookeeper['conf_dir']`
zookeeper['log4j']	log4j.properties	`zookeeper['conf_dir']`
zookeeper['zoocfg']	zoo.cfg	`zookeeper['conf_dir']`
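
To make the attribute-to-file mapping concrete, here is a minimal sketch of how a tree such as hadoop['core_site'] could be turned into Hadoop's XML property format. It is an illustrative stand-in, not the cookbook's actual template; the names `render_site_xml` and `xml_escape` are hypothetical.

```ruby
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Escape the characters that are unsafe inside XML element text.
def xml_escape(text)
  text.to_s.gsub(/[&<>]/, XML_ESCAPES)
end

# Render a hash of property names and values as a Hadoop *-site.xml document.
def render_site_xml(properties)
  lines = ['<?xml version="1.0"?>', '<configuration>']
  properties.sort.each do |name, value|
    lines << '  <property>'
    lines << "    <name>#{xml_escape(name)}</name>"
    lines << "    <value>#{xml_escape(value)}</value>"
    lines << '  </property>'
  end
  lines << '</configuration>'
  lines.join("\n") + "\n"
end

# Attribute tree shaped like hadoop['core_site'], using the documented default.
core_site = { 'fs.defaultFS' => 'hdfs://localhost' }
puts render_site_xml(core_site)
```

Each key in the tree becomes one property element, which is why values are passed through unchanged and need to be valid for the target service.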

Distribution Attributes

hadoop['distribution'] - Specifies which Hadoop distribution to use, currently supported: cdh, hdp, bigtop. Default hdp
hadoop['distribution_version'] - Specifies which version of hadoop['distribution'] to use. Default 2.0 if hadoop['distribution'] is hdp, 5 if hadoop['distribution'] is cdh, and 0.8.0 if hadoop['distribution'] is bigtop. It can also be set to develop when hadoop['distribution'] is bigtop to allow installing from development repos without gpg validation.
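
The defaulting rule described above can be sketched in plain Ruby as follows. The helper name `resolve_distribution_version` is hypothetical, and the cookbook's own logic may differ; the default values are the ones listed in this section.

```ruby
# Default versions per distribution, as documented in this README.
DEFAULT_DISTRIBUTION_VERSIONS = {
  'hdp' => '2.0',
  'cdh' => '5',
  'bigtop' => '0.8.0'
}.freeze

# Return the explicitly requested version, or the documented default
# for the distribution when no version is given.
def resolve_distribution_version(distribution = 'hdp', version = nil)
  unless DEFAULT_DISTRIBUTION_VERSIONS.key?(distribution)
    raise ArgumentError, "unsupported distribution: #{distribution}"
  end
  version || DEFAULT_DISTRIBUTION_VERSIONS.fetch(distribution)
end

puts resolve_distribution_version('cdh')
puts resolve_distribution_version('bigtop', 'develop')
```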

APT-specific settings

hadoop['apt_repo_url'] - Provide an alternate apt installation source location. If you change this attribute, you are expected to provide a path to a working repo for the hadoop['distribution'] used. Default nil
hadoop['apt_repo_key_url'] - Provide an alternate apt repository key source location. Default nil

RPM-specific settings

hadoop['yum_repo_url'] - Provide an alternate yum installation source location. If you change this attribute, you are expected to provide a path to a working repo for the hadoop['distribution'] used. Default nil
hadoop['yum_repo_key_url'] - Provide an alternate yum repository key source location. Default nil

Global Configuration Attributes

hadoop['conf_dir'] - The directory used inside /etc/hadoop and used via the alternatives system. Default conf.chef
hbase['conf_dir'] - The directory used inside /etc/hbase and used via the alternatives system. Default conf.chef
hive['conf_dir'] - The directory used inside /etc/hive and used via the alternatives system. Default conf.chef
oozie['conf_dir'] - The directory used inside /etc/oozie and used via the alternatives system. Default conf.chef
tez['conf_dir'] - The directory used inside /etc/tez and used via the alternatives system. Default conf.chef
spark['conf_dir'] - The directory used inside /etc/spark and used via the alternatives system. Default conf.chef
storm['conf_dir'] - The directory used inside /etc/storm and used via the alternatives system. Default conf.chef
zookeeper['conf_dir'] - The directory used inside /etc/zookeeper and used via the alternatives system. Default conf.chef
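
Read literally, each conf_dir attribute names a directory inside the service's directory under /etc. A small sketch of the resulting path, assuming nothing beyond what the list above states (`conf_path` is a hypothetical helper):

```ruby
# Build the configuration directory path for a service, using the
# documented default directory name of conf.chef.
def conf_path(service, conf_dir = 'conf.chef')
  File.join('/etc', service, conf_dir)
end

%w[hadoop hbase hive].each { |service| puts conf_path(service) }
```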

Default Attributes

hadoop['core_site']['fs.defaultFS'] - Sets URI to HDFS NameNode. Default hdfs://localhost
hadoop['yarn_site']['yarn.resourcemanager.hostname'] - Sets hostname of YARN ResourceManager. Default localhost
hive['hive_site']['javax.jdo.option.ConnectionURL'] - Sets JDBC URL. Default jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true
hive['hive_site']['javax.jdo.option.ConnectionDriverName'] - Sets JDBC Driver. Default org.apache.derby.jdbc.EmbeddedDriver
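
The Hive defaults above point the metastore at the embedded Derby database. To use an external database, both properties need to be overridden together. The sketch below assumes MySQL purely as an example; the host and database name are placeholders, `com.mysql.jdbc.Driver` is the MySQL Connector/J driver class, and the database and driver jar still have to be provided separately.

```ruby
# Defaults documented above (embedded Derby).
HIVE_SITE_DEFAULTS = {
  'javax.jdo.option.ConnectionURL' =>
    'jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true',
  'javax.jdo.option.ConnectionDriverName' => 'org.apache.derby.jdbc.EmbeddedDriver'
}.freeze

# Hypothetical overrides for a MySQL-backed metastore.
# The host and database name are placeholders.
MYSQL_OVERRIDES = {
  'javax.jdo.option.ConnectionURL' => 'jdbc:mysql://db.example.com:3306/hive_metastore',
  'javax.jdo.option.ConnectionDriverName' => 'com.mysql.jdbc.Driver'
}.freeze

# Combine defaults with overrides; with Hash#merge the later values win.
def effective_hive_site(overrides = {})
  HIVE_SITE_DEFAULTS.merge(overrides)
end

effective_hive_site(MYSQL_OVERRIDES).each { |name, value| puts "#{name}=#{value}" }
```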

Recipes

default - Sets up configuration and hadoop-client packages.
hadoop_hdfs_checkconfig - Ensures the HDFS configuration meets required parameters.
hadoop_hdfs_datanode - Sets up an HDFS DataNode.
hadoop_hdfs_ha_checkconfig - Ensures the HDFS configuration meets requirements for High Availability.
hadoop_hdfs_journalnode - Sets up an HDFS JournalNode.
hadoop_hdfs_namenode - Sets up an HDFS NameNode.
hadoop_hdfs_secondarynamenode - Sets up an HDFS Secondary NameNode.
hadoop_hdfs_zkfc - Sets up HDFS Failover Controller, required for automated NameNode failover.
hadoop_yarn_nodemanager - Sets up a YARN NodeManager.
hadoop_yarn_proxyserver - Sets up a YARN Web Proxy.
hadoop_yarn_resourcemanager - Sets up a YARN ResourceManager.
hbase - Sets up configuration and hbase packages.
hbase_checkconfig - Ensures the HBase configuration meets required parameters.
hbase_master - Sets up an HBase Master.
hbase_regionserver - Sets up an HBase RegionServer.
hbase_rest - Sets up an HBase REST interface.
hbase_thrift - Sets up an HBase Thrift interface.
hive - Sets up configuration and hive packages.
hive_metastore - Sets up Hive Metastore metadata repository.
hive_server - Sets up a Hive Thrift service.
hive_server2 - Sets up a Hive Thrift service with Kerberos and multi-client concurrency support.
oozie - Sets up an Oozie server.
oozie_client - Sets up an Oozie client.
pig - Installs pig interpreter.
repo - Sets up package manager repositories for the specified hadoop['distribution'].
spark - Sets up configuration and spark-core packages.
spark_master - Sets up a Spark Master.
spark_worker - Sets up a Spark Worker.
storm - Sets up storm package.
storm_nimbus - Sets up a Storm Nimbus server.
storm_supervisor - Sets up a Storm Supervisor server.
storm_ui - Sets up a Storm UI server.
tez - Sets up configuration and tez packages.
zookeeper - Sets up zookeeper package.
zookeeper_server - Sets up a ZooKeeper server.
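
As one way to combine these recipes, the sketch below builds a run list for a single-host HDFS and YARN node and prints it as a role in Chef's JSON role format. It assumes the cookbook is named `hadoop`; the role name and the helper `run_list_for` are hypothetical.

```ruby
require 'json'

# Recipes for a single-host HDFS + YARN setup, chosen from the list above.
SINGLE_HOST_RECIPES = %w[
  default
  hadoop_hdfs_namenode
  hadoop_hdfs_datanode
  hadoop_yarn_resourcemanager
  hadoop_yarn_nodemanager
].freeze

# Build run-list entries in Chef's "recipe[cookbook::recipe]" form.
# The cookbook name `hadoop` is an assumption.
def run_list_for(recipes, cookbook = 'hadoop')
  recipes.map { |recipe| "recipe[#{cookbook}::#{recipe}]" }
end

# Hypothetical role document; `hadoop_single_host` is a placeholder name.
role = {
  'name' => 'hadoop_single_host',
  'chef_type' => 'role',
  'json_class' => 'Chef::Role',
  'default_attributes' => {
    'hadoop' => { 'core_site' => { 'fs.defaultFS' => 'hdfs://localhost' } }
  },
  'run_list' => run_list_for(SINGLE_HOST_RECIPES)
}
puts JSON.pretty_generate(role)
```

Because the cookbook does not enable or start services by default, a role like this only installs and configures them; starting the services is left to the wrapper cookbook or the operator.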

Author

Author:: Cask Data, Inc. ([email protected])

Testing

This cookbook can be tested in several ways. Code tests are run with foodcritic, rubocop, and chefspec. Functionality testing is provided by Test Kitchen. The following rake tasks are available:

rake chefspec     # Run RSpec code examples
rake foodcritic   # Foodcritic linter
rake integration  # Run Test Kitchen integration tests
rake metadata     # Create metadata.json from metadata.rb
rake rubocop      # Ruby style guide linter
rake share        # Share cookbook to community site

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
