HDFS/Spark/Hive Local Development Setup

This repository provides the installation instructions for

  • Hadoop 2.7.2,
  • Spark 2.0.0, and
  • Hive 2.1.0

for development on a local machine. SANSA stack developers use this environment setup for development and debugging. As we run our production code in Docker containers, Docker-driven CI is part of our delivery cycle as well.

Our developers use Ubuntu LTS and organize their work inside a dedicated ~/Workspace directory. If you do not know where to install your HDFS/Spark/Hive setup, put it into the ~/Workspace/hadoop-spark-hive directory. After the installation the directory will contain the following:

├── data
├── Makefile
├── src
└── tools
    ├── apache-hive-2.1.0-bin
    ├── hadoop-2.7.2
    └── spark-2.0.0-bin
  • Makefile. Used for running various tasks, such as starting up Hadoop/Spark/Hive and launching interactive shells for Spark/Hive.
  • src/ directory. Contains git repositories with various Spark applications.
  • tools/ directory. Contains the Hadoop/Spark/Hive binaries.
  • data/ directory. Contains HDFS data and Spark RDD data.

Usage

Clone this repository into the folder where you want to create your HDFS/Spark/Hive setup:

mkdir -p ~/Workspace/hadoop-spark-hive && cd ~/Workspace/hadoop-spark-hive
git clone https://github.com/earthquakesan/hdfs-spark-hive-dev-setup ./

Download HDFS/Spark/Hive binaries

make download
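The download target saves you from fetching the release tarballs by hand. If you prefer to do it manually (or the target fails), the equivalent steps look roughly like the sketch below; the Apache archive URLs are an assumption, so check the Makefile for the ones it actually uses:

# Fetch the release tarballs (URLs assumed; see the Makefile for the real ones)
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
wget https://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
# Unpack everything into tools/
mkdir -p tools
for f in *.tar.gz *.tgz; do tar -xzf "$f" -C tools/; done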

After this step you should have a tools/ folder with the following structure:

└── tools
    ├── apache-hive-2.1.0-bin
    ├── hadoop-2.7.2
    └── spark-2.0.0-bin

Configure HDFS/Spark

make configure
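The configure target writes single-node config files for Hadoop and Spark. A quick way to sanity-check the result is to look up the HDFS address in core-site.xml; the file location is the Hadoop 2.x convention, while the actual values are whatever the Makefile wrote:

# Show the configured default filesystem (typically something like hdfs://localhost:9000)
grep -A1 fs.defaultFS tools/hadoop-2.7.2/etc/hadoop/core-site.xml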

Start HDFS

Start the Hadoop DFS (distributed file system), i.e. one namenode and one datanode:

make start_hadoop

Open your browser and go to localhost:50070. If you can open the page and see one datanode registered with your namenode, the Hadoop setup is finished.
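You can also verify HDFS from the command line using the bundled hdfs CLI (paths follow the tools/ layout above):

# Cluster summary: live datanodes, capacity, etc.
./tools/hadoop-2.7.2/bin/hdfs dfsadmin -report
# Create your HDFS home directory and list the filesystem root
./tools/hadoop-2.7.2/bin/hdfs dfs -mkdir -p /user/$USER
./tools/hadoop-2.7.2/bin/hdfs dfs -ls /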

Start Spark

Start local Spark cluster:

make start_spark

Open your browser and go to localhost:8080. If you can open the page and see one spark-worker registered with the spark-master, the Spark setup is finished.
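For an end-to-end check, you can point a Spark shell at the local cluster; the master URL below assumes the default spark://localhost:7077, and the web UI at localhost:8080 shows the actual one:

# Open an interactive shell against the local cluster; try e.g. sc.parallelize(1 to 100).sum()
./tools/spark-2.0.0-bin/bin/spark-shell --master spark://localhost:7077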

Configure Hive

Hadoop must be running before you configure Hive:

make configure_hive
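Hive 2.x also requires the metastore schema to be initialized before first use. If the make targets do not handle this for you, the schematool bundled with Hive can do it; a sketch, assuming the PostgreSQL metastore used in the next step:

# One-time initialization of the metastore schema in PostgreSQL
./tools/apache-hive-2.1.0-bin/bin/schematool -dbType postgres -initSchema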

Start Hive Metastore

make start_hive_postgres_metastore

This command first starts a PostgreSQL Docker container on your local Docker host and then starts the metastore (which will occupy the terminal session). If you need to install Docker, please refer to the official installation guide. If the Docker container has not started up completely by the time the metastore comes up, you will get an error; in that case, start the metastore manually:

make activate
source activate
hive --service metastore
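If the metastore still fails to connect, check that the PostgreSQL container is actually up before retrying; the ancestor filter below assumes the container was started from a postgres image, so check the Makefile for the image it really uses:

# Is the PostgreSQL container running?
docker ps --filter ancestor=postgres
# Inspect its logs (substitute the container ID from the listing above)
docker logs <container-id>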

Start Hive

Run the Hive server (it will occupy the terminal session and print server logs to it):

make start_hive_server

Start the beeline client to connect to the Hive server (if you connect too quickly it may fail, as the Hive server takes some time to start up):

make start_hive_beeline_client

Execute some queries to see if the Hive server works properly:

CREATE TABLE pokes (foo INT, bar STRING);
LOAD DATA LOCAL INPATH './tools/apache-hive-2.1.0-bin/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
DESCRIBE pokes;
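You can run the same kind of check non-interactively through beeline's -e flag; the URL assumes HiveServer2's default port 10000:

# Run a single query against the Hive server and exit
./tools/apache-hive-2.1.0-bin/bin/beeline -u jdbc:hive2://localhost:10000 -e "SELECT COUNT(*) FROM pokes;"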

Misc

Adding sample data to Hive

Assuming you have Hadoop, Spark, and the Hive server running, start the beeline client:

make start_hive_beeline_client

Then load the sample data as follows:

CREATE TABLE pokes (foo INT, bar STRING);
LOAD DATA LOCAL INPATH './tools/apache-hive-2.1.0-bin/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

Stopping HDFS/Spark/Hive

To stop HDFS:

make stop_hadoop

To stop Spark:

make stop_spark

To stop Hive, switch to the terminal session it occupies, suspend the process with CTRL+Z, and then kill it by its PID:

kill -9 pid
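If you do not know the PID, you can look it up by the Java class name each service runs under; these patterns are assumptions, so verify with ps if they do not match:

# Find the Hive server and metastore processes
pgrep -f HiveServer2
pgrep -f HiveMetaStore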

How to connect to Hive with JDBC
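HiveServer2 accepts JDBC connections using the jdbc:hive2:// URL scheme, on port 10000 by default. From the shell you can test the connection with beeline; the username below is an assumption, since a default development setup often accepts any:

# Connect to HiveServer2 over JDBC (default port 10000)
./tools/apache-hive-2.1.0-bin/bin/beeline -u jdbc:hive2://localhost:10000 -n $USER

From Java, the same URL works with the org.apache.hive.jdbc.HiveDriver class, which ships in the hive-jdbc jar.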
