
nikkatalnikov / bigkube

Licence: other
Minikube for big data with Scala and Spark

Programming Languages

python
scala
shell
powershell
Dockerfile

Projects that are alternatives of or similar to bigkube

Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+906.25%)
Mutual labels:  spark, presto, hdfs
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+1506.25%)
Mutual labels:  airflow, spark
Linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+14418.75%)
Mutual labels:  spark, presto
Dataspherestudio
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+7368.75%)
Mutual labels:  airflow, spark
Cube.js
📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+74793.75%)
Mutual labels:  spark, presto
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+837.5%)
Mutual labels:  spark, hdfs
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+4856.25%)
Mutual labels:  airflow, spark
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+262.5%)
Mutual labels:  spark, hdfs
Airflow Pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (+700%)
Mutual labels:  airflow, spark
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (+25%)
Mutual labels:  spark, hdfs
openverse-catalog
Identifies and collects data on cc-licensed content across web crawl data and public apis.
Stars: ✭ 27 (+68.75%)
Mutual labels:  airflow, spark
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+68593.75%)
Mutual labels:  spark, hdfs
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+475%)
Mutual labels:  spark, hdfs
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (+343.75%)
Mutual labels:  spark, hdfs
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-12.5%)
Mutual labels:  spark, hdfs
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+2481.25%)
Mutual labels:  airflow, spark
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (+81.25%)
Mutual labels:  spark, hdfs
Learning Spark
Learning Spark from scratch; big data study materials
Stars: ✭ 37 (+131.25%)
Mutual labels:  spark, hdfs
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (+456.25%)
Mutual labels:  airflow, spark
leaflet heatmap
A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap-rendering step is moved offline for computation and analysis. The data is processed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark, and leafletjs loads an OpenStreetMap layer plus the heatmap layer for good interactivity. In the current Apache Spark implementation the parallel computation is slower than a single machine, perhaps because Apache Spark is not well suited to this kind of computation, or because the algorithm is poorly designed. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-18.75%)
Mutual labels:  spark, hdfs

Bigkube - effortless Spark application deployment and testing in minikube.


Bigkube automates local big data and Spark development: automated deployments, well-fitted components, and SBT integration testing right from the IDE console.


Prerequisites

  1. Install Docker
  2. Install Minikube
  3. Get Helm
  4. Get Scala and SBT
  5. Make sure the SBT version is at least 1.2.8 and that the Scala 2.11 SDK is set for the project
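A quick way to verify that these tools are installed is to check for them on the PATH. A minimal sketch (tool names are taken from the list above; adjust to your setup):

```shell
#!/bin/sh
# Report any of the required tools that are not on the PATH.
check_tools() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Tool names from the Prerequisites list above.
check_tools docker minikube helm sbt scala || echo "install the missing tools before continuing"
```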

Before deployment

  1. Make sure minikube is started with the --cpus=4 --memory=8192 flags set. Make sure the appropriate vm-driver is set (use minikube config set).
  2. Run sbt assembly from the repo's base directory.
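The setup above amounts to a command sequence like the following (a sketch; the driver name is only an example, pick the one for your platform):

```shell
# Configure minikube resources before starting it (values from the step above).
minikube config set cpus 4
minikube config set memory 8192
minikube config set vm-driver virtualbox   # example driver; choose the one for your platform

minikube start

# Then, from the repo's base directory, build the fat jar.
sbt assembly
```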

Deployment steps

  1. cd deployment - DON'T SKIP THIS
  2. ./bigkube.sh --serve-jar - read the output instructions carefully and make sure the jar-serving host IP is substituted according to the instructions
  3. ./bigkube.sh --create - creates all the necessary infrastructure. Troubleshooting: don't worry if some pods are "red" right after deployment. All Hadoop deployments are smart enough to wait for each other and come up in the appropriate sequence. Just give them some time.
  4. ./bigkube.sh --spark-init - initializes Helm (with Tiller) and the Spark operator.

Note: bigkube.sh resolves all service account issues, secrets, config maps, etc.
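To make the IP substitution in step 2 concrete: it boils down to replacing a host-IP placeholder in a manifest with an address the cluster can reach. A hypothetical illustration (the JAR_HOST_IP placeholder, file name, and port are made up for this sketch, not taken from bigkube.sh):

```shell
# Hypothetical example: patch a placeholder IP into a manifest.
HOST_IP=192.168.99.1   # example value; use the IP reported by ./bigkube.sh --serve-jar

printf 'jarUrl: http://JAR_HOST_IP:8000/app.jar\n' > /tmp/spark-app.yaml
sed -i.bak "s/JAR_HOST_IP/$HOST_IP/" /tmp/spark-app.yaml
cat /tmp/spark-app.yaml
```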

Run integration tests

  1. Write your own integration test using SparkController. Examples are provided.
  2. Simply run sbt it:test from the repo's base directory - that's it: your Spark app is deployed into minikube and the tests are executed locally on your host machine.

GUI

  1. minikube dashboard
  2. Run minikube service list. You can open the namenode and Presto UIs via the corresponding URLs.
  3. You can also use Metabase, an open-source tool for rapid data access; it works with Presto and SQL Server as well.
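The UIs above can be reached with commands along these lines (a sketch; the service name is an example, take the real names from the service list):

```shell
# Open the Kubernetes dashboard in your browser.
minikube dashboard

# List all services with their URLs, then print one URL directly.
minikube service list
minikube service namenode --url   # example service name; use one from the list
```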

Delete deployments

Alongside kubectl delete -f file.yaml, you can use:

  1. ./bigkube.sh --delete - deletes all Bigkube infrastructure
  2. ./bigkube.sh --spark-drop - deletes the helmed Spark operator

Airflow inside of the bigkube with custom Spark Operator

  1. cd deployment
  2. Just run ./bigkube.sh --airflow-init and all the necessary Airflow components will be up and running
  3. minikube service airflow will open your browser with the Airflow UI

For more details, please visit the Airflow repository.


Acknowledgments

Thanks to Nick Grigoriev for the idea and help. Thanks to Big Data Europe for the Hadoop Docker images. Thanks to Valeira Katalnikova for the Bigkube logo.
