
nikkatalnikov / bigkube

Licence: other
Minikube for big data with Scala and Spark

Programming Languages

python
scala
shell
powershell
Dockerfile

Projects that are alternatives of or similar to bigkube

Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+906.25%)
Mutual labels:  spark, presto, hdfs
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+1506.25%)
Mutual labels:  airflow, spark
Linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+14418.75%)
Mutual labels:  spark, presto
Dataspherestudio
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+7368.75%)
Mutual labels:  airflow, spark
Cube.js
📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+74793.75%)
Mutual labels:  spark, presto
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+837.5%)
Mutual labels:  spark, hdfs
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+4856.25%)
Mutual labels:  airflow, spark
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+262.5%)
Mutual labels:  spark, hdfs
Airflow Pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (+700%)
Mutual labels:  airflow, spark
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (+25%)
Mutual labels:  spark, hdfs
openverse-catalog
Identifies and collects data on cc-licensed content across web crawl data and public apis.
Stars: ✭ 27 (+68.75%)
Mutual labels:  airflow, spark
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+68593.75%)
Mutual labels:  spark, hdfs
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+475%)
Mutual labels:  spark, hdfs
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (+343.75%)
Mutual labels:  spark, hdfs
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-12.5%)
Mutual labels:  spark, hdfs
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+2481.25%)
Mutual labels:  airflow, spark
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (+81.25%)
Mutual labels:  spark, hdfs
Learning Spark
Learning Spark from scratch; big data study materials
Stars: ✭ 37 (+131.25%)
Mutual labels:  spark, hdfs
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (+456.25%)
Mutual labels:  airflow, spark
leaflet heatmap
A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap-rendering step is moved offline for computation and analysis. The data is processed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark, and leafletjs loads an OpenStreetMap layer plus the heatmap layer for good interactivity. In the current Apache Spark implementation the parallel computation is slower than a single machine, perhaps because Apache Spark is not well suited to this kind of computation, or because the algorithm is poorly designed. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-18.75%)
Mutual labels:  spark, hdfs

Bigkube - effortless Spark application deployment and testing in minikube.


Bigkube automates local big data and Spark development: automated deployments, well-fitted components, and SBT integration testing right from the IDE console.


Prerequisites

  1. Install Docker
  2. Install Minikube
  3. Get Helm
  4. Get Scala and SBT
  5. Make sure the SBT version is at least 1.2.8 and that the Scala 2.11 SDK is set for the project
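A quick way to verify that these tools are installed is to check for them on the PATH. A minimal sketch (tool names are taken from the list above; adjust to your setup):

```shell
#!/bin/sh
# Report any of the required tools that are not on the PATH.
check_tools() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Tool names from the Prerequisites list above.
check_tools docker minikube helm sbt scala || echo "install the missing tools before continuing"
```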

Before deployment

  1. Make sure minikube is started with the --cpus=4 --memory=8192 flags set. Make sure the appropriate vm-driver is set (use minikube config set).
  2. Run sbt assembly from the repo's base directory.
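The setup above amounts to a command sequence like the following (a sketch; the driver name is only an example, pick the one for your platform):

```shell
# Configure minikube resources before starting it (values from the step above).
minikube config set cpus 4
minikube config set memory 8192
minikube config set vm-driver virtualbox   # example driver; choose the one for your platform

minikube start

# Then, from the repo's base directory, build the fat jar.
sbt assembly
```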

Deployment steps

  1. cd deployment - DON'T SKIP THIS
  2. ./bigkube.sh --serve-jar - read the output instructions carefully and make sure the jar-serving host IP is substituted according to the instructions
  3. ./bigkube.sh --create - creates all the necessary infrastructure. Troubleshooting: don't worry if some pods are "red" right after deployment. All Hadoop deployments are smart enough to wait for each other and come up in the appropriate sequence. Just give them some time.
  4. ./bigkube.sh --spark-init - initializes Helm (with Tiller) and the Spark operator.

Note: bigkube.sh resolves all service account issues, secrets, config maps, etc.
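To make the IP substitution in step 2 concrete: it boils down to replacing a host-IP placeholder in a manifest with an address the cluster can reach. A hypothetical illustration (the JAR_HOST_IP placeholder, file name, and port are made up for this sketch, not taken from bigkube.sh):

```shell
# Hypothetical example: patch a placeholder IP into a manifest.
HOST_IP=192.168.99.1   # example value; use the IP reported by ./bigkube.sh --serve-jar

printf 'jarUrl: http://JAR_HOST_IP:8000/app.jar\n' > /tmp/spark-app.yaml
sed -i.bak "s/JAR_HOST_IP/$HOST_IP/" /tmp/spark-app.yaml
cat /tmp/spark-app.yaml
```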

Run integration tests

  1. Write your own integration test using SparkController. Examples are provided.
  2. Simply run sbt it:test from the repo's base directory - that's it: your Spark app is deployed into minikube and the tests are executed locally on your host machine.

GUI

  1. minikube dashboard
  2. Run minikube service list. You can open the namenode and Presto UIs via the corresponding URLs.
  3. You can also use Metabase, an open-source tool for rapid data access; it works with Presto and SQL Server as well.
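The UIs above can be reached with commands along these lines (a sketch; the service name is an example, take the real names from the service list):

```shell
# Open the Kubernetes dashboard in your browser.
minikube dashboard

# List all services with their URLs, then print one URL directly.
minikube service list
minikube service namenode --url   # example service name; use one from the list
```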

Delete deployments

Alongside kubectl delete -f file.yaml, you can use:

  1. ./bigkube.sh --delete - deletes all Bigkube infrastructure
  2. ./bigkube.sh --spark-drop - deletes the helmed Spark operator

Airflow inside of the bigkube with custom Spark Operator

  1. cd deployment
  2. Just run ./bigkube.sh --airflow-init and all the necessary Airflow components will be up and running
  3. minikube service airflow will open your browser with the Airflow UI

For more details, please visit the Airflow repository.


Acknowledgments

Thanks to Nick Grigoriev for the idea and help. Thanks to Big Data Europe for the Hadoop Docker images. Thanks to Valeira Katalnikova for the Bigkube logo.
