Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-52.44%)

Mutual labels: hadoop

Choregraphie

Choregraphie offers primitives to coordinate convergence of Chef resources.

Stars: ✭ 24 (-70.73%)

Mutual labels: chef

Mleap

MLeap: Deploy ML Pipelines to Production

Stars: ✭ 1,232 (+1402.44%)

Mutual labels: spark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+1102.44%)

Mutual labels: spark

Javafamily

【Java Interview + Java Learning Guide】A guide covering the core knowledge that most Java programmers need to master.

Stars: ✭ 28,668 (+34860.98%)

Mutual labels: zookeeper

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-79.27%)

Mutual labels: hadoop

Cp Helm Charts

The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.

Stars: ✭ 539 (+557.32%)

Mutual labels: zookeeper

ansible-cloudera-hadoop

Ansible playbook to deploy Cloudera Hadoop components to the cluster

Stars: ✭ 51 (-37.8%)

Mutual labels: hbase

go-solr

Solr Go client from SendGrid; ZooKeeper-aware and incorporates retries

Stars: ✭ 39 (-52.44%)

Mutual labels: zookeeper

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-71.95%)

Mutual labels: spark

openverse-catalog

Identifies and collects data on CC-licensed content across web crawl data and public APIs.

Stars: ✭ 27 (-67.07%)

Mutual labels: spark

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+4659.76%)

Mutual labels: spark

last fm

A simple app to demonstrate a testable, maintainable, and scalable architecture for Flutter. flutter_bloc, get_it, hive, and a REST API are some of the tech stack used in this project.

Stars: ✭ 134 (+63.41%)

Mutual labels: hive

Express Microservice Starter

An express-based Node.js API bootstrapping module for building microservices.

Stars: ✭ 53 (-35.37%)

Mutual labels: zookeeper

Spark Gbtlr

Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark

Stars: ✭ 81 (-1.22%)

Mutual labels: spark

Easyrpc

EasyRpc is a simple, high-performance, easy-to-use RPC framework based on Netty, ZooKeeper and ProtoStuff.

Stars: ✭ 79 (-3.66%)

Mutual labels: zookeeper

Lpa Detector

Optimize and improve the label propagation algorithm

Stars: ✭ 75 (-8.54%)

Mutual labels: spark

Pyspark Twitter Stream Mining

Real-time Machine Learning with Apache Spark on Twitter Public Stream

Stars: ✭ 64 (-21.95%)

Mutual labels: spark

Real Time Stream Processing Engine

This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.

Stars: ✭ 37 (-54.88%)

Mutual labels: spark

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+556.1%)

Mutual labels: spark

aix

Resources for AIX hosts

Stars: ✭ 22 (-73.17%)

Mutual labels: chef

codes-scratch-zookeeper-netty

A file synchronization service for cluster nodes, implemented with zk + netty

Stars: ✭ 29 (-64.63%)

Mutual labels: zookeeper

Awesome Ada

A curated list of awesome resources related to the Ada and SPARK programming languages

Stars: ✭ 299 (+264.63%)

Mutual labels: spark

hbase-prometheus-monitoring

No description or website provided.

Stars: ✭ 19 (-76.83%)

Mutual labels: hbase

Usersessionbehaviorofflineanalysis

Sichuan University 拓思爱诺 project for offline analysis of user session behavior data

Stars: ✭ 69 (-15.85%)

Mutual labels: spark

Activemq

Development repository for the ActiveMQ Chef cookbook

Stars: ✭ 19 (-76.83%)

Mutual labels: chef

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-39.02%)

Mutual labels: spark

Lopq

Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.

Stars: ✭ 530 (+546.34%)

Mutual labels: spark

sitecore-packer

Packer templates for Sitecore development with IIS, SOLR and SQL Server on Windows

Stars: ✭ 19 (-76.83%)

Mutual labels: chef

simple-rpc-plus

A remote procedure call framework implemented with Netty and ZooKeeper

Stars: ✭ 16 (-80.49%)

Mutual labels: zookeeper

docker-repo

A repository that stores Dockerfiles and docker-compose files for quickly starting a service or service cluster.

Stars: ✭ 26 (-68.29%)

Mutual labels: zookeeper

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+15.85%)

Mutual labels: spark

Chef Vault

chef-vault cookbook

Stars: ✭ 63 (-23.17%)

Mutual labels: chef

kzmonitor

Kafka and ZooKeeper monitor

Stars: ✭ 34 (-58.54%)

Mutual labels: zookeeper

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+525.61%)

Mutual labels: spark

spark-druid-olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Stars: ✭ 286 (+248.78%)

Mutual labels: spark
