Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (-68.97%)

Mutual labels: spark

Lambda Arch

Applying Lambda Architecture with Spark, Kafka, and Cassandra.

Stars: ✭ 111 (-36.21%)

Mutual labels: spark

confluent-spark-avro

Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.

Stars: ✭ 18 (-89.66%)

Mutual labels: spark

Play Spark Scala

Stars: ✭ 51 (-70.69%)

Mutual labels: spark

blog

blog entries

Stars: ✭ 39 (-77.59%)

Mutual labels: spark

Spark Structured Streaming Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Stars: ✭ 168 (-3.45%)

Mutual labels: spark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-80.46%)

Mutual labels: spark

Apache Spark Internals

The Internals of Apache Spark

Stars: ✭ 1,045 (+500.57%)

Mutual labels: spark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-85.63%)

Mutual labels: spark

Java learning practice

java 进阶之路：面试高频算法、akka、多线程、NIO、Netty、SpringBoot、Spark&&Flink 等

Stars: ✭ 110 (-36.78%)

Mutual labels: spark

Casper

A compiler for automatically re-targeting sequential Java code to Apache Spark.

Stars: ✭ 45 (-74.14%)

Mutual labels: spark

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

Stars: ✭ 46 (-73.56%)

Mutual labels: spark

visions

Type System for Data Analysis in Python

Stars: ✭ 136 (-21.84%)

Mutual labels: spark

Quicksql

A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources

Stars: ✭ 1,821 (+946.55%)

Mutual labels: spark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-36.21%)

Mutual labels: spark

Delta Architecture

Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline

Stars: ✭ 43 (-75.29%)

Mutual labels: spark

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+1313.22%)

Mutual labels: spark

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (-37.36%)

Mutual labels: spark

Spark-PMoF

Spark Shuffle Optimization with RDMA+AEP

Stars: ✭ 28 (-83.91%)

Mutual labels: spark

Gatk

Official code repository for GATK versions 4 and up

Stars: ✭ 1,002 (+475.86%)

Mutual labels: spark

leaflet heatmap

简单的可视化湖州通话数据假设数据量很大，没法用浏览器直接绘制热力图，把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后，再使用Apache Spark绘制热力图，然后用leafletjs加载OpenStreetMap图层和热力图图层，以达到良好的交互效果。现在使用Apache Spark实现绘制，可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法，并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .

Stars: ✭ 13 (-92.53%)

Mutual labels: spark

Spark.jl

Julia binding for Apache Spark

Stars: ✭ 153 (-12.07%)

Mutual labels: spark

docker-spark

Apache Spark docker container image (Standalone mode)

Stars: ✭ 34 (-80.46%)

Mutual labels: spark

Pixiedust

Python Helper library for Jupyter Notebooks

Stars: ✭ 998 (+473.56%)

Mutual labels: spark

Python Master Courses

人生苦短我用Python

Stars: ✭ 61 (-64.94%)

Mutual labels: spark

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (-37.93%)

Mutual labels: spark

spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL

Stars: ✭ 20 (-88.51%)

Mutual labels: spark

Snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

Stars: ✭ 995 (+471.84%)

Mutual labels: spark

spark-kubernetes

spark on kubernetes

Stars: ✭ 80 (-54.02%)

Mutual labels: spark

Apache Spark Node

Node.js bindings for Apache Spark DataFrame APIs

Stars: ✭ 136 (-21.84%)

Mutual labels: spark

Search Ads Web Service

Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]

Stars: ✭ 30 (-82.76%)

Mutual labels: spark

Real Time Stream Processing Engine

This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.

Stars: ✭ 37 (-78.74%)

Mutual labels: spark

spark-gradle-template

Apache Spark in your IDE with gradle

Stars: ✭ 39 (-77.59%)

Mutual labels: spark

Flink Learning

flink learning blog. http://www.54tianzhisheng.cn/ 含 Flink 入门、概念、原理、实战、性能调优、源码解析等内容。涉及 Flink Connector、Metrics、Library、DataStream API、Table API & SQL 等内容的学习案例，还有 Flink 落地应用的大型项目案例（PVUV、日志存储、百亿数据实时去重、监控告警）分享。欢迎大家支持我的专栏《大数据实时计算引擎 Flink 实战与性能优化》

Stars: ✭ 11,378 (+6439.08%)

Mutual labels: spark

openverse-catalog

Identifies and collects data on cc-licensed content across web crawl data and public apis.

Stars: ✭ 27 (-84.48%)

Mutual labels: spark

Learning Spark

零基础学习spark，大数据学习

Stars: ✭ 37 (-78.74%)

Mutual labels: spark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (-45.4%)

Mutual labels: spark

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (-7.47%)

Mutual labels: spark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-85.06%)

Mutual labels: spark

Spark Summit East 2017

Stars: ✭ 33 (-81.03%)

Mutual labels: spark

sparkar-volts

An extensive non-reactive Typescript framework that eases the development experience in Spark AR

Stars: ✭ 15 (-91.38%)

Mutual labels: spark

Logigsk

A Linux based software package to control led's on Logitech G910, G810, G610 and G410.

Stars: ✭ 107 (-38.51%)

Mutual labels: spark

experiments

Code examples for my blog posts

Stars: ✭ 21 (-87.93%)

Mutual labels: spark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+448.28%)

Mutual labels: spark

splink

Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters

Stars: ✭ 181 (+4.02%)

Mutual labels: spark

Aliyun Emapreduce Datasources

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Stars: ✭ 132 (-24.14%)

Mutual labels: spark

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+445.4%)

Mutual labels: spark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+1347.13%)

Mutual labels: spark

Transmogrifai

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Stars: ✭ 2,084 (+1097.7%)

Mutual labels: spark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (-5.17%)

Mutual labels: spark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (-9.2%)

Mutual labels: spark

Technology Talk

汇总java生态圈常用技术框架、开源中间件，系统架构、数据库、大公司架构案例、常用三方类库、项目管理、线上问题排查、个人成长、思考等知识