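flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and others, as well as case studies of large-scale production Flink projects (PVUV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). You are welcome to support my column "Big Data Real-Time Computing Engine: Flink in Action and Performance Optimization".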

Stars: ✭ 11,378 (+20587.27%)

Mutual labels: spark, stream-processing, flink

Data Ingestion Platform

Stars: ✭ 39 (-29.09%)

Mutual labels: spark, flink, batch-processing

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-63.64%)

Mutual labels: spark, flink

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (-58.18%)

Mutual labels: spark, apache-spark

Spark Jupyter Aws

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (+370.91%)

Mutual labels: spark, apache-spark

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

Stars: ✭ 46 (-16.36%)

Mutual labels: spark, apache-spark

flink-connectors

Apache Flink connectors for Pravega.

Stars: ✭ 84 (+52.73%)

Mutual labels: stream-processing, flink

prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (-1.82%)

Mutual labels: spark, data-processing

spark-gradle-template

Apache Spark in your IDE with gradle

Stars: ✭ 39 (-29.09%)

Mutual labels: spark, apache-spark

Spark Nkp

Natural Korean Processor for Apache Spark

Stars: ✭ 50 (-9.09%)

Mutual labels: spark, apache-spark

Learningsparkv2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Stars: ✭ 307 (+458.18%)

Mutual labels: spark, apache-spark

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (+503.64%)

Mutual labels: spark, apache-spark

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (+574.55%)

Mutual labels: spark, apache-spark

Sk Dist

Distributed scikit-learn meta-estimators in PySpark

Stars: ✭ 260 (+372.73%)

Mutual labels: data-science, spark

Sparkmeasure

This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

Stars: ✭ 368 (+569.09%)

Mutual labels: spark, apache-spark

Sparkle

Haskell on Apache Spark.

Stars: ✭ 419 (+661.82%)

Mutual labels: spark, apache-spark

God Of Bigdata

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+10823.64%)

Mutual labels: spark, flink

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+10183.64%)

Mutual labels: data-science, spark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+1050.91%)

Mutual labels: data-science, spark

Spring Cloud Dataflow

Microservices-based streaming and batch data processing in Cloud Foundry and Kubernetes

Stars: ✭ 753 (+1269.09%)

Mutual labels: stream-processing, batch-processing

open-stream-processing-benchmark

This repository contains the code base for the Open Stream Processing Benchmark.

Stars: ✭ 37 (-32.73%)

Mutual labels: stream-processing, flink

SANSA-Stack

Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/

Stars: ✭ 130 (+136.36%)

Mutual labels: apache-spark, flink

proxima-platform

The Proxima platform.

Stars: ✭ 17 (-69.09%)

Mutual labels: apache-spark, batch-processing

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+101.82%)

Mutual labels: spark, apache-spark

leaflet heatmap

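A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to render the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. With the current Apache Spark implementation of the rendering, parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .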

Stars: ✭ 13 (-76.36%)

Mutual labels: spark, apache-spark

Spark Tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

Stars: ✭ 45 (-18.18%)

Mutual labels: spark, apache-spark

streamsx.kafka

Repository for integration with Apache Kafka

Stars: ✭ 13 (-76.36%)

Mutual labels: apache-spark, stream-processing

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+1401.82%)

Mutual labels: spark, flink

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (+405.45%)

Mutual labels: spark, flink

Coolplayspark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

Stars: ✭ 3,318 (+5932.73%)

Mutual labels: spark, apache-spark

Hub

Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai

Stars: ✭ 4,003 (+7178.18%)

Mutual labels: data-science, data-processing

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+1309.09%)

Mutual labels: spark, apache-spark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+1589.09%)

Mutual labels: spark, apache-spark

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-52.73%)

Mutual labels: data-science, spark

Featran

A Scala feature transformation library for data science and machine learning

Stars: ✭ 420 (+663.64%)

Mutual labels: spark, flink

FlinkExperiments

Experiments with Apache Flink.

Stars: ✭ 3 (-94.55%)

Mutual labels: stream-processing, flink

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+1014.55%)

Mutual labels: data-science, apache-spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+9923.64%)

Mutual labels: spark, flink

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+1223.64%)

Mutual labels: spark, apache-spark

Bigdataguide

Big data learning: learn big data from scratch, including study videos for each stage of big data learning and interview materials.

Stars: ✭ 817 (+1385.45%)

Mutual labels: spark, flink

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+1341.82%)

Mutual labels: spark, apache-spark

Spark Examples

Spark examples

Stars: ✭ 41 (-25.45%)

Mutual labels: spark, apache-spark

Bdp Dataplatform

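Big data ecosystem solution data platform: a big data solution built on big data, data platform, microservices, machine learning, e-commerce, automated operations, DevOps, container deployment platform, data platform collection, data platform storage, data platform computing, data platform development, and data platform application building.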

Stars: ✭ 456 (+729.09%)

Mutual labels: spark, flink

Bigdata Interview

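🎯 🌟 [Big data interview questions] A collection of big data interview questions I gathered online, together with my own summarized answers. Currently includes interview knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.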

Stars: ✭ 857 (+1458.18%)

Mutual labels: spark, flink

Data Science On Gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Stars: ✭ 864 (+1470.91%)

Mutual labels: data-science, data-processing

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.

Stars: ✭ 14 (-74.55%)

Mutual labels: spark, apache-spark

Real Time Stream Processing Engine

This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.

Stars: ✭ 37 (-32.73%)

Mutual labels: spark, apache-spark

Hazelcast Jet

Distributed Stream and Batch Processing

Stars: ✭ 855 (+1454.55%)

Mutual labels: stream-processing, batch-processing

Spark Flamegraph

Easy CPU Profiling for Apache Spark applications

Stars: ✭ 30 (-45.45%)

Mutual labels: spark, apache-spark

Apache Spark Internals

The Internals of Apache Spark

Stars: ✭ 1,045 (+1800%)

Mutual labels: spark, apache-spark

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+5434.55%)

Mutual labels: data-science, spark

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+39987.27%)

Mutual labels: data-science, spark

Dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Stars: ✭ 854 (+1452.73%)

Mutual labels: data-science, data-processing

1-60 of 1595 similar projects
