Real-time ETL built with Flink that moves data from MySQL to Greenplum. Canal parses the MySQL binlog and publishes it to Kafka; Flink consumes from Kafka and writes the assembled data into Greenplum. More data sources and targets will be added in the future.

Stars: ✭ 65 (-76.62%)

Mutual labels: flink

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-87.77%)

Mutual labels: spark

akka-quickstart-scala.g8

A minimal seed template for an Akka with Scala build

Stars: ✭ 52 (-81.29%)

Mutual labels: akka

nvimhost-scala

♦️ nvim host plugin provider and API client library in Scala

Stars: ✭ 19 (-93.17%)

Mutual labels: akka

np-flink

Detailed Flink learning and practice

Stars: ✭ 26 (-90.65%)

Mutual labels: flink

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-91.01%)

Mutual labels: spark

yuzhouwan

Code Library for My Blog

Stars: ✭ 39 (-85.97%)

Mutual labels: spark

Alchemy

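A web system built for Flink. Supports defining UDFs in the web page and submitting SQL and JAR jobs; supports managing sources, sinks, and jobs; can manage Flink clusters on OpenShift.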

Stars: ✭ 264 (-5.04%)

Mutual labels: flink

spark-util

low-level helpers for Apache Spark libraries and tests

Stars: ✭ 16 (-94.24%)

Mutual labels: spark

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-94.96%)

Mutual labels: spark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-82.01%)

Mutual labels: spark

flink-tutorials

Flink Tutorial Project

Stars: ✭ 104 (-62.59%)

Mutual labels: flink

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (-65.83%)

Mutual labels: spark

typebus

Framework for building distributed microservices in Scala with akka-streams and Kafka

Stars: ✭ 14 (-94.96%)

Mutual labels: akka

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-90.65%)

Mutual labels: spark

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-7.55%)

Mutual labels: spark

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (-87.41%)

Mutual labels: spark

smolder

HL7 Apache Spark Datasource

Stars: ✭ 33 (-88.13%)

Mutual labels: spark

Spark-Ar

Resources for Spark AR

Stars: ✭ 43 (-84.53%)

Mutual labels: spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-91.01%)

Mutual labels: spark

akka-cluster-consul

Example of bootstrapping Akka Cluster with Consul

Stars: ✭ 26 (-90.65%)

Mutual labels: akka

splink

Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters

Stars: ✭ 181 (-34.89%)

Mutual labels: spark

spark-stringmetric

Spark functions to run popular phonetic and string matching algorithms

Stars: ✭ 51 (-81.65%)

Mutual labels: spark

Datavec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

Stars: ✭ 272 (-2.16%)

Mutual labels: spark

Helk

The Hunting ELK

Stars: ✭ 3,097 (+1014.03%)

Mutual labels: spark

Reactive

Reactive: Examples of the most famous reactive libraries that you can find in the market.

Stars: ✭ 256 (-7.91%)

Mutual labels: akka

daf-kylo

Kylo integration with PDND (previously DAF).

Stars: ✭ 20 (-92.81%)

Mutual labels: spark

spark-demos

Collection of different demo applications using Apache Spark

Stars: ✭ 15 (-94.6%)

Mutual labels: spark

visualize-data-with-python

A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.

Stars: ✭ 60 (-78.42%)

Mutual labels: spark

litemall-dw

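A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking; a data warehouse (five layers), real-time computing, and user profiling. The big data platform uses CDH 6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.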

Stars: ✭ 36 (-87.05%)

Mutual labels: flink

TheAkkaWay

Akka Chinese Book / What should be included in it?

Stars: ✭ 19 (-93.17%)

Mutual labels: akka

LarkMidTable

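LarkMidTable is a one-stop open-source data middle platform. It covers the platform's infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, efficiently empowering the data front end and providing data services.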

Stars: ✭ 873 (+214.03%)

Mutual labels: flink

atomic-store

Atomic event store for Scala/Akka

Stars: ✭ 17 (-93.88%)

Mutual labels: akka

dllib

dllib is a distributed deep learning library running on Apache Spark

Stars: ✭ 32 (-88.49%)

Mutual labels: spark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-60.07%)

Mutual labels: spark

icicle

Icicle Streaming Query Language

Stars: ✭ 16 (-94.24%)

Mutual labels: streaming-data

powerapi-scala

PowerAPI is a middleware toolkit for building software-defined power meters

Stars: ✭ 70 (-74.82%)

Mutual labels: akka

Akka-Streams-custom-stream-processing-examples

Demos of how to do custom stream processing using the Akka Streams GraphStages API

Stars: ✭ 13 (-95.32%)

Mutual labels: akka

AusweisBot

Telegram bot to generate self-authorizations for moving around during covid-19 pandemic in France

Stars: ✭ 13 (-95.32%)

Mutual labels: akka

2018-flink-forward-china

Records of the first Flink Forward China (2018): video records | document records | More than streaming

Stars: ✭ 25 (-91.01%)

Mutual labels: flink

Big Data Rosetta Code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Stars: ✭ 254 (-8.63%)

Mutual labels: spark

Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Stars: ✭ 70 (-74.82%)

Mutual labels: spark

spark-streaming-visualize

Simple demonstration of how to build a complex real-time machine learning visualization tool.

Stars: ✭ 16 (-94.24%)

Mutual labels: streaming-data

df data service

DataFibers Data Service

Stars: ✭ 31 (-88.85%)

Mutual labels: flink

transit

Massively real-time city transit streaming application

Stars: ✭ 20 (-92.81%)

Mutual labels: streaming-data

pravega-samples

Sample Applications for Pravega.

Stars: ✭ 43 (-84.53%)

Mutual labels: streaming-data

fb scraper

FBLYZE is a Facebook scraping system and analysis system.

Stars: ✭ 61 (-78.06%)

Mutual labels: flink

nestjs-file-streaming

NestJS File Streaming With MongoDB

Stars: ✭ 28 (-89.93%)

Mutual labels: streaming-data

spark learning

Learning Spark with the latest Atguigu (尚硅谷) big data Spark course, 2019 edition

Stars: ✭ 42 (-84.89%)

Mutual labels: spark

tpch-spark

TPC-H queries in Apache Spark SQL using native DataFrames API

Stars: ✭ 63 (-77.34%)

Mutual labels: spark

FlinkTutorial

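FlinkTutorial focuses on Flink stream processing for big data. It covers getting started, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Developed in Java, with some core code in Scala. Feel free to follow my blog and GitHub.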

Stars: ✭ 46 (-83.45%)

Mutual labels: flink

richflow

A Node.js and JavaScript synchronous data pipeline processing, data sharing and stream processing library. Actionable & Transformable Pipeline data processing.

Stars: ✭ 17 (-93.88%)

Mutual labels: streaming-data

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+784.53%)

Mutual labels: spark

akka-periscope

Akka plugin to collect various data about actors

Stars: ✭ 16 (-94.24%)

Mutual labels: akka

