Real-time ETL built with Flink, moving data from MySQL to Greenplum. Canal parses the MySQL binlog and publishes it to Kafka; Flink consumes from Kafka and writes the assembled data into Greenplum. More data sources and targets will be added in the future.

Stars: ✭ 65 (-99.63%)

Mutual labels: flink

falcon

Mirror of Apache Falcon

Stars: ✭ 95 (-99.47%)

Mutual labels: big-data

FlinkTutorial

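FlinkTutorial focuses on big-data stream processing with Flink. It covers getting started, concepts, principles, hands-on practice, performance tuning, and source code analysis. It is written in Java and also includes some core code in Scala. You are welcome to follow my blog and GitHub.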

Stars: ✭ 46 (-99.74%)

Mutual labels: flink

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-99.9%)

Mutual labels: big-data

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-99.92%)

Mutual labels: big-data

flink-prometheus-example

Example setup to demonstrate Prometheus integration of Apache Flink

Stars: ✭ 69 (-99.61%)

Mutual labels: flink

Larkmidtableweb

A distributed data analysis system based on Flink

Stars: ✭ 259 (-98.54%)

Mutual labels: flink

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (-99.47%)

Mutual labels: big-data

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-99.38%)

Mutual labels: big-data

predictionio-template-similar-product

PredictionIO Similar Product Engine Template (Scala-based parallelized engine)

Stars: ✭ 50 (-99.72%)

Mutual labels: big-data

Knowage Server

Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.

Stars: ✭ 276 (-98.45%)

Mutual labels: big-data

hotmap

WebGL Heatmap Viewer for Big Data and Bioinformatics

Stars: ✭ 13 (-99.93%)

Mutual labels: big-data

predictionio-template-java-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)

Stars: ✭ 36 (-99.8%)

Mutual labels: big-data

egis

Egis - a handy Ruby interface for AWS Athena

Stars: ✭ 38 (-99.79%)

Mutual labels: big-data

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-98.55%)

Mutual labels: big-data

pytorch kmeans

Implementation of the k-means algorithm in PyTorch that works for large datasets

Stars: ✭ 38 (-99.79%)

Mutual labels: big-data

leaflet heatmap

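A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation. Apache Spark processes the data in parallel and then renders the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; perhaps because Apache Spark is not well suited to this kind of computation, or because I did not design the algorithm well, the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .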

Stars: ✭ 13 (-99.93%)

Mutual labels: big-data

big-sorter

Java library that sorts very large files of records by splitting into smaller sorted files and merging

Stars: ✭ 49 (-99.72%)

Mutual labels: big-data

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (-74.24%)

Mutual labels: big-data

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker