Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Built on Apache Flink, Dinky can connect to many big data frameworks, including OLAP engines and data lakes.

Stars: ✭ 1,535 (+2851.92%)

Mutual labels: flink, datalake

flink-connector-kudu

A flink-connector-kudu based on the Apache Bahir Kudu connector; supports the Flink 1.11.x DynamicTableSource/Sink, Range partitioning, and more.

Stars: ✭ 40 (-23.08%)

Mutual labels: flink, flink-sql

MySqlCdc

MySQL/MariaDB binlog replication client for .NET

Stars: ✭ 71 (+36.54%)

Mutual labels: cdc, change-data-capture

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, including SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-25%)

Mutual labels: datalake, spark-sql

dt-sql-parser

SQL Parsers for BigData, built with antlr4.

Stars: ✭ 135 (+159.62%)

Mutual labels: spark-sql, flink-sql

oracdc

Oracle database CDC (Change Data Capture)

Stars: ✭ 51 (-1.92%)

Mutual labels: cdc, change-data-capture

Realtime

Listen to your PostgreSQL database in realtime via websockets. Built with Elixir.

Stars: ✭ 4,278 (+8126.92%)

Mutual labels: cdc, change-data-capture

litemall-dw

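A big data project based on the open-source Litemall e-commerce project. It includes front-end event tracking (openresty+lua) and back-end event tracking, a five-layer data warehouse, real-time computing, and user profiling. The big data platform uses CDH 6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.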

Stars: ✭ 36 (-30.77%)

Mutual labels: flink, spark-sql

LarkMidTable

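LarkMidTable is a one-stop, open-source data middle platform. It covers the platform's foundational infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, efficiently empowering the data front end and providing data services.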

Stars: ✭ 873 (+1578.85%)

Mutual labels: flink, flink-sql

delta-lake-internals

The Internals of Delta Lake

Stars: ✭ 108 (+107.69%)

Mutual labels: delta-lake, deltalake

redis-microservices-demo

Microservice application with various Redis use cases using RediSearch, RedisGraph, and Streams. The data is synchronized between MySQL and Redis using Debezium as a CDC engine.

Stars: ✭ 48 (-7.69%)

Mutual labels: cdc, debezium

TiBigData

TiDB connectors for Flink/Hive/Presto

Stars: ✭ 192 (+269.23%)

Mutual labels: flink, cdc

pgcapture

A scalable Netflix DBLog implementation for PostgreSQL

Stars: ✭ 94 (+80.77%)

Mutual labels: cdc, change-data-capture

OpenLogReplicator

Open Source Oracle database CDC written purely in C++. Reads transactions directly from database redo log files and streams in JSON or Protobuf format to: Kafka, RocketMQ, flat file, network stream (plain TCP/IP or ZeroMQ)

Stars: ✭ 112 (+115.38%)

Mutual labels: cdc, change-data-capture

OLAP-cube

A hypercube of data

Stars: ✭ 23 (-55.77%)

Mutual labels: data-warehouse, data-warehousing

kafka-delta-ingest

A highly efficient daemon for streaming data from Kafka into Delta Lake

Stars: ✭ 139 (+167.31%)

Mutual labels: delta, deltalake

cdc

A library for performing Content-Defined Chunking (CDC) on data streams.

Stars: ✭ 18 (-65.38%)

Mutual labels: cdc

Archived-SANSA-Query

SANSA Query Layer

Stars: ✭ 31 (-40.38%)

Mutual labels: flink

Websockets-Vertx-Flink-Kafka

A simple request-response cycle using Websockets, an Eclipse Vert.x server, Apache Kafka, and Apache Flink.

Stars: ✭ 14 (-73.08%)

Mutual labels: flink

flink-connectors

Apache Flink connectors for Pravega.

Stars: ✭ 84 (+61.54%)

Mutual labels: flink

spark-vcf

Spark VCF data source implementation for Dataframes

Stars: ✭ 15 (-71.15%)

Mutual labels: spark-sql

opaque-sql

An encrypted data analytics platform

Stars: ✭ 169 (+225%)

Mutual labels: spark-sql

albis

Albis: High-Performance File Format for Big Data Systems

Stars: ✭ 20 (-61.54%)

Mutual labels: spark-sql

google-sheets-etl

Live import all your Google Sheets to your data warehouse

Stars: ✭ 15 (-71.15%)

Mutual labels: data-warehouse

awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Stars: ✭ 11,093 (+21232.69%)

Mutual labels: data-warehouse

Rnssp

A Signature R package for the National Syndromic Surveillance Program (NSSP) at the Centers for Disease Control and Prevention (CDC). A collection of tools, functions, and R Markdown templates that supports the Community of Practice of the NSSP.

Stars: ✭ 19 (-63.46%)

Mutual labels: cdc

northwind-dotnet

A full-stack .NET 6 microservices application built on Minimal APIs and C# 10

Stars: ✭ 77 (+48.08%)

Mutual labels: debezium

spring-projects

Some spring sample projects

Stars: ✭ 24 (-53.85%)

Mutual labels: debezium

deltaq

Fast and portable delta encoding for .NET in 100% safe, managed code.

Stars: ✭ 26 (-50%)

Mutual labels: delta

apache-flink-jdbc-streaming

Sample project for Apache Flink with Streaming Engine and JDBC Sink

Stars: ✭ 22 (-57.69%)

Mutual labels: flink

flink-training-troubleshooting

No description or website provided.

Stars: ✭ 41 (-21.15%)

Mutual labels: flink

seatunnel-example

SeaTunnel plugin development examples.

Stars: ✭ 27 (-48.08%)

Mutual labels: flink

open-stream-processing-benchmark

This repository contains the code base for the Open Stream Processing Benchmark.

Stars: ✭ 37 (-28.85%)

Mutual labels: flink

HadoopDedup

🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase

Stars: ✭ 27 (-48.08%)

Mutual labels: cdc

shopping-list

A PWA for noting shopping lists and viewing shopping history

Stars: ✭ 24 (-53.85%)

Mutual labels: hoodie

pg-logical-replication

PostgreSQL Logical Replication client for node.js

Stars: ✭ 56 (+7.69%)

Mutual labels: cdc

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-9.62%)

Mutual labels: spark-sql

DeltaUI

SwiftUI + CoreData user interface for DeltaCore & Friends.

Stars: ✭ 61 (+17.31%)

Mutual labels: delta

flink-learn

Learning Flink: Flink CEP, Flink Core, Flink SQL

Stars: ✭ 70 (+34.62%)

Mutual labels: flink

dw-vldb-samples

This is a top level repository for code examples related to Data Warehousing and Very Large Databases.

Stars: ✭ 32 (-38.46%)

Mutual labels: data-warehousing

logparser

Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...

Stars: ✭ 139 (+167.31%)

Mutual labels: flink

Tweet-Analysis-With-Kafka-and-Spark

A real-time analytics dashboard for analyzing trending hashtags and @mentions at any location using Kafka and Spark Streaming.

Stars: ✭ 18 (-65.38%)

Mutual labels: spark-sql

tipoca-stream

Near-real-time, cloud-native data pipeline in AWS (CDC+Sink). Hosts code for RedshiftSink. RDS to RedshiftSink pipeline with masking and reloading support.

Stars: ✭ 43 (-17.31%)

Mutual labels: cdc

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (+51.92%)

Mutual labels: deltalake

spark2-etl-examples

A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0

Stars: ✭ 23 (-55.77%)

Mutual labels: spark-sql

pan-cortex-data-lake-python

Python idiomatic SDK for Cortex™ Data Lake.