practo / tipoca-stream

License: Apache-2.0
Near-real-time, cloud-native data pipeline in AWS (CDC + Sink). Hosts the code for RedshiftSink. RDS-to-RedshiftSink pipeline with masking and reloading support.

Programming Languages

go
31211 projects - #10 most used programming language
shell
77523 projects
Makefile
30231 projects
Dockerfile
14818 projects

Projects that are alternatives of or similar to tipoca-stream

Realtime
Listen to your PostgreSQL database in realtime via websockets. Built with Elixir.
Stars: ✭ 4,278 (+9848.84%)
Mutual labels:  realtime, cdc
barracuda-style-transfer
Companion code for the Unity Style Transfer blog post, showcasing realtime style transfer using Barracuda.
Stars: ✭ 126 (+193.02%)
Mutual labels:  realtime
xavc rtmd2srt
Extract real time meta-data and GPS tracks from Sony XAVC video
Stars: ✭ 29 (-32.56%)
Mutual labels:  realtime
cdc
A library for performing Content-Defined Chunking (CDC) on data streams.
Stars: ✭ 18 (-58.14%)
Mutual labels:  cdc
azure-sql-db-change-stream-debezium
SQL Server Change Stream sample using Debezium
Stars: ✭ 74 (+72.09%)
Mutual labels:  cdc
pytest-mock-resources
Pytest Fixtures that let you actually test against external resource (Postgres, Mongo, Redshift...) dependent code.
Stars: ✭ 84 (+95.35%)
Mutual labels:  redshift
SVisual
Monitoring and record(save) of data for Arduino and STM32
Stars: ✭ 21 (-51.16%)
Mutual labels:  realtime
ngrx-realtime-app
Demo to build a realtime Angular app with a Vert.x backend and distributed event bus
Stars: ✭ 45 (+4.65%)
Mutual labels:  realtime
UPPERCASE
A full-stack framework specialized for real-time applications ✨
Stars: ✭ 30 (-30.23%)
Mutual labels:  realtime
UnityRaymarching
raymarching experiment in unity
Stars: ✭ 73 (+69.77%)
Mutual labels:  realtime
Rin
Rin is a Redshift data Importer by SQS messaging.
Stars: ✭ 27 (-37.21%)
Mutual labels:  redshift
NasdaqCloudDataService-SDK-Java
Nasdaq Data Link provides a modern and efficient method of delivery for real-time exchange data and other financial information. This repository provides a Java SDK for developing applications using Nasdaq Data Link's real-time data.
Stars: ✭ 70 (+62.79%)
Mutual labels:  realtime
fastapi websocket pubsub
A fast and durable Pub/Sub channel over Websockets. FastAPI + WebSockets + PubSub == ⚡ 💪 ❤️
Stars: ✭ 255 (+493.02%)
Mutual labels:  realtime
Video-Engine-Dash
A Dash plugin for playing back video and optionally syncing video to timestamped CSV Data
Stars: ✭ 26 (-39.53%)
Mutual labels:  realtime
tideflow
Building extensible automation. Tideflow is a Realtime, open source workflows execution and monitorization web application.
Stars: ✭ 101 (+134.88%)
Mutual labels:  realtime
FaceRecog
Realtime Facial recognition system using Siamese neural network
Stars: ✭ 47 (+9.3%)
Mutual labels:  realtime
southpaw
⚾ Streaming left joins in Kafka for change data capture
Stars: ✭ 48 (+11.63%)
Mutual labels:  cdc
MobilePose
Light-weight Single Person Pose Estimator
Stars: ✭ 588 (+1267.44%)
Mutual labels:  realtime
starlake
Starlake is a Spark Based On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing
Stars: ✭ 16 (-62.79%)
Mutual labels:  redshift
python-realtime-table
Building realtime table using Python and Channels
Stars: ✭ 12 (-72.09%)
Mutual labels:  realtime

tipoca-stream

A near-real-time, cloud-native data pipeline using Kafka, KafkaConnect, and RedshiftSink in AWS. RedshiftSink is a high-performance, low-overhead data loader for Redshift, open-sourced by Practo. It comes with rich data masking support, so you can provide universal data access across your organization while preserving your customers' privacy!

Release blog.

Tipoca Stream is the successor to an internal, non-realtime data warehousing project called Tipoca, which itself takes its name from Tipoca City, home of the Clones in the Star Wars universe.

Install

The pipeline is a combination of independently deployed services. This repo holds the code for RedshiftSink only.

  • RedshiftSink: Please follow REDSHIFTSINK.md to install the RedshiftSink Kubernetes Operator. Creating the RedshiftSink resource installs Batcher and Loader pods in the cluster. These pods sink the data from Kafka topics to Redshift and take care of database migrations when required. RedshiftSink has rich masking support. It also supports table reloads in Redshift when masking configurations are modified in GitHub.
      kubectl get redshiftsink
  • Kafka: Install Kafka using Strimzi CRDs, or use a self-hosted or managed Kafka.
      kubectl get kafka
  • Producer: Install the Producer using Strimzi CRDs and Debezium. Creating the kafkaconnect and kafkaconnector resources creates a kafkaconnect pod in the cluster, which starts streaming data from the source (MySQL, RDS, etc.) to Kafka.
      kubectl get kafkaconnect
      kubectl get kafkaconnector
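
To make the masking idea mentioned for RedshiftSink more concrete, here is a minimal Go sketch. It is not RedshiftSink's actual code or configuration format. It assumes that masking means replacing the values of selected columns with a salted SHA-256 digest before a row is loaded; the names `maskValue` and `maskRow`, the salt, and the sample row are all hypothetical and exist only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maskValue returns the hex-encoded SHA-256 digest of salt+value.
// Hypothetical helper: the real masking scheme may differ.
func maskValue(salt, value string) string {
	sum := sha256.Sum256([]byte(salt + value))
	return hex.EncodeToString(sum[:])
}

// maskRow returns a copy of row in which every column named in masked
// is replaced by its salted digest. The input row is left unmodified.
func maskRow(row map[string]string, masked map[string]bool, salt string) map[string]string {
	out := make(map[string]string, len(row))
	for col, val := range row {
		if masked[col] {
			out[col] = maskValue(salt, val)
		} else {
			out[col] = val
		}
	}
	return out
}

func main() {
	// Sample change event for one row (made-up data).
	row := map[string]string{"id": "42", "email": "jane@example.com"}

	// Only the email column is treated as sensitive in this sketch.
	masked := maskRow(row, map[string]bool{"email": true}, "hypothetical-salt")

	// The id passes through unchanged; the email becomes a 64-character hex digest.
	fmt.Println(masked["id"], len(masked["email"])) // prints: 42 64
}
```

Because the digest is deterministic for a given salt, equal source values still map to equal masked values, so joins and group-bys on a masked column keep working. Whether RedshiftSink behaves this way should be checked against its own documentation.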

The project has pluggable libraries which can be composed to solve any other data pipeline use case.

Contribute

Please follow this to bring a change.

Thanks
