
xmlking / Cdc Kafka Hadoop

MySQL to NoSQL real time dataflow

Programming Languages: Java, Groovy

Projects that are alternatives of or similar to Cdc Kafka Hadoop

Szt Bigdata
Shenzhen Metro big-data passenger flow analysis system🚇🚄🌟
Stars: ✭ 826 (+6253.85%)
Mutual labels:  kafka, hadoop, mysql
Devops Bash Tools
550+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Kafka, Docker, APIs, Hadoop, SQL, PostgreSQL, MySQL, Hive, Impala, Travis CI, Jenkins, Concourse, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, .tmux.conf, .psqlrc ...
Stars: ✭ 226 (+1638.46%)
Mutual labels:  kafka, hadoop, mysql
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+7592.31%)
Mutual labels:  kafka, hadoop, mysql
Javakeeper
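✍️ A summary of the essential architecture knowledge for Java engineers: covers distributed systems, microservices, RPC and other architectures common at Internet companies, plus essential skills such as data storage, caching, and search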
Stars: ✭ 502 (+3761.54%)
Mutual labels:  kafka, mysql
Pdf
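Programming e-books, e-books, programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories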
Stars: ✭ 12,009 (+92276.92%)
Mutual labels:  hadoop, mysql
School Of Sre
At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Stars: ✭ 5,141 (+39446.15%)
Mutual labels:  hadoop, mysql
Kafka Connect Hdfs
Kafka Connect HDFS connector
Stars: ✭ 400 (+2976.92%)
Mutual labels:  kafka, hadoop
Pmacct
pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].
Stars: ✭ 677 (+5107.69%)
Mutual labels:  kafka, mysql
Books Recommendation
Books (and videos) for advancing programmers, continuously updated (Programmer Books)
Stars: ✭ 558 (+4192.31%)
Mutual labels:  kafka, mysql
Demo Scene
👾Scripts and samples to support Confluent Demos and Talks. ⚠️Might be rough around the edges ;-) 👉For automated tutorials and QA'd code, see https://github.com/confluentinc/examples/
Stars: ✭ 806 (+6100%)
Mutual labels:  kafka, mysql
Quarkus Microservices Poc
Very simplified shop sales system made in a microservices architecture using quarkus
Stars: ✭ 16 (+23.08%)
Mutual labels:  kafka, architecture
Bigdata Interview
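🎯 🌟[Big data interview questions] A collection of big-data interview questions gathered online, with my own summarized answers. Currently covers interview questions on the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks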
Stars: ✭ 857 (+6492.31%)
Mutual labels:  kafka, hadoop
God Of Bigdata
Focused on big-data learning and interviews: start your path to big-data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+46115.38%)
Mutual labels:  kafka, hadoop
Cookbook
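🎉🎉🎉Java senior architect tech stack == Any skill can be mastered through "deliberate practice", just like cooking. Here is a Java development handbook; all you need to do is practice more. 🏃🏃🏃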
Stars: ✭ 428 (+3192.31%)
Mutual labels:  kafka, mysql
Bigdata
💎🔥Big data study notes
Stars: ✭ 488 (+3653.85%)
Mutual labels:  hadoop, mysql
Go Clean Arch
Go (Golang) Clean Architecture based on Reading Uncle Bob's Clean Architecture
Stars: ✭ 5,128 (+39346.15%)
Mutual labels:  mysql, architecture
Javapdf
🍣100 Java e-books and technical books in PDF (take pride in downloading and reading, not in liking and bookmarking)
Stars: ✭ 609 (+4584.62%)
Mutual labels:  hadoop, mysql
Kudo
Kubernetes Universal Declarative Operator (KUDO)
Stars: ✭ 849 (+6430.77%)
Mutual labels:  kafka, mysql
Workflow
C++ Parallel Computing and Asynchronous Networking Engine
Stars: ✭ 6,680 (+51284.62%)
Mutual labels:  kafka, mysql
Gnomock
Test your code without writing mocks with ephemeral Docker containers 📦 Setup popular services with just a couple lines of code ⏱️ No bash, no yaml, only code 💻
Stars: ✭ 398 (+2961.54%)
Mutual labels:  kafka, mysql

CDC Hadoop Dataflow

A low-latency, multi-tenant Change Data Capture (CDC) pipeline that continuously replicates data from OLTP (MySQL) to OLAP (NoSQL) systems with no impact on the source.

This project demonstrates how to build a dataflow pipeline that moves data from operational databases (MySQL, Oracle) to analytics databases (Hadoop, MongoDB, MarkLogic) in real time, using Change Data Capture (CDC), Kafka, and tools such as Apache NiFi, Kafka Streams, or Spark to process and ingest the data into Hadoop.

[Figure: CDC architecture]
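The core idea behind the architecture is that Kafka sits between the CDC source and the sinks as a durable, ordered log, and each consumer tracks its own position in that log. This is what lets fast and slow consumers proceed independently and lets a new consumer replay the stream from the beginning. The sketch below is a hypothetical, single-process stand-in for one Kafka partition with per-consumer committed offsets. The `ChangeLog` class and its methods are invented for illustration and are not Kafka's client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model of one topic partition: an append-only list of
 * change records plus one committed offset per consumer. Not Kafka's API.
 */
final class ChangeLog {
    private final List<String> records = new ArrayList<>();
    private final Map<String, Integer> committed = new HashMap<>();

    /** Producer side (the CDC capture process): append a record, return its offset. */
    synchronized int append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Read up to maxRecords starting at this consumer's committed offset. */
    synchronized List<String> poll(String consumer, int maxRecords) {
        int from = committed.getOrDefault(consumer, 0);
        int to = Math.min(records.size(), from + maxRecords);
        return new ArrayList<>(records.subList(from, to));
    }

    /**
     * Advance the committed offset by count records. Committing only after
     * processing means a crash in between replays the batch (at-least-once).
     */
    synchronized void commit(String consumer, int count) {
        committed.merge(consumer, count, Integer::sum);
    }

    /** Rewind a consumer to any offset, e.g. 0 for a full bootstrap. */
    synchronized void seek(String consumer, int offset) {
        committed.put(consumer, offset);
    }

    /** Number of records this consumer has not yet committed. */
    synchronized int lag(String consumer) {
        return records.size() - committed.getOrDefault(consumer, 0);
    }
}
```

Because each consumer only moves its own offset, a slow consumer (say, a batch loader into Hadoop) never holds back a fast one, and `seek(consumer, 0)` is the toy equivalent of a full bootstrap from the buffered stream.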

Features

  1. Capture changes from many data sources and source types.
  2. Feed data to many client types (real-time, slow/catch-up, full bootstrap).
  3. Multi-tenant: can carry data from many different databases and support multiple consumers.
  4. Non-intrusive architecture for change capture.
  5. Both batch and near real-time delivery.
  6. Isolate fast consumers from slow consumers.
  7. Isolate sources from consumers:
    1. Schema changes
    2. Physical layout changes
    3. Speed mismatch
  8. Change filtering
    1. Filtering of database changes at the database level, schema level, table level, and row/column level.
  9. Buffer change records in Kafka for flexible consumption from an arbitrary point in the change stream, including the ability to fully bootstrap the entire data set.
  10. Guaranteed in-commit-order, at-least-once delivery (rather than exactly-once) with high availability.
  11. Resilience and recoverability.
  12. Schema awareness.
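Change filtering (item 8) can be pictured as a predicate applied to each change record before it reaches a consumer. Below is a minimal sketch, assuming a change carries the same fields Maxwell emits (`database`, `table`, `type`, `data`). The `RowChange` and `ChangeFilter` names, the `"db.*"` / `"db.table"` allow-list syntax, and the row predicate are all hypothetical. In MySQL a database and a schema are the same thing, so `"db.*"` covers both the database and schema levels.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical change record with the fields Maxwell emits: database, table, type, data. */
record RowChange(String database, String table, String type, Map<String, Object> data) {}

/**
 * Hypothetical filter: database/schema level ("db.*"), table level ("db.table"),
 * row level (a predicate over the row) and column level (blocked column names).
 */
final class ChangeFilter {
    private final Set<String> allowed;         // e.g. "test.*" or "test.shop"
    private final Set<String> blockedColumns;  // e.g. "phone_number"
    private final Predicate<Map<String, Object>> rowFilter;

    ChangeFilter(Set<String> allowed, Set<String> blockedColumns,
                 Predicate<Map<String, Object>> rowFilter) {
        this.allowed = allowed;
        this.blockedColumns = blockedColumns;
        this.rowFilter = rowFilter;
    }

    /** Returns the (possibly column-trimmed) change, or empty if it is filtered out. */
    Optional<RowChange> apply(RowChange c) {
        boolean tableAllowed = allowed.contains(c.database() + ".*")
                || allowed.contains(c.database() + "." + c.table());
        if (!tableAllowed || !rowFilter.test(c.data())) {
            return Optional.empty();  // database, table or row level
        }
        Map<String, Object> kept = new LinkedHashMap<>(c.data());
        kept.keySet().removeAll(blockedColumns);  // column level
        return Optional.of(new RowChange(c.database(), c.table(), c.type(), kept));
    }
}
```

Filtering as close to the source as possible keeps sensitive columns (for example `phone_number` in the test table used later) out of Kafka entirely, instead of relying on every downstream consumer to drop them.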

Setup

Install and Run MySQL

Install the source MySQL database and configure it for row-based replication, as per the instructions.
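For Maxwell, row-based replication generally comes down to three server variables: binary logging enabled (`log_bin` = `ON`), `binlog_format` = `ROW`, and a non-zero `server_id`. A minimal sketch of a preflight check you could run before starting Maxwell, assuming you have already read the output of `SHOW VARIABLES` into a map (for example over JDBC). The `BinlogPreflight` class is hypothetical and checks only these three variables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical preflight check. Input: name/value pairs from SHOW VARIABLES.
 * Output: a list of problems, empty when the server looks ready for binlog CDC.
 */
final class BinlogPreflight {
    static List<String> check(Map<String, String> vars) {
        List<String> problems = new ArrayList<>();
        if (!"ON".equalsIgnoreCase(vars.getOrDefault("log_bin", "OFF"))) {
            problems.add("log_bin is not ON: enable binary logging (log-bin in my.cnf)");
        }
        if (!"ROW".equalsIgnoreCase(vars.getOrDefault("binlog_format", ""))) {
            problems.add("binlog_format must be ROW for row-based replication");
        }
        // A missing or zero server_id means the server cannot take part in replication.
        if ("0".equals(vars.getOrDefault("server_id", "0").trim())) {
            problems.add("server_id must be set to a non-zero value");
        }
        return problems;
    }
}
```

A `STATEMENT` or `MIXED` binlog records SQL text rather than row images, which is why a row-level CDC reader such as Maxwell cannot work from it.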

Install and Run Kafka

Follow the instructions

Install and Run Maxwell

cd cdc/maxwell
# curl -L https://github.com/zendesk/maxwell/releases/download/v1.0.0/maxwell-1.1.2.tar.gz | tar --strip-components=1 -zx -C .
curl -L https://github.com/xmlking/maxwell/releases/download/1.1.2.1/maxwell-1.1.2.1-kafka-connect.tar.gz | tar --strip-components=1 -zx -C .

Run

cd cdc/maxwell

  1. Run with the stdout producer (for testing only)

    bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=stdout

  2. Run with the Kafka producer (Maxwell reads its settings from config.properties in the current directory)

    bin/maxwell

Test

Manual Testing

If all goes well, you'll see Maxwell replaying your inserts:

mysql -u root -p

mysql> CREATE TABLE test.shop
       (
         id BIGINT(20) NOT NULL AUTO_INCREMENT,
         version BIGINT(20) NOT NULL,
         name VARCHAR(255) NOT NULL,
         owner VARCHAR(255) NOT NULL,
         phone_number VARCHAR(255) NOT NULL,
         primary key (id, name)
       );
mysql> INSERT INTO test.shop (version, name, owner, phone_number) values (0, 'aaa', 'bbb', '3331114444');
Query OK, 1 row affected (0.02 sec)

(Maxwell output)
{"database":"test","table":"shop","pk.id":4,"pk.name":"aaa"}
{"database":"test","table":"shop","type":"insert","ts":1458510224,"xid":33531,"commit":true,"data":{"owner":"bbb","name":"aaa","phone_number":"3331114444","id":4,"version":0}}
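The event above carries what a downstream consumer needs to mirror the row into a NoSQL store: `database`, `table`, `type`, and the row's column values in `data`. Below is a minimal sketch of such a consumer. It assumes the JSON has already been parsed (any JSON library will do) and that the primary-key columns are known: `id` and `name` for `test.shop`, matching the `pk.id` and `pk.name` fields in the first output line. `MaxwellEvent` and `DocumentSink` are hypothetical names and not part of Maxwell or Kafka Connect. Inserts and updates become upserts and deletes remove the key, so re-applying a redelivered batch in commit order converges to the same state, which is what makes at-least-once delivery acceptable for this kind of sink.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory view of one parsed Maxwell event. */
record MaxwellEvent(String database, String table, String type,
                    List<String> pkColumns, Map<String, Object> data) {}

/**
 * Hypothetical sink that mirrors rows into a document store keyed by
 * "database.table:pk1:pk2...". Upsert/delete semantics make replays idempotent.
 */
final class DocumentSink {
    private final Map<String, Map<String, Object>> documents = new HashMap<>();

    void apply(MaxwellEvent e) {
        StringBuilder key = new StringBuilder(e.database()).append('.').append(e.table());
        for (String col : e.pkColumns()) {
            key.append(':').append(e.data().get(col));
        }
        switch (e.type()) {
            case "insert", "update" -> documents.put(key.toString(), new LinkedHashMap<>(e.data()));
            case "delete" -> documents.remove(key.toString());
            default -> { /* other event types are ignored in this sketch */ }
        }
    }

    Map<String, Object> get(String key) { return documents.get(key); }

    int size() { return documents.size(); }
}
```

The sketch ignores updates that change the primary key and DDL events; a real sink would need to handle both.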

Testing via Grails App

You can also use the Grails testApp to generate load.

Reference

  1. Maxwell's Daemon
  2. LinkedIn: Creating A Low Latency Change Data Capture System With Databus
  3. Introducing Maxwell, a mysql-to-kafka binlog processor
  4. Martin Kleppmann's blog: Using logs to build a solid data infrastructure
  5. Bottled Water: Real-time integration of PostgreSQL and Kafka
  6. debezium-examples
  7. Tutorial on using NiFi's built-in CDC (3 parts)