
JeasonPeople / mriya

License: Apache-2.0
Real-time ETL built with Flink that moves data from MySQL to Greenplum. Canal parses the MySQL binlog and publishes it to Kafka; Flink consumes Kafka and writes the assembled data into Greenplum. More data sources and target systems will be added in the future.

Programming Languages

Java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to mriya

flink-connectors
Apache Flink connectors for Pravega.
Stars: ✭ 84 (+29.23%)
Mutual labels:  flink
dockerize-and-ansible
🐳 Build & Deploy the containerized Dev & Prod Env
Stars: ✭ 20 (-69.23%)
Mutual labels:  greenplum
flink-k8s-operator
An example of building a Kubernetes operator (Flink) using the abstract-operator framework
Stars: ✭ 28 (-56.92%)
Mutual labels:  flink
emma
A quotation-based Scala DSL for scalable data analysis.
Stars: ✭ 61 (-6.15%)
Mutual labels:  flink
FlinkTutorial
FlinkTutorial focuses on big-data stream processing with Flink: getting started, concepts, principles, hands-on practice, performance tuning, source-code analysis, and more. Written in Java, with core parts also in Scala. Follow my blog and GitHub.
Stars: ✭ 46 (-29.23%)
Mutual labels:  flink
2018-flink-forward-china
Records of the first Flink Forward China (2018): videos | documents | more than streaming
Stars: ✭ 25 (-61.54%)
Mutual labels:  flink
flink-learn
Learning Flink: Flink CEP, Flink Core, Flink SQL
Stars: ✭ 70 (+7.69%)
Mutual labels:  flink
FlinkForward201709
Flink Forward 201709
Stars: ✭ 43 (-33.85%)
Mutual labels:  flink
fb scraper
FBLYZE is a Facebook scraping and analysis system.
Stars: ✭ 61 (-6.15%)
Mutual labels:  flink
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-69.23%)
Mutual labels:  flink
cassandra.realtime
Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink
Stars: ✭ 25 (-61.54%)
Mutual labels:  flink
review-notes
Shared team notes for learning and retrospectives: Java, Scala, Flink...
Stars: ✭ 27 (-58.46%)
Mutual labels:  flink
LarkMidTable
LarkMidTable is a one-stop open-source data middle platform covering infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, aiming to efficiently empower data front ends and provide data services.
Stars: ✭ 873 (+1243.08%)
Mutual labels:  flink
Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Stars: ✭ 52 (-20%)
Mutual labels:  flink
flink-prometheus-example
Example setup to demonstrate Prometheus integration of Apache Flink
Stars: ✭ 69 (+6.15%)
Mutual labels:  flink
flink-training-troubleshooting
No description or website provided.
Stars: ✭ 41 (-36.92%)
Mutual labels:  flink
df data service
DataFibers Data Service
Stars: ✭ 31 (-52.31%)
Mutual labels:  flink
np-flink
Detailed Flink study and hands-on practice
Stars: ✭ 26 (-60%)
Mutual labels:  flink
hadoop-data-ingestion-tool
OLAP and ETL of Big Data
Stars: ✭ 17 (-73.85%)
Mutual labels:  greenplum
litemall-dw
A big-data project built on the open-source Litemall e-commerce project, including front-end tracking (OpenResty + Lua) and back-end tracking, a five-layer data warehouse, real-time computing, and user profiling. The platform runs on CDH 6.3.2 (scripted with Vagrant + Ansible) and includes Azkaban workflows.
Stars: ✭ 36 (-44.62%)
Mutual labels:  flink

mriya (transport aircraft) ✈️

Introduction

Real-time ETL built with Flink, moving data from MySQL to Greenplum. Canal parses the MySQL binlog and publishes it to Kafka; Flink consumes Kafka and writes the assembled data into Greenplum. More data sources and target systems will be added in the future.

Workflow

(Workflow diagram)

  1. Canal parses the MySQL binary log and publishes the parsed events to Kafka.
  2. mriya consumes the messages from Kafka and reconstructs the MySQL inserts, updates, and deletes (a sketch of reading such a message follows this list).
  3. The MySQL changes are translated into insert, update, and delete statements for the target system.
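
The messages canal writes to Kafka in step 1 are JSON "flat messages" carrying the source database, table, event type, and row images; steps 2 and 3 amount to parsing those messages and replaying them on the target. Below is a minimal sketch of reading such a message with Jackson; the field names follow canal's flat-message format but should be verified against the canal version you deploy, and the sample payload itself is invented for illustration.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CanalMessageSketch {

    public static void main(String[] args) throws Exception {
        // Invented sample payload in the shape of canal's flat message (trimmed for readability)
        String json = "{"
                + "\"database\":\"mriya\",\"table\":\"table_1\",\"isDdl\":false,\"type\":\"INSERT\","
                + "\"pkNames\":[\"k1\"],"
                + "\"data\":[{\"k1\":\"1\",\"c1\":\"a\",\"c2\":\"b\",\"c3\":\"c\",\"c4\":\"2024-01-01 00:00:00\"}]"
                + "}";

        JsonNode msg = new ObjectMapper().readTree(json);

        if (msg.get("isDdl").asBoolean()) {
            // DDL event: translate the "sql" field into the target dialect (step 3)
            System.out.println("DDL: " + msg.get("sql").asText());
        } else {
            // DML event: route by type (INSERT / UPDATE / DELETE) and replay the row images on the target
            String source = msg.get("database").asText() + "." + msg.get("table").asText();
            System.out.println(msg.get("type").asText() + " on " + source
                    + ", rows = " + msg.get("data").size());
        }
    }
}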

Features

  1. Near-real-time ETL based on the binlog
  2. Automatic table creation and automatic synchronization of DDL changes
  3. Configuration is synchronized through the Nacos config center, so changes take effect without restarting (see the sketch after this list)
  4. Support for more target systems will be added later
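
Feature 3 relies on Nacos pushing configuration changes to the running job. The sketch below shows that pattern with the Nacos Java client; the Data ID and group match the Docker quick start further down, while the surrounding class, the server address, and the reload callback are illustrative stand-ins rather than mriya's actual code.

import java.util.Properties;
import java.util.concurrent.Executor;

import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.config.ConfigService;
import com.alibaba.nacos.api.config.listener.Listener;

public class NacosConfigWatcher {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("serverAddr", "127.0.0.1:8848"); // Nacos server address; adjust for your environment

        ConfigService configService = NacosFactory.createConfigService(props);

        // Initial load: same Data ID and group as in the Docker quick start
        String config = configService.getConfig("MRIYA", "MRIYA_GROUP", 5000);
        System.out.println("initial config:\n" + config);

        // Listen for changes so new settings can be applied without restarting the job
        configService.addListener("MRIYA", "MRIYA_GROUP", new Listener() {
            @Override
            public Executor getExecutor() {
                return null; // run the callback on the client's notifier thread
            }

            @Override
            public void receiveConfigInfo(String configInfo) {
                System.out.println("config changed:\n" + configInfo);
                // re-parse the properties and apply them to the running pipeline here
            }
        });

        Thread.currentThread().join(); // keep the watcher alive
    }
}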

MySQL --> PostgreSQL/Greenplum (using the delete + copy approach; a sketch of it follows the list below):

  1. Near-real-time data inserts, updates, and deletes

  2. Automatic table creation

CREATE TABLE [IF NOT EXISTS] tbl_name create_definition: {...}

  3. Changes to the MySQL table structure

ALTER TABLE tbl_name
  | ADD [COLUMN] col_name column_definition
  | ADD [COLUMN] (col_name column_definition, ...)
  | DROP [COLUMN] col_name
  | MODIFY [COLUMN] col_name column_definition

  4. Primary key changes

  5. Dropping tables

  6. Renaming tables
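
For reference, here is a minimal sketch of the delete + copy idea using the PostgreSQL JDBC driver's CopyManager: delete the affected rows by primary key, then bulk-load the latest row images with COPY. The connection URL, schema-qualified table name, key column k1, and CSV rows are illustrative values borrowed from the quick-start example below, not mriya's actual sink code.

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;
import java.util.List;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class DeleteThenCopySketch {

    /** Apply one micro-batch: delete existing versions of the keys, then COPY in the latest row images. */
    static void upsertBatch(Connection conn, String table, List<Integer> keys, String csvRows) throws Exception {
        conn.setAutoCommit(false);
        try {
            // 1) Delete any rows whose primary key appears in this batch
            try (PreparedStatement ps = conn.prepareStatement(
                    "DELETE FROM " + table + " WHERE k1 = ANY (?)")) {
                ps.setArray(1, conn.createArrayOf("integer", keys.toArray()));
                ps.executeUpdate();
            }
            // 2) Bulk-load the latest row images with COPY ... FROM STDIN
            CopyManager copy = new CopyManager(conn.unwrap(BaseConnection.class));
            copy.copyIn("COPY " + table + " FROM STDIN WITH CSV", new StringReader(csvRows));
            conn.commit();
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mriya", "gpadmin", "pivotal")) {
            upsertBatch(conn, "dw_ods.mriya_mriya_table_1", Arrays.asList(1, 2),
                    "1,a,b,c,2024-01-01 00:00:00\n2,d,e,f,2024-01-01 00:00:00\n");
        }
    }
}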

MySQL --> Apache Kudu (to be developed):

How it works

  1. Read the canal-parsed MySQL binary log from Kafka.
  2. Use Flink's keyBy to group events by targetTable and apply a time window.
  3. Define a custom trigger whose firing condition is that a DDL statement has been parsed.
  4. Steps 2 and 3 work together: the time window plus the custom trigger roll the window by time when there is no DDL, and roll it immediately when a DDL statement appears (see the sketch after this list).
  5. Define an aggregate that merges and deduplicates the rows belonging to the same table.
  6. Define a custom sink, such as a GreenplumSink or a sink for another target data source.
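
The following is a condensed sketch of steps 2-4: key the stream by target table, apply a processing-time window, and attach a custom trigger that fires the window immediately when a DDL event arrives and otherwise lets it roll on time. Firing and purging on DDL keeps the schema change ordered ahead of any later row changes for that table. BinlogEvent and its isDdl()/getTargetTable() accessors are placeholder names for illustration, not mriya's actual classes.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class DdlAwareWindowing {

    /** Placeholder event type for illustration only. */
    public interface BinlogEvent {
        boolean isDdl();
        String getTargetTable();
    }

    /** Fires the window early when a DDL event is seen; otherwise rolls on processing time. */
    public static class DdlTrigger extends Trigger<BinlogEvent, TimeWindow> {

        @Override
        public TriggerResult onElement(BinlogEvent element, long timestamp,
                                       TimeWindow window, TriggerContext ctx) throws Exception {
            // Ensure the window still closes on time when no DDL shows up
            ctx.registerProcessingTimeTimer(window.maxTimestamp());
            return element.isDdl() ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.FIRE_AND_PURGE; // normal time-based roll
        }

        @Override
        public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE; // processing-time windows ignore event time
        }

        @Override
        public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
            ctx.deleteProcessingTimeTimer(window.maxTimestamp());
        }
    }

    /** Wiring for steps 2-4; the aggregate (step 5) and sink (step 6) are left as comments. */
    public static void wire(DataStream<BinlogEvent> events) {
        events.keyBy(BinlogEvent::getTargetTable)
              .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
              .trigger(new DdlTrigger());
              // .aggregate(...)  merge and deduplicate rows of the same table (step 5)
              // .addSink(...)    custom GreenplumSink or other target sink (step 6)
    }
}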

Quick start with Docker

git clone https://github.com/JeasonPeople/mriya.git
cd mriya/docker-compose/
docker-compose up
  1. Open http://docker-ip:8848/nacos to edit the configuration (default account: nacos/nacos). Under the public namespace, create a new Properties configuration with Data ID=MRIYA and group=MRIYA_GROUP:
mriya.source.kafka.bootstrap.servers=kafka:9092
mriya.source.kafka.zookeeper.connect=zk:2181
mriya.source.kafka.group.id=dw-etl-prod-gp6
mriya.source.kafka.auto.offset.reset=earliest
mriya.source.kafka.topic=mriya

mriya.target.datasource.type=greenplum
mriya.target.datasource.url=jdbc:postgresql://greenplum:5432/mriya?serverTimezone=GMT+8
mriya.target.datasource.schema=dw_ods
mriya.target.datasource.username=gpadmin
mriya.target.datasource.password=pivotal
# FreeMarker syntax is supported; ${table} is required (see the sketch after this block)
mriya.table.name.template=${topic}_${database}_${table}

# psql -d template1 -c "alter user gpadmin password 'pivotal'"
# mriya.message.filer=${topic}-${database}-${table}
# mriya.message.filer=mes-accounting_bak-*
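
The mriya.table.name.template property above is a FreeMarker template over the topic, database, and table variables. The sketch below shows how such a template expands; the variable values are illustrative and chosen to match the quick-start example tables.

import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

import freemarker.template.Configuration;
import freemarker.template.Template;

public class TableNameTemplateSketch {

    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_28);

        // Same template as in the properties above; ${table} is the required variable
        Template template = new Template("tableName",
                new StringReader("${topic}_${database}_${table}"), cfg);

        Map<String, Object> model = new HashMap<>();
        model.put("topic", "mriya");    // Kafka topic (illustrative value)
        model.put("database", "mriya"); // source MySQL database
        model.put("table", "table_1");  // source MySQL table

        StringWriter out = new StringWriter();
        template.process(model, out);
        System.out.println(out); // -> mriya_mriya_table_1
    }
}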
  2. Connect to Greenplum with the gpadmin account and create the database and schema (default accounts: root/pivotal, gpadmin/pivotal):
CREATE DATABASE "mriya";
CREATE SCHEMA "dw_ods";
  3. Open http://docker-ip:8081/#/submit to upload the jar and run it.

  4. Connect to MySQL with a client tool (default account: root/Mriya@Mriya) and run the following SQL:

CREATE DATABASE `mriya`;
CREATE TABLE `mriya`.`table_1`  (
  `k1` int(10) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `c1` varchar(255) NULL,
  `c2` varchar(255) NULL,
  `c3` varchar(255) NULL,
  `c4` datetime(2) NULL,
  PRIMARY KEY (`k1`)
);

Installation

  1. Install MySQL
  2. Install canal
  3. Install Kafka
  4. Install ZooKeeper

Installation guide for steps 1-4: https://github.com/alibaba/canal/wiki

  5. Install the Nacos config center

Nacos installation guide: https://nacos.io/zh-cn/docs/deployment.html

  6. Install Flink

Standalone installation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/cluster_setup.html#starting-flink

  7. Install Greenplum

Install Greenplum with Docker:

docker pull datagrip/greenplum
docker run -it -p 5432:5432 datagrip/greenplum

Username: gpadmin, password: pivotal; username: root, password: pivotal

Usage

  1. Build from source:
git clone https://github.com/JeasonPeople/mriya.git
cd mriya
mvn install -Dmaven.test.skip=true
cd mriya-flink/target

Upload the packaged jar through the Flink Web UI and run it.

Sync speed

(Sync speed screenshots)

Add the administrator on WeChat to join the technical discussion group

(WeChat QR code)

Follow the official WeChat account

(WeChat QR code)
