peiliping / meepo

Licence: other
Heterogeneous storage data migration

Programming Languages

java
68154 projects - #9 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to meepo

columnify
Convert record-oriented data to columnar format.
Stars: ✭ 28 (-3.45%)
Mutual labels:  parquet
Parquet.jl
Julia implementation of Parquet columnar file format reader
Stars: ✭ 93 (+220.69%)
Mutual labels:  parquet
graphique
GraphQL service for arrow tables and parquet data sets.
Stars: ✭ 28 (-3.45%)
Mutual labels:  parquet
CRoaringUnityBuild
Dumps of CRoaring unity builds (for convenience)
Stars: ✭ 22 (-24.14%)
Mutual labels:  roaringbitmap
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-41.38%)
Mutual labels:  parquet
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
Stars: ✭ 55 (+89.66%)
Mutual labels:  parquet
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-31.03%)
Mutual labels:  parquet
LarkMidTable
LarkMidTable is a one-stop open source data middle platform covering platform infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, aiming to efficiently empower data front ends and provide data services.
Stars: ✭ 873 (+2910.34%)
Mutual labels:  datax
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-34.48%)
Mutual labels:  parquet
parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Stars: ✭ 157 (+441.38%)
Mutual labels:  parquet
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-34.48%)
Mutual labels:  parquet
odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
Stars: ✭ 95 (+227.59%)
Mutual labels:  parquet
web-click-flow
Offline analysis of website clickstream logs
Stars: ✭ 14 (-51.72%)
Mutual labels:  sqoop
common-datax
A general-purpose data synchronization microservice based on DataX; a single RESTful API handles all common data synchronization tasks.
Stars: ✭ 51 (+75.86%)
Mutual labels:  datax
DataXServer
Provides remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) for DataX (https://github.com/alibaba/DataX).
Stars: ✭ 130 (+348.28%)
Mutual labels:  datax
terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Stars: ✭ 25 (-13.79%)
Mutual labels:  parquet
DaFlow
Apache Spark based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-17.24%)
Mutual labels:  parquet
experiments
Code examples for my blog posts
Stars: ✭ 21 (-27.59%)
Mutual labels:  parquet
cloud
Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper.
Stars: ✭ 48 (+65.52%)
Mutual labels:  sqoop
parquet-usql
A custom extractor designed to read parquet for Azure Data Lake Analytics
Stars: ✭ 13 (-55.17%)
Mutual labels:  parquet

Meepo is a lightweight data migration tool, mainly targeting data exchange between MySQL and Parquet files.

It also ships with some customized extensions, for example for Redis and ElasticSearch.

Meepo is mainly intended to solve the following problems:

1. MySQL table synchronization: continuously read newly inserted rows from the source table and write them into a customized new table, applying some simple data processing along the way.

Many companies solve this with otter or canal; Meepo works on roughly the same principle as DataX and is based on JDBC.
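
A minimal sketch of such a JDBC-based incremental copy loop; the connection URLs, table and column names, the per-row transformation, and the polling interval are illustrative assumptions, not Meepo's actual code:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class IncrementalSync {
    public static void main(String[] args) throws Exception {
        try (Connection src = DriverManager.getConnection("jdbc:mysql://src-host/db", "user", "pass");
             Connection dst = DriverManager.getConnection("jdbc:mysql://dst-host/db", "user", "pass")) {
            dst.setAutoCommit(false);
            long lastId = 0L;                                   // checkpoint: last primary key copied
            while (true) {
                try (PreparedStatement read = src.prepareStatement(
                         "SELECT id, name, amount FROM src_table WHERE id > ? ORDER BY id LIMIT 1000");
                     PreparedStatement write = dst.prepareStatement(
                         "INSERT INTO dst_table (id, name, amount_cents) VALUES (?, ?, ?)")) {
                    read.setLong(1, lastId);
                    ResultSet rs = read.executeQuery();
                    int fetched = 0;
                    while (rs.next()) {
                        lastId = rs.getLong("id");
                        write.setLong(1, lastId);
                        write.setString(2, rs.getString("name"));
                        write.setLong(3, rs.getLong("amount") * 100);  // simple per-row processing
                        write.addBatch();
                        fetched++;
                    }
                    if (fetched > 0) { write.executeBatch(); dst.commit(); }
                    else Thread.sleep(1000);                    // no new rows yet, poll again
                }
            }
        }
    }
}
```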

2. Fast copy of a MySQL table: produce a mirror copy as quickly as possible, optionally keeping or dropping columns, mainly for testing purposes.

To write into MySQL quickly, Meepo applies quite a bit of detailed performance tuning and can cover the vast majority of use cases.
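
A sketch of the kind of write-path tuning a fast mirror copy typically relies on; rewriteBatchedStatements is a real MySQL Connector/J flag that rewrites a batch of INSERTs into one multi-row INSERT, while the table names, column selection, and batch size here are assumptions rather than Meepo's internals:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class FastMirrorCopy {
    public static void main(String[] args) throws Exception {
        String srcUrl = "jdbc:mysql://src-host/db";
        String dstUrl = "jdbc:mysql://dst-host/db?rewriteBatchedStatements=true";
        try (Connection src = DriverManager.getConnection(srcUrl, "user", "pass");
             Connection dst = DriverManager.getConnection(dstUrl, "user", "pass");
             Statement read = src.createStatement();
             PreparedStatement write = dst.prepareStatement(
                     "INSERT INTO mirror_table (id, col_a) VALUES (?, ?)")) {
            dst.setAutoCommit(false);
            // Only the columns the mirror needs are selected; the rest are dropped.
            ResultSet rs = read.executeQuery("SELECT id, col_a FROM source_table");
            int inBatch = 0;
            while (rs.next()) {
                write.setLong(1, rs.getLong("id"));
                write.setString(2, rs.getString("col_a"));
                write.addBatch();
                if (++inBatch == 5000) {        // large batches amortize network round trips
                    write.executeBatch();
                    dst.commit();
                    inBatch = 0;
                }
            }
            if (inBatch > 0) { write.executeBatch(); dst.commit(); }
        }
    }
}
```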

3. Export data from an online database as Parquet and write it to HDFS, or generate local files, to make handing data off easy.

This is functionally similar to Sqoop, but Sqoop's MapReduce jobs on YARN are hard to control and its dependencies are rather heavy.
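
A sketch of writing rows to a Parquet file using the parquet-avro library; the library choice, the schema, and the output path are assumptions for illustration (in practice the records would come from a JDBC ResultSet, and an HDFS URI such as hdfs://namenode:8020/... can be used in place of the local path):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetExport {
    public static void main(String[] args) throws Exception {
        // Avro schema describing one exported row (illustrative).
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"row\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"long\"},"
              + "{\"name\":\"name\",\"type\":\"string\"}]}");
        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path("file:///tmp/export.parquet"))
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {
            // One demo record; a real export would loop over the source rows.
            GenericRecord r = new GenericData.Record(schema);
            r.put("id", 1L);
            r.put("name", "demo");
            writer.write(r);
        }
    }
}
```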

4. Comparison of two MySQL tables: currently the diff can only be done on the primary key ID, to find missing rows.

This is mainly based on bitmaps, which allows the data to be diffed quickly within a limited amount of memory.
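
A sketch of the bitmap diff idea using RoaringBitmap with int keys; how Meepo actually loads the keys and which key width it uses is not described here, so treat those details as assumptions:

```java
import org.roaringbitmap.RoaringBitmap;

public class TableDiff {
    public static void main(String[] args) {
        RoaringBitmap sourceIds = new RoaringBitmap();
        RoaringBitmap targetIds = new RoaringBitmap();

        // In practice both bitmaps would be filled by streaming "SELECT id FROM ..." over JDBC.
        sourceIds.add(1L, 1_000_001L);       // ids 1..1_000_000 exist in the source
        targetIds.add(1L, 1_000_001L);
        targetIds.remove(42);                // pretend row 42 was lost in the target

        // Rows present in the source but missing in the target.
        RoaringBitmap missing = RoaringBitmap.andNot(sourceIds, targetIds);
        missing.forEach((int id) -> System.out.println("missing id: " + id));
    }
}
```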

5. Plugin-based custom development: Meepo provides a set of plugins by default, and plugins can also be combined and customized.

The built-in plugins can automatically handle differences in field types and can perform simple join computations.
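
Meepo's real plugin API is not shown on this page, so the interface below is purely hypothetical; it only illustrates the idea of chaining row-transforming plugins, such as a field-type converter:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PluginPipelineDemo {

    /** Hypothetical plugin contract: takes one row (column name -> value) and returns a transformed row. */
    interface RowPlugin {
        Map<String, Object> apply(Map<String, Object> row);
    }

    /** Example plugin: coerce an Integer column to Long to bridge a field-type difference. */
    static final RowPlugin INT_TO_LONG = row -> {
        Object v = row.get("id");
        if (v instanceof Integer) {
            row.put("id", ((Integer) v).longValue());
        }
        return row;
    };

    static Map<String, Object> runPipeline(List<RowPlugin> plugins, Map<String, Object> row) {
        for (RowPlugin p : plugins) {
            row = p.apply(row);      // plugins run in order and can be freely combined
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 7);
        System.out.println(runPipeline(List.of(INT_TO_LONG), row));  // id is now a Long
    }
}
```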
