peiliping / meepo

Licence: other
Heterogeneous storage data migration

Programming Languages

java
68154 projects - #9 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to meepo

columnify
Convert record-oriented data to columnar format.
Stars: ✭ 28 (-3.45%)
Mutual labels:  parquet
Parquet.jl
Julia implementation of Parquet columnar file format reader
Stars: ✭ 93 (+220.69%)
Mutual labels:  parquet
graphique
GraphQL service for arrow tables and parquet data sets.
Stars: ✭ 28 (-3.45%)
Mutual labels:  parquet
CRoaringUnityBuild
Dumps of CRoaring unity builds (for convenience)
Stars: ✭ 22 (-24.14%)
Mutual labels:  roaringbitmap
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-41.38%)
Mutual labels:  parquet
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
Stars: ✭ 55 (+89.66%)
Mutual labels:  parquet
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-31.03%)
Mutual labels:  parquet
LarkMidTable
LarkMidTable is a one-stop open source data middle platform covering platform infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, aiming to efficiently empower data front ends and provide data services.
Stars: ✭ 873 (+2910.34%)
Mutual labels:  datax
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-34.48%)
Mutual labels:  parquet
parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Stars: ✭ 157 (+441.38%)
Mutual labels:  parquet
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-34.48%)
Mutual labels:  parquet
odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
Stars: ✭ 95 (+227.59%)
Mutual labels:  parquet
web-click-flow
Offline analysis of website clickstream logs
Stars: ✭ 14 (-51.72%)
Mutual labels:  sqoop
common-datax
A general-purpose data synchronization microservice based on DataX; a single RESTful API handles all common data synchronization tasks.
Stars: ✭ 51 (+75.86%)
Mutual labels:  datax
DataXServer
Provides remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) for DataX (https://github.com/alibaba/DataX).
Stars: ✭ 130 (+348.28%)
Mutual labels:  datax
terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Stars: ✭ 25 (-13.79%)
Mutual labels:  parquet
DaFlow
Apache Spark based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-17.24%)
Mutual labels:  parquet
experiments
Code examples for my blog posts
Stars: ✭ 21 (-27.59%)
Mutual labels:  parquet
cloud
Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper.
Stars: ✭ 48 (+65.52%)
Mutual labels:  sqoop
parquet-usql
A custom extractor designed to read parquet for Azure Data Lake Analytics
Stars: ✭ 13 (-55.17%)
Mutual labels:  parquet

Meepo is a lightweight data migration tool, mainly targeting data exchange between MySQL and Parquet files.

It also ships with some customized extensions, for example for Redis and ElasticSearch.

Meepo is mainly intended to solve the following problems:

1. MySQL table synchronization: continuously read newly inserted rows from the source table and write them into a customized new table, applying some simple data processing along the way.

Many companies solve this with otter or canal; Meepo works on roughly the same principle as DataX and is based on JDBC.
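
A minimal sketch of such a JDBC-based incremental copy loop; the connection URLs, table and column names, the per-row transformation, and the polling interval are illustrative assumptions, not Meepo's actual code:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class IncrementalSync {
    public static void main(String[] args) throws Exception {
        try (Connection src = DriverManager.getConnection("jdbc:mysql://src-host/db", "user", "pass");
             Connection dst = DriverManager.getConnection("jdbc:mysql://dst-host/db", "user", "pass")) {
            dst.setAutoCommit(false);
            long lastId = 0L;                                   // checkpoint: last primary key copied
            while (true) {
                try (PreparedStatement read = src.prepareStatement(
                         "SELECT id, name, amount FROM src_table WHERE id > ? ORDER BY id LIMIT 1000");
                     PreparedStatement write = dst.prepareStatement(
                         "INSERT INTO dst_table (id, name, amount_cents) VALUES (?, ?, ?)")) {
                    read.setLong(1, lastId);
                    ResultSet rs = read.executeQuery();
                    int fetched = 0;
                    while (rs.next()) {
                        lastId = rs.getLong("id");
                        write.setLong(1, lastId);
                        write.setString(2, rs.getString("name"));
                        write.setLong(3, rs.getLong("amount") * 100);  // simple per-row processing
                        write.addBatch();
                        fetched++;
                    }
                    if (fetched > 0) { write.executeBatch(); dst.commit(); }
                    else Thread.sleep(1000);                    // no new rows yet, poll again
                }
            }
        }
    }
}
```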

2. Fast copy of a MySQL table: produce a mirror copy as quickly as possible, optionally keeping or dropping columns, mainly for testing purposes.

To write into MySQL quickly, Meepo applies quite a bit of detailed performance tuning and can cover the vast majority of use cases.
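
A sketch of the kind of write-path tuning a fast mirror copy typically relies on; rewriteBatchedStatements is a real MySQL Connector/J flag that rewrites a batch of INSERTs into one multi-row INSERT, while the table names, column selection, and batch size here are assumptions rather than Meepo's internals:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class FastMirrorCopy {
    public static void main(String[] args) throws Exception {
        String srcUrl = "jdbc:mysql://src-host/db";
        String dstUrl = "jdbc:mysql://dst-host/db?rewriteBatchedStatements=true";
        try (Connection src = DriverManager.getConnection(srcUrl, "user", "pass");
             Connection dst = DriverManager.getConnection(dstUrl, "user", "pass");
             Statement read = src.createStatement();
             PreparedStatement write = dst.prepareStatement(
                     "INSERT INTO mirror_table (id, col_a) VALUES (?, ?)")) {
            dst.setAutoCommit(false);
            // Only the columns the mirror needs are selected; the rest are dropped.
            ResultSet rs = read.executeQuery("SELECT id, col_a FROM source_table");
            int inBatch = 0;
            while (rs.next()) {
                write.setLong(1, rs.getLong("id"));
                write.setString(2, rs.getString("col_a"));
                write.addBatch();
                if (++inBatch == 5000) {        // large batches amortize network round trips
                    write.executeBatch();
                    dst.commit();
                    inBatch = 0;
                }
            }
            if (inBatch > 0) { write.executeBatch(); dst.commit(); }
        }
    }
}
```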

3. Export data from an online database as Parquet and write it to HDFS, or generate local files, to make handing data off easy.

This is functionally similar to Sqoop, but Sqoop's MapReduce jobs on YARN are hard to control and its dependencies are rather heavy.
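
A sketch of writing rows to a Parquet file using the parquet-avro library; the library choice, the schema, and the output path are assumptions for illustration (in practice the records would come from a JDBC ResultSet, and an HDFS URI such as hdfs://namenode:8020/... can be used in place of the local path):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetExport {
    public static void main(String[] args) throws Exception {
        // Avro schema describing one exported row (illustrative).
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"row\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"long\"},"
              + "{\"name\":\"name\",\"type\":\"string\"}]}");
        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path("file:///tmp/export.parquet"))
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {
            // One demo record; a real export would loop over the source rows.
            GenericRecord r = new GenericData.Record(schema);
            r.put("id", 1L);
            r.put("name", "demo");
            writer.write(r);
        }
    }
}
```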

4. Comparison of two MySQL tables: currently the diff can only be done on the primary key ID, to find missing rows.

This is mainly based on bitmaps, which allows the data to be diffed quickly within a limited amount of memory.
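
A sketch of the bitmap diff idea using RoaringBitmap with int keys; how Meepo actually loads the keys and which key width it uses is not described here, so treat those details as assumptions:

```java
import org.roaringbitmap.RoaringBitmap;

public class TableDiff {
    public static void main(String[] args) {
        RoaringBitmap sourceIds = new RoaringBitmap();
        RoaringBitmap targetIds = new RoaringBitmap();

        // In practice both bitmaps would be filled by streaming "SELECT id FROM ..." over JDBC.
        sourceIds.add(1L, 1_000_001L);       // ids 1..1_000_000 exist in the source
        targetIds.add(1L, 1_000_001L);
        targetIds.remove(42);                // pretend row 42 was lost in the target

        // Rows present in the source but missing in the target.
        RoaringBitmap missing = RoaringBitmap.andNot(sourceIds, targetIds);
        missing.forEach((int id) -> System.out.println("missing id: " + id));
    }
}
```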

5. Plugin-based custom development: Meepo provides a set of plugins by default, and plugins can also be combined and customized.

The built-in plugins can automatically handle differences in field types and can perform simple join computations.
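
Meepo's real plugin API is not shown on this page, so the interface below is purely hypothetical; it only illustrates the idea of chaining row-transforming plugins, such as a field-type converter:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PluginPipelineDemo {

    /** Hypothetical plugin contract: takes one row (column name -> value) and returns a transformed row. */
    interface RowPlugin {
        Map<String, Object> apply(Map<String, Object> row);
    }

    /** Example plugin: coerce an Integer column to Long to bridge a field-type difference. */
    static final RowPlugin INT_TO_LONG = row -> {
        Object v = row.get("id");
        if (v instanceof Integer) {
            row.put("id", ((Integer) v).longValue());
        }
        return row;
    };

    static Map<String, Object> runPipeline(List<RowPlugin> plugins, Map<String, Object> row) {
        for (RowPlugin p : plugins) {
            row = p.apply(row);      // plugins run in order and can be freely combined
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 7);
        System.out.println(runPipeline(List.of(INT_TO_LONG), row));  // id is now a Long
    }
}
```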
