StorageTapper

Overview

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service.

StorageTapper is deployed in production at Uber, where it produces snapshots and realtime change data for thousands of MySQL tables across multiple datacenters.

It is also used as a backup service to snapshot hundreds of terabytes of Schemaless data to HDFS and S3, with optional asymmetric encryption and compression.

It reads data from a source, transforms it according to the specified event format, and produces it to a destination.

Supported event sources:

  • MySQL
  • Schemaless

Supported event destinations:

  • Kafka
  • HDFS
  • S3
  • Local file
  • MySQL (experimental)
  • Postgres (experimental)
  • ClickHouse (experimental)

Supported event formats:

  • Avro
  • JSON
  • MsgPack
  • SQL

StorageTapper keeps its job state in a MySQL database and automatically distributes jobs among the configured number of workers.

It is also aware of node roles and takes snapshots from slave nodes in order to reduce load on master nodes. Reads can optionally be throttled further. Binlogs are streamed from master nodes for better SLAs.

The service is dynamically configurable through a RESTful API or the built-in UI.

Build & Install

Debian & Ubuntu

cd storagetapper
make deb && dpkg -i ../storagetapper_1.0_amd64.deb

Others

cd storagetapper
make && make install

Development

Linux

/bin/bash scripts/install_deps.sh # install all dependencies: MySQL, Kafka, HDFS, S3, ...
make test # run all tests
GO111MODULE=on TEST_PARAM="-test.run=TestLocalBasic" /bin/bash scripts/run_tests.sh ./pipe # individual test

Non Linux

make test-env
make test

Configuration

StorageTapper loads configuration from the following files and locations, in the given order:

    /etc/storagetapper/base.yaml
    /etc/storagetapper/production.yaml
    $(HOME)/base.yaml
    $(HOME)/production.yaml
    $(STORAGETAPPER_CONFIG_DIR)/base.yaml
    $(STORAGETAPPER_CONFIG_DIR)/production.yaml
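
The lookup order above can be reproduced with a short Go sketch that builds the candidate paths from the environment. The function name `configCandidates` and the choice to skip a directory whose variable is unset are assumptions made for this example. How StorageTapper merges values found in more than one file is not stated here, so the sketch only lists paths.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// configCandidates returns the config file paths in the documented lookup
// order. getenv is injected so the function can be exercised without
// touching the real environment. (Hypothetical helper, not StorageTapper code.)
func configCandidates(getenv func(string) string) []string {
	dirs := []string{"/etc/storagetapper"}
	for _, key := range []string{"HOME", "STORAGETAPPER_CONFIG_DIR"} {
		// Assumption: an unset variable contributes no directory.
		if dir := getenv(key); dir != "" {
			dirs = append(dirs, dir)
		}
	}

	var paths []string
	for _, dir := range dirs {
		for _, name := range []string{"base.yaml", "production.yaml"} {
			paths = append(paths, filepath.Join(dir, name))
		}
	}
	return paths
}

func main() {
	for _, p := range configCandidates(os.Getenv) {
		fmt.Println(p)
	}
}
```

Running it with `STORAGETAPPER_CONFIG_DIR` set prints six paths; with it unset, only the `/etc/storagetapper` and `$(HOME)` entries appear.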

Available options are described in the Options section.

License

This software is licensed under the MIT License.
