ubisoft / Mobydq

Licence: apache-2.0
🐳 Tool to automate data quality checks on data pipelines

MobyDQ

MobyDQ is a tool for data engineering teams to automate data quality checks on their data pipelines, capture data quality issues, and trigger alerts when anomalies occur, regardless of the data sources they use.
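
To make the idea of an automated check concrete, here is a minimal sketch in Python. It is not MobyDQ's actual API: the names `CheckResult`, `completeness_check` and `threshold_percent` are hypothetical and chosen for illustration only. It assumes a check compares one measure computed on a source system with the same measure on a target system, and raises an alert when the deviation exceeds a threshold.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one hypothetical data quality check (illustrative only)."""
    name: str
    source_value: float
    target_value: float
    delta_percent: float
    alert: bool


def completeness_check(name, source_value, target_value, threshold_percent):
    """Compare a measure computed on a source and on a target system.

    The alert flag is set when the target deviates from the source by
    more than threshold_percent, in absolute value.
    """
    if source_value == 0:
        # Avoid division by zero: treat any non-zero target as a 100% deviation.
        delta = 0.0 if target_value == 0 else 100.0
    else:
        delta = abs(target_value - source_value) / abs(source_value) * 100
    return CheckResult(name, source_value, target_value, delta, delta > threshold_percent)


if __name__ == "__main__":
    # Hypothetical example: row counts measured before and after a pipeline step.
    result = completeness_check("orders_row_count", 1000, 950, threshold_percent=2.0)
    if result.alert:
        print(f"ALERT {result.name}: target deviates by {result.delta_percent:.1f}%")
```

In a real deployment the two values would come from queries against the data sources, and the alert would be sent through a notification channel rather than printed.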

This tool was inspired by an internal project developed at Ubisoft Entertainment to measure and improve the data quality of its Enterprise Data Platform. This open source version has been reworked to improve and simplify its design and to remove technical dependencies on commercial software.

Data pipeline

Getting Started

Skip the blah blah and run your data quality indicators by following the Getting Started page. The complete documentation is also available on GitHub Pages: https://ubisoft.github.io/mobydq.

Screenshots

Some screenshots of the web application to give you a taste of what it's like.

Demo

Run Dev

Run MobyDQ in development mode with the following commands:

$ cd mobydq
$ docker-compose -f docker-compose.yml -f docker-compose.dev.yml up db graphql app nginx

Run Prod

Run MobyDQ in production mode with the following commands. The -d flag runs the containers in the background (detached mode).

$ cd mobydq
$ docker-compose up -d db graphql app nginx

Run Tests

You can run tests using the following commands:

$ cd mobydq

# Start test database instances
$ docker-compose -f docker-compose.yml -f docker-compose.test.yml up -d db graphql
$ docker-compose -f docker-compose.yml -f docker-compose.test.yml up -d db-cloudera db-mysql db-mariadb db-postgresql db-sql-server

# Run tests
$ docker-compose -f docker-compose.yml -f docker-compose.test.yml up test-db test-scripts

# Run linter
$ docker-compose -f docker-compose.yml -f docker-compose.test.yml build test-scripts test-lint-python
$ docker run --rm mobydq-test-lint-python pylint scripts test

Dependencies

Docker Images

Python Packages

JavaScript Packages

  • To be documented