kiwicom / contessa

Licence: MIT license
Easy way to define, execute and store quality rules for your data.


Contessa

Hello, welcome to Contessa!

Contessa is a data quality library that gives you an easy way to define, execute, and store quality rules for your data.

Instead of writing a lot of SQL queries that look almost exactly the same, we're aiming for a more pragmatic approach: defining rules programmatically. This gives much more flexibility to the user and also to us as the creators of the library.

We add new rules incrementally, and they should reflect the data quality domain. For now these are simple rules such as NOT_NULL and GT (greater than). We want to build on these simple rules and provide more complex data quality checkers out of the box.

Goals:

  • be database-agnostic (to a reasonable degree), so you define checks against any database (e.g. MySQL vs. PostgreSQL) in the same way
  • automate the delivery of data quality results, e.g. from a Postgres table to a Datadog dashboard
  • take a programmatic approach to data-quality definition, which leads to:
    • dynamic composition of rules in a simple script using the db or any 3rd-party tool, e.g. take all tables and create a NOT_NULL rule for every integer column in each of them
    • users can write special rules for their data if needed; if not, they can go with generic solutions
    • parts of the definitions that can be automated and tested when needed
  • easier maintenance when the number of checks scales too fast :)
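
As a rough illustration of the "dynamic composition of rules" goal, the sketch below builds one NOT_NULL rule per table, covering that table's integer columns. The rule dictionaries use the same keys as the Quick Example (name, type, columns). Everything else is an assumption made for illustration: the SCHEMA mapping is hypothetical hand-written metadata (in practice it might come from the database catalog), build_not_null_rules is not part of Contessa, and NOT_NULL is a local stand-in so the snippet runs without the library installed.

```python
# Stand-in for `from contessa import NOT_NULL`, so this sketch runs on its own.
# The value is a placeholder, not necessarily what the library uses.
NOT_NULL = "NOT_NULL_PLACEHOLDER"

# Hypothetical schema metadata: table name -> {column name: column type}.
SCHEMA = {
    "bookings": {"id": "integer", "bags": "integer", "status": "text"},
    "payments": {"id": "integer", "amount": "numeric"},
}


def build_not_null_rules(schema):
    """Return {table: [rule]} with one NOT_NULL rule per table.

    Each rule covers all integer columns of its table; tables without
    integer columns get no rule.
    """
    rules = {}
    for table, columns in schema.items():
        int_columns = [
            name for name, col_type in columns.items() if col_type == "integer"
        ]
        if int_columns:
            rules[table] = [
                {
                    "name": f"not_null_{table}",
                    "type": NOT_NULL,
                    "columns": int_columns,
                }
            ]
    return rules


rules_by_table = build_not_null_rules(SCHEMA)
for table, rules in rules_by_table.items():
    print(table, rules[0]["columns"])
```

Each list in rules_by_table has the shape of the RULES list in the Quick Example, so it could presumably be passed as raw_rules in one run call per table.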

Full docs here

Quick Example

from contessa import ContessaRunner, NOT_NULL, GT, SQL
no_bags_sql = """
    SELECT CASE WHEN is_no_bags_booking = 'T' AND bags > 0 THEN false ELSE true END
    FROM {{table_fullname}};
"""
contessa = ContessaRunner("postgresql://:@localhost:5432")

RULES = [
    {
        "name": "Status and market null check",
        "type": NOT_NULL,
        "columns": ["status", "market", "src", "dst"],
    },
    {
        "type": GT,
        "name": "gt_0_prices",
        "value": 0,
        "columns": ["initial_price", "turnover_before_refunds"],
    },
    {
        "type": SQL,
        "name": "no_bags_sql",
        "sql": no_bags_sql,
        "description": "No bags booking should have bags = 0",
    },
]
contessa.run(
    raw_rules=RULES,
    check_table={"schema_name": "public", "table_name": "bookings"},
    result_table={"schema_name": "dq", "table_name": "my_table"},
)

This will create the table dq.quality_check_my_table. For its model, see the quality_check section of the full docs.
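
To get a feel for what the custom SQL rule in the example evaluates, here is a small stand-alone sketch that runs the same CASE expression against a few rows. It is only an approximation under stated assumptions: it uses an in-memory SQLite database instead of Postgres, the sample rows and the id column are made up, the {{table_fullname}} placeholder is filled in by hand, and true/false are written as 1/0 for portability across SQLite versions. It does not show how Contessa itself executes the rule or aggregates its results.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the Quick Example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER, is_no_bags_booking TEXT, bags INTEGER)"
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(1, "T", 0), (2, "T", 2), (3, "F", 1)],
)

# Same CASE expression as no_bags_sql: a row fails (0) when it is flagged
# as a no-bags booking but still has bags, and passes (1) otherwise.
check_sql = """
    SELECT id, CASE WHEN is_no_bags_booking = 'T' AND bags > 0 THEN 0 ELSE 1 END
    FROM bookings
    ORDER BY id;
"""

results = [(row_id, bool(passed)) for row_id, passed in conn.execute(check_sql)]
failed = [row_id for row_id, passed in results if not passed]

print(results)
print("failed rows:", failed)
conn.close()
```

In this sample only the booking with id 2 fails, because it is marked as a no-bags booking yet has two bags.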

How to run tests

$ make test-up  # run postgres + app
$ make test args="/app/test -s"  # args for pytest
$ make test-down  # delete containers + volumes

To run only the unit tests (no database needed):

$ pytest test/unit/test_operator.py

How to add docs

$ pip3 install -r requirements-docs.txt
$ python3 watchdogs.py

This builds the HTML files with Sphinx and starts a local web server so that you can check them out. It should also reload them automatically :)

NOTE: If that doesn't work, build the HTML manually: cd docs && make html
