OHDSI / DataQualityDashboard

Licence: Apache-2.0 license

A tool to help improve data quality standards in observational data science.


Projects that are alternatives of or similar to DataQualityDashboard

Great expectations

Always know what to expect from your data.

Stars: ✭ 5,808 (+9267.74%)

Mutual labels: data-quality

penguin-datalayer-collect

A data layer quality monitoring and validation module, this solution is part of the Raft Suite ecosystem.

Stars: ✭ 19 (-69.35%)

Mutual labels: data-quality

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-53.23%)

Mutual labels: data-quality

Applied Ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

Stars: ✭ 17,824 (+28648.39%)

Mutual labels: data-quality

soda-spark

Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes

Stars: ✭ 58 (-6.45%)

Mutual labels: data-quality

great expectations action

A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.

Stars: ✭ 66 (+6.45%)

Mutual labels: data-quality

qamd

QAMyData, a data quality assurance tool for SPSS, STATA, SAS and CSV files.

Stars: ✭ 16 (-74.19%)

Mutual labels: data-quality

hooqu

hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python

Stars: ✭ 17 (-72.58%)

Mutual labels: data-quality

contessa

Easy way to define, execute and store quality rules for your data.

Stars: ✭ 17 (-72.58%)

Mutual labels: data-quality

osm-data-classification

Migrated to: https://gitlab.com/Oslandia/osm-data-classification

Stars: ✭ 23 (-62.9%)

Mutual labels: data-quality

hive compared bq

hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.

Stars: ✭ 27 (-56.45%)

Mutual labels: data-quality

NBi

NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests! Nor do you need Visual Studio or Eclipse to compile y…

Stars: ✭ 102 (+64.52%)

Mutual labels: data-quality

leila

Library for data quality assessment and for interacting with the datos.gov.co data portal

Stars: ✭ 56 (-9.68%)

Mutual labels: data-quality

Pandas Profiling

Create HTML profiling reports from pandas DataFrame objects

Stars: ✭ 8,329 (+13333.87%)

Mutual labels: data-quality

datatile

A library for managing, validating, summarizing, and visualizing data.

Stars: ✭ 419 (+575.81%)

Mutual labels: data-quality

Django-Data-quality-system

Data governance and data quality inspection/monitoring platform (Django+jQuery+MySQL)

Stars: ✭ 143 (+130.65%)

Mutual labels: data-quality

dqlab-career-track

A collection of scripts written to complete DQLab Data Analyst Career Track 📊

Stars: ✭ 53 (-14.52%)

Mutual labels: data-quality

TracIn

Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)

Stars: ✭ 165 (+166.13%)

Mutual labels: data-quality

Data-Quality-Analysis

The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)

Stars: ✭ 19 (-69.35%)

Mutual labels: data-quality

versatile-data-kit

Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.

Stars: ✭ 144 (+132.26%)

Mutual labels: data-quality


DataQualityDashboard

The goal of the Data Quality Dashboard (DQD) project is to design and develop an open-source tool to expose and evaluate observational data quality.

Introduction

This package will run a series of data quality checks against an OMOP CDM instance (currently supports v5.3.1 and v5.2.2). It systematically runs the checks, evaluates them against pre-specified thresholds, and then communicates what was done in a transparent and easily understandable way.

Overview

The quality checks were organized according to the Kahn Framework¹, which uses a system of categories and contexts that represent strategies for assessing data quality. For an introduction to the Kahn framework, please click here.

Using this framework, the Data Quality Dashboard takes a systematic approach to running data quality checks. Instead of writing thousands of individual checks, we use “data quality check types”. These “check types” are more general, parameterized data quality checks into which OMOP tables, fields, and concepts can be substituted to represent a single data quality idea. For example, one check type might be written as:

The number and percent of records with a value in the cdmFieldName field of the cdmTableName table less than plausibleValueLow.

This would be considered an atemporal plausibility verification check because we are looking for implausibly low values in some field based on internal knowledge. We can use this check type to substitute in values for cdmFieldName, cdmTableName, and plausibleValueLow to create a unique data quality check. If we apply it to PERSON.YEAR_OF_BIRTH, here is how that might look:

The number and percent of records with a value in the year_of_birth field of the PERSON table less than 1850.

And, since it is parameterized, we can similarly apply it to DRUG_EXPOSURE.days_supply:

The number and percent of records with a value in the days_supply field of the DRUG_EXPOSURE table less than 0.
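The substitution idea can be illustrated with a minimal Python sketch. It is not the package's own R implementation: the `{placeholder}` template syntax, the `render` and `run_check` helpers, and the SQL text are all hypothetical, and the query runs against a toy in-memory SQLite PERSON table rather than a real CDM instance. It renders the plausibleValueLow check type for PERSON.year_of_birth and counts the records below the threshold.

```python
import sqlite3

# Hypothetical check-type templates: a human-readable description and a SQL
# query, both parameterized by the same names used in the check type text.
DESCRIPTION_TEMPLATE = (
    "The number and percent of records with a value in the {cdmFieldName} field "
    "of the {cdmTableName} table less than {plausibleValueLow}."
)
SQL_TEMPLATE = (
    "SELECT SUM(CASE WHEN {cdmFieldName} < {plausibleValueLow} THEN 1 ELSE 0 END), "
    "COUNT(*) FROM {cdmTableName}"
)


def render(template: str, **params) -> str:
    """Substitute table, field, and threshold values into a check-type template."""
    return template.format(**params)


def run_check(conn: sqlite3.Connection, **params) -> tuple:
    """Return (violating rows, total rows, percent violating) for one resolved check."""
    violated, total = conn.execute(render(SQL_TEMPLATE, **params)).fetchone()
    violated = violated or 0  # SUM over an empty table yields NULL
    pct = 100.0 * violated / total if total else 0.0
    return violated, total, pct


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PERSON (person_id INTEGER, year_of_birth INTEGER)")
    conn.executemany(
        "INSERT INTO PERSON VALUES (?, ?)",
        [(1, 1970), (2, 1820), (3, 1988), (4, 1999)],
    )
    params = dict(cdmTableName="PERSON", cdmFieldName="year_of_birth", plausibleValueLow=1850)
    print(render(DESCRIPTION_TEMPLATE, **params))
    print(run_check(conn, **params))  # (1, 4, 25.0)
```

Because the parameters are pasted into SQL text with `str.format`, this approach is only safe when table and field names come from trusted metadata, such as a fixed list of CDM tables and fields. Calling `run_check` with `cdmTableName="DRUG_EXPOSURE"`, `cdmFieldName="days_supply"`, and `plausibleValueLow=0` applies the same check type to the second example.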

Version 1 of the tool includes 20 different check types organized into Kahn contexts and categories. Additionally, each data quality check type is considered a table-level, field-level, or concept-level check.

Table-level checks are those that evaluate the table at a high level without reference to individual fields, or those that span multiple event tables. These include checks making sure required tables are present or that at least some of the people in the PERSON table have records in the event tables.

Field-level checks are those related to specific fields in a table. The majority of the check types in version 1 are field-level checks. These include checks evaluating primary key relationships and those investigating whether the concepts in a field conform to the specified domain.

Concept-level checks are related to individual concepts. These include checks looking for gender-specific concepts in persons of the wrong gender and plausible values for measurement-unit pairs. For a detailed description and definition of each check type, please click here.
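To show how a small number of check types can fan out into many concrete checks, the sketch below expands a few check types across a toy schema according to their level. Everything here is an assumption for illustration: the `CheckType` and `ResolvedCheck` classes, the check names, the concept labels, and the schema are hypothetical. For brevity every field-level check is applied to every field, whereas a real run would only apply a check where the field's metadata makes it meaningful.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CheckType:
    """Hypothetical check-type definition: a name plus the level it operates at."""
    name: str
    level: str  # "TABLE", "FIELD", or "CONCEPT"


@dataclass(frozen=True)
class ResolvedCheck:
    """One concrete check: a check type bound to a single table, field, or concept."""
    check_name: str
    level: str
    target: str


def resolve_checks(check_types: List[CheckType], schema: Dict[str, List[str]],
                   concepts: List[str]) -> List[ResolvedCheck]:
    """Expand each check type over every target at its level."""
    resolved: List[ResolvedCheck] = []
    for ct in check_types:
        if ct.level == "TABLE":
            targets = list(schema)  # one check per table
        elif ct.level == "FIELD":
            # one check per table.field pair
            targets = [f"{t}.{f}" for t, fields in schema.items() for f in fields]
        elif ct.level == "CONCEPT":
            targets = list(concepts)  # one check per listed concept
        else:
            raise ValueError(f"unknown level: {ct.level}")
        resolved.extend(ResolvedCheck(ct.name, ct.level, t) for t in targets)
    return resolved


if __name__ == "__main__":
    schema = {
        "PERSON": ["person_id", "year_of_birth"],
        "DRUG_EXPOSURE": ["drug_exposure_id", "days_supply"],
    }
    check_types = [
        CheckType("tableIsPresent", "TABLE"),
        CheckType("fieldIsPopulated", "FIELD"),
        CheckType("valueAboveLow", "FIELD"),
        CheckType("genderIsPlausible", "CONCEPT"),
    ]
    checks = resolve_checks(check_types, schema, ["CONCEPT_A", "CONCEPT_B"])
    print(len(checks))  # 12: 2 table + 8 field + 2 concept checks
```

Four check types over two tables, four fields, and two concepts already yield 12 checks; the same multiplication over a full CDM schema is how a modest set of check types grows into thousands of individual checks.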

After systematically applying the 20 check types to an OMOP CDM version, approximately 3,351 individual data quality checks are resolved, run against the database, and evaluated based on a pre-specified threshold. The R package then creates a JSON object that is read into an R Shiny application to view the results.
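The evaluation and reporting step can be sketched as follows, assuming each executed check returns a count of violating rows and a denominator. The field names, check IDs, threshold lookup, and JSON layout are hypothetical and do not reproduce the package's actual output format. The sketch also assumes a check fails when its percentage of violating rows is strictly greater than its threshold, and that a check with no configured threshold defaults to 0.

```python
import json
from typing import Dict, List


def evaluate(results: List[dict], thresholds: Dict[str, float]) -> List[dict]:
    """Attach percent violated and a pass/fail flag to each raw check result.

    Assumed rule for this sketch: fail when pct_violated_rows > threshold;
    checks without a configured threshold default to 0 (any violation fails).
    """
    evaluated = []
    for r in results:
        total = r["num_denominator_rows"]
        pct = 100.0 * r["num_violated_rows"] / total if total else 0.0
        threshold = thresholds.get(r["check_id"], 0.0)
        evaluated.append({**r, "pct_violated_rows": pct,
                          "threshold": threshold, "failed": pct > threshold})
    return evaluated


def to_json(evaluated: List[dict]) -> str:
    """Serialize evaluated results plus a pass/fail summary for a results viewer."""
    failed = sum(r["failed"] for r in evaluated)
    payload = {
        "summary": {"checks": len(evaluated), "failed": failed,
                    "passed": len(evaluated) - failed},
        "results": evaluated,
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    raw = [
        {"check_id": "PERSON.year_of_birth.valueAboveLow",
         "num_violated_rows": 1, "num_denominator_rows": 4},
        {"check_id": "DRUG_EXPOSURE.days_supply.valueAboveLow",
         "num_violated_rows": 0, "num_denominator_rows": 10},
    ]
    thresholds = {"PERSON.year_of_birth.valueAboveLow": 1.0}  # percent
    print(to_json(evaluate(raw, thresholds)))
```

Keeping thresholds in a separate mapping, rather than hard-coding them into each check, is what makes them configurable: a site can tighten or relax a check without touching its logic.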

Features

Utilizes configurable data check thresholds
Analyzes data in the OMOP Common Data Model format for all data checks
Produces a set of data check results with supplemental investigation assets

Technology

DataQualityDashboard is an R package.

System Requirements

Requires R (version 3.2.2 or higher). Requires DatabaseConnector and SqlRender.

Support

Developer questions/comments/feedback: OHDSI Forum
We use the GitHub issue tracker for all bugs/issues/enhancements

License

DataQualityDashboard is licensed under Apache License 2.0

Development status

V1.0 ready for use.

Acknowledgements

This project is supported in part through the National Science Foundation grant IIS 1251151.

¹ Kahn, M.G., et al. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC), 2016. 4(1): p. 1244.
