All Projects → OHDSI → DataQualityDashboard

OHDSI / DataQualityDashboard

Licence: Apache-2.0 license
A tool to help improve data quality standards in observational data science.

Programming Languages

javascript
184084 projects - #8 most used programming language
r
7636 projects

Projects that are alternatives of or similar to DataQualityDashboard

Great expectations
Always know what to expect from your data.
Stars: ✭ 5,808 (+9267.74%)
Mutual labels:  data-quality
penguin-datalayer-collect
A data layer quality monitoring and validation module, this solution is part of the Raft Suite ecosystem.
Stars: ✭ 19 (-69.35%)
Mutual labels:  data-quality
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-53.23%)
Mutual labels:  data-quality
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+28648.39%)
Mutual labels:  data-quality
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-6.45%)
Mutual labels:  data-quality
great expectations action
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Stars: ✭ 66 (+6.45%)
Mutual labels:  data-quality
qamd
QAMyData, a data quality assurance tool for SPSS, STATA, SAS and CSV files.
Stars: ✭ 16 (-74.19%)
Mutual labels:  data-quality
hooqu
hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python
Stars: ✭ 17 (-72.58%)
Mutual labels:  data-quality
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-72.58%)
Mutual labels:  data-quality
osm-data-classification
Migrated to: https://gitlab.com/Oslandia/osm-data-classification
Stars: ✭ 23 (-62.9%)
Mutual labels:  data-quality
hive compared bq
hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.
Stars: ✭ 27 (-56.45%)
Mutual labels:  data-quality
NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an Xml syntax. By the means of NBi, you don't need to develop C# or Java code to specify your tests! Either, you don't need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (+64.52%)
Mutual labels:  data-quality
leila
Librería para la evaluación de calidad de datos, e interacción con el portal de datos.gov.co
Stars: ✭ 56 (-9.68%)
Mutual labels:  data-quality
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+13333.87%)
Mutual labels:  data-quality
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+575.81%)
Mutual labels:  data-quality
Django-Data-quality-system
数据治理、数据质量检核/监控平台(Django+jQuery+MySQL)
Stars: ✭ 143 (+130.65%)
Mutual labels:  data-quality
dqlab-career-track
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (-14.52%)
Mutual labels:  data-quality
TracIn
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Stars: ✭ 165 (+166.13%)
Mutual labels:  data-quality
Data-Quality-Analysis
The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)
Stars: ✭ 19 (-69.35%)
Mutual labels:  data-quality
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+132.26%)
Mutual labels:  data-quality

DataQualityDashboard

Build Status codecov.io

The goal of the Data Quality Dashboard (DQD) project is to design and develop an open-source tool to expose and evaluate observational data quality.

Introduction

This package will run a series of data quality checks against an OMOP CDM instance (currently supports v5.3.1 and v5.2.2). It systematically runs the checks, evaluates the checks against some pre-specified threshold, and then communicates what was done in a transparent and easily understandable way.

Overview

The quality checks were organized according to the Kahn Framework1 which uses a system of categories and contexts that represent strategies for assessing data quality. For an introduction to the kahn framework please click here.

Using this framework, the Data Quality Dashboard takes a systematic-based approach to running data quality checks. Instead of writing thousands of individual checks, we use “data quality check types”. These “check types” are more general, parameterized data quality checks into which OMOP tables, fields, and concepts can be substituted to represent a singular data quality idea. For example, one check type might be written as

The number and percent of records with a value in the cdmFieldName field of the cdmTableName table less than plausibleValueLow.

This would be considered an atemporal plausibility verification check because we are looking for implausibly low values in some field based on internal knowledge. We can use this check type to substitute in values for cdmFieldName, cdmTableName, and plausibleValueLow to create a unique data quality check. If we apply it to PERSON.YEAR_OF_BIRTH here is how that might look:

The number and percent of records with a value in the year_of_birth field of the PERSON table less than 1850.

And, since it is parameterized, we can similarly apply it to DRUG_EXPOSURE.days_supply:

The number and percent of records with a value in the days_supply field of the DRUG_EXPOSURE table less than 0.

Version 1 of the tool includes 20 different check types organized into Kahn contexts and categories. Additionally, each data quality check type is considered either a table check, field check, or concept-level check. Table-level checks are those evaluating the table at a high-level without reference to individual fields, or those that span multiple event tables. These include checks making sure required tables are present or that at least some of the people in the PERSON table have records in the event tables. Field-level checks are those related to specific fields in a table. The majority of the check types in version 1 are field-level checks. These include checks evaluating primary key relationship and those investigating if the concepts in a field conform to the specified domain. Concept-level checks are related to individual concepts. These include checks looking for gender-specific concepts in persons of the wrong gender and plausible values for measurement-unit pairs. For a detailed description and definition of each check type please click here.

After systematically applying the 20 check types to an OMOP CDM version approximately 3,351 individual data quality checks are resolved, run against the database, and evaluated based on a pre-specified threshold. The R package then creates a json object that is read into an RShiny application to view the results.

Features

  • Utilizes configurable data check thresholds
  • Analyzes data in the OMOP Common Data Model format for all data checks
  • Produces a set of data check results with supplemental investigation assets.

Technology

DataQualityDashboard is an R package

System Requirements

Requires R (version 3.2.2 or higher). Requires DatabaseConnector and SqlRender.

Support

License

DataQualityDashboard is licensed under Apache License 2.0

Development status

V1.0 ready for use.

Acknowledgements

This project is supported in part through the National Science Foundation grant IIS 1251151.

1 Kahn, M.G., et al., A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC), 2016. 4(1): p. 1244.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].