
PEDSnet / Data-Quality-Analysis

Licence: BSD-2-Clause license
The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)

Programming Languages: R, Go, Python, Makefile

Projects that are alternatives to or similar to Data-Quality-Analysis

re-data
re_data - fix data issues before your users & CEO would discover them 😊
Stars: ✭ 955 (+4926.32%)
Mutual labels:  data-quality-checks, data-quality
NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an Xml syntax. By means of NBi, you don't need to develop C# or Java code to specify your tests! Nor do you need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (+436.84%)
Mutual labels:  data-quality-checks, data-quality
hooqu
hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python
Stars: ✭ 17 (-10.53%)
Mutual labels:  data-quality-checks, data-quality
Django-Data-quality-system
Data governance and data quality checking/monitoring platform (Django+jQuery+MySQL)
Stars: ✭ 143 (+652.63%)
Mutual labels:  data-quality-checks, data-quality
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+2105.26%)
Mutual labels:  data-quality-checks, data-quality
OMOP2OBO
OMOP2OBO: A Python Library for mapping OMOP standardized clinical terminologies to Open Biomedical Ontologies
Stars: ✭ 55 (+189.47%)
Mutual labels:  omop
penguin-datalayer-collect
A data layer quality monitoring and validation module, this solution is part of the Raft Suite ecosystem.
Stars: ✭ 19 (+0%)
Mutual labels:  data-quality
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+93710.53%)
Mutual labels:  data-quality
Great expectations
Always know what to expect from your data.
Stars: ✭ 5,808 (+30468.42%)
Mutual labels:  data-quality
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (+52.63%)
Mutual labels:  data-quality
leila
Library for data quality assessment and for interacting with the datos.gov.co portal
Stars: ✭ 56 (+194.74%)
Mutual labels:  data-quality
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+205.26%)
Mutual labels:  data-quality
great expectations action
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Stars: ✭ 66 (+247.37%)
Mutual labels:  data-quality
ohsome-quality-analyst
Data quality estimations for OpenStreetMap
Stars: ✭ 28 (+47.37%)
Mutual labels:  data-quality
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+657.89%)
Mutual labels:  data-quality
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+43736.84%)
Mutual labels:  data-quality
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-10.53%)
Mutual labels:  data-quality
hive compared bq
hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.
Stars: ✭ 27 (+42.11%)
Mutual labels:  data-quality
osm-data-classification
Migrated to: https://gitlab.com/Oslandia/osm-data-classification
Stars: ✭ 23 (+21.05%)
Mutual labels:  data-quality
dqlab-career-track
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (+178.95%)
Mutual labels:  data-quality

Data Quality Assessment in PEDSnet

Additional Resources

Introduction to the Tool

A summary of how to execute the tool for an initial run can be found here: (Initial Run)

Running for a PEDSnet Site

Instructions for how to execute the toolkit for a PEDSnet data submission can be found here: (PEDSnet Site)

Uploading to the Database

At the end of each cycle, issues are uploaded to the database in their raw form. The script that does this is included in the package and uses the argos package in the standard way: (Upload Issues)

To upload issues, set the directory variable to the location where the issue .csv files were written, set the data version variable, and specify the site. Sourcing the script then uploads the issues.
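The preparation described above (pointing at a directory of issue .csv files and attaching a data version and a site) can be sketched in base R. This is not the toolkit's upload script and does not use argos; the function name `collect_issue_files` and the `data_version` and `site` column names are hypothetical, and every .csv file is assumed to share the same columns.

```r
# Hypothetical helper (not part of the toolkit): gather every issue .csv in a
# directory into one data frame and tag each row with the data version and
# site. Assumes all files have identical column names.
collect_issue_files <- function(issue_dir, data_version, site) {
  files <- list.files(issue_dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    stop("no .csv files found in ", issue_dir)
  }
  # Read each file as character columns and stack them row-wise
  issues <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
  # Tag every row so uploads from different cycles and sites stay distinguishable
  issues$data_version <- data_version
  issues$site <- site
  issues
}
```

The combined data frame could then be written through an open DBI connection `con` with `DBI::dbWriteTable(con, "issues", issues, append = TRUE)`; the table name `issues` is likewise an assumption, and the toolkit's own script should be preferred for real submissions.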

Objective

This toolkit is designed for conducting data quality assessments on clinical datasets modeled with the OMOP Common Data Model (CDM). It includes a wide variety of data quality checks and a GitHub-based issue reporting mechanism, and it is routinely used by the PEDSnet CDRN.

Contents

  • Data: the data quality catalog of checks, summaries of the previous data cycle, and acceptable value sets for various fields.
  • Doc: documentation and setup instructions for the program
  • Infrastructure: constants and internal helper functions
  • Library: contains data quality checks and utility functions
  • Main: single and multi-variable data quality scripts
  • Resources: configuration file
  • Tools: scripts for GitHub-based feedback generation

Required Downloads

R

R version 3.2.x or above, 64-bit (Comprehensive R Archive Network)

R Packages

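```r
# CRAN packages used by the toolkit
install.packages(c("DBI", "yaml", "ggplot2", "RJDBC", "devtools", "futile.logger", "plyr",
                   "dplyr", "dbplyr", "lubridate", "tictoc", "testthat", "data.table"))
# PostgreSQL driver (not required if PostgreSQL is not the target database)
install.packages("RPostgres")
# argos, installed from GitHub via devtools
library(devtools)
install_github("baileych/ohdsi-argos")
```
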
  • Minimum Versions Required:
    • R: 3.2
    • DBI: 0.7
    • dplyr: 0.7
    • dbplyr: 1.2
    • readr: 1.1
    • rlang: 0.1.4
    • stringr: 1.2
  • The RPostgres package is not required if PostgreSQL is not the target database type
  • For Oracle users, the ROracle package should be installed

Note: if these packages were installed previously, run update.packages() to get the latest version of each.
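The minimum versions listed above can be checked from an R session before a run. The following is a minimal sketch using base R's `requireNamespace()`, `utils::packageVersion()`, and `getRversion()`; the helper name `outdated_packages` is hypothetical and not part of the toolkit.

```r
# Minimum versions from the requirements list above
min_versions <- c(DBI = "0.7", dplyr = "0.7", dbplyr = "1.2", readr = "1.1",
                  rlang = "0.1.4", stringr = "1.2")

# TRUE when the running R meets the 3.2 minimum
r_ok <- getRversion() >= "3.2"

# Hypothetical helper: return the names of packages that are either not
# installed or older than the required version.
outdated_packages <- function(required) {
  ok <- vapply(names(required), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      return(FALSE)  # not installed
    }
    # package_version objects compare correctly against version strings
    utils::packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
  names(required)[!ok]
}
```

Calling `outdated_packages(min_versions)` returns an empty character vector when every requirement is met; any names it returns are candidates for `install.packages()` or `update.packages()`.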
