datatileA library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+2105.26%)
NBiNBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an Xml syntax. By the means of NBi, you don't need to develop C# or Java code to specify your tests! Either, you don't need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (+436.84%)
re-datare_data - fix data issues before your users & CEO would discover them 😊
Stars: ✭ 955 (+4926.32%)
hooquhooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python
Stars: ✭ 17 (-10.53%)
check-engineData validation library for PySpark 3.0.0
Stars: ✭ 29 (+52.63%)
versatile-data-kitVersatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+657.89%)
leilaLibrería para la evaluación de calidad de datos, e interacción con el portal de datos.gov.co
Stars: ✭ 56 (+194.74%)
great expectations actionA GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Stars: ✭ 66 (+247.37%)
dqlab-career-trackA collection of scripts written to complete DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (+178.95%)
OMOP2OBOOMOP2OBO: A Python Library for mapping OMOP standardized clinical terminologies to Open Biomedical Ontologies
Stars: ✭ 55 (+189.47%)
penguin-datalayer-collectA data layer quality monitoring and validation module, this solution is part of the Raft Suite ecosystem.
Stars: ✭ 19 (+0%)
contessaEasy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-10.53%)
soda-sparkSoda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+205.26%)
hive compared bqhive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.
Stars: ✭ 27 (+42.11%)
mimic-omopMapping the MIMIC-III database to the OMOP schema
Stars: ✭ 100 (+426.32%)
Applied Ml📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+93710.53%)
Pandas ProfilingCreate HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+43736.84%)
qamdQAMyData, a data quality assurance tool for SPSS, STATA, SAS and CSV files.
Stars: ✭ 16 (-15.79%)
DataQualityDashboardA tool to help improve data quality standards in observational data science.
Stars: ✭ 62 (+226.32%)
TracInImplementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Stars: ✭ 165 (+768.42%)