Applied Ml: 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
qamd: QAMyData, a data quality assurance tool for SPSS, STATA, SAS and CSV files.
re-data: re_data - fix data issues before your users & CEO discover them 😊
TracIn: Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
hooqu: hooqu is a library built on top of Pandas-like DataFrames for defining "unit tests for data". It is a spiritual port of Apache Deequ to Python.
datatile: A library for managing, validating, summarizing, and visualizing data.
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
leila: A library for evaluating data quality and interacting with the datos.gov.co data portal.
great expectations action: A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
dqlab-career-track: A collection of scripts written to complete the DQLab Data Analyst Career Track 📊
penguin-datalayer-collect: A data layer quality monitoring and validation module; this solution is part of the Raft Suite ecosystem.
contessa: An easy way to define, execute and store quality rules for your data.
soda-spark: Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
NBi: NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests! Nor do you need Visual Studio or Eclipse to compile y…
hive compared bq: hive_compared_bq compares/validates two SQL-like tables and graphically shows the rows/columns that differ.