TracIn: Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Stars: ✭ 165 (+166.13%)
hooqu: hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python.
Stars: ✭ 17 (-72.58%)
datatile: A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+575.81%)
check-engine: Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-53.23%)
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+132.26%)
leila: Library for evaluating data quality and interacting with the datos.gov.co data portal.
Stars: ✭ 56 (-9.68%)
great expectations action: A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Stars: ✭ 66 (+6.45%)
dqlab-career-track: A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (-14.52%)
penguin-datalayer-collect: A data layer quality monitoring and validation module. This solution is part of the Raft Suite ecosystem.
Stars: ✭ 19 (-69.35%)
contessa: Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-72.58%)
soda-spark: Soda Spark is a PySpark library that helps you test your data in Spark Dataframes
Stars: ✭ 58 (-6.45%)
NBi: NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests! Nor do you need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (+64.52%)
hive compared bq: hive_compared_bq compares/validates two SQL-like tables and graphically shows the rows/columns that differ.
Stars: ✭ 27 (-56.45%)
Applied Ml: 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+28648.39%)
Pandas Profiling: Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+13333.87%)
qamd: QAMyData, a data quality assurance tool for SPSS, STATA, SAS and CSV files.
Stars: ✭ 16 (-74.19%)
re-data: re_data - fix data issues before your users & CEO discover them 😊
Stars: ✭ 955 (+1440.32%)