re-data
re_data - fix data issues before your users and CEO discover them 😊
Stars: ✭ 955 (+567.83%)
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+193.01%)
hooqu
hooqu is a library built on top of Pandas-like DataFrames for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python.
Stars: ✭ 17 (-88.11%)
penguin-datalayer-collect
A data layer quality monitoring and validation module. This solution is part of the Raft Suite ecosystem.
Stars: ✭ 19 (-86.71%)
NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests! You also don't need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (-28.67%)
qamd
QAMyData, a data quality assurance tool for SPSS, STATA, SAS and CSV files.
Stars: ✭ 16 (-88.81%)
DataQualityDashboard
A tool to help improve data quality standards in observational data science.
Stars: ✭ 62 (-56.64%)
TracIn
Implementation of "Estimating Training Data Influence by Tracing Gradient Descent" (NeurIPS 2020)
Stars: ✭ 165 (+15.38%)
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-79.72%)
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+0.7%)
pyrad
Python Radar Data Processing
Stars: ✭ 42 (-70.63%)
leila
Library for data quality assessment and for interacting with the datos.gov.co data portal
Stars: ✭ 56 (-60.84%)
great expectations action
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Stars: ✭ 66 (-53.85%)
dqlab-career-track
A collection of scripts written to complete the DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (-62.94%)
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-88.11%)
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
Stars: ✭ 58 (-59.44%)
hive compared bq
hive_compared_bq compares and validates two (SQL-like) tables and graphically shows the rows and columns that differ.
Stars: ✭ 27 (-81.12%)
Applied ML
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+12364.34%)
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+5724.48%)