GitPlanet
Top 5 dataquality open source projects
Deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
✭ 2,020
scala
spark
unit-testing
dataquality
Great Expectations
Always know what to expect from your data.
✭ 5,808
python
Jupyter Notebook
Jinja
HTML
javascript
CSS
data-science
pipeline
data-engineering
eda
exploratory-data-analysis
data-quality
data-profiling
datacleaner
exploratory-analysis
cleandata
dataquality
datacleaning
mlops
pipeline-tests
pipeline-testing
dataunittest
data-unit-tests
exploratorydataanalysis
pipeline-debt
data-profilers
re-data
re_data - fix data issues before your users & CEO would discover them 😊
✭ 955
HTML
typescript
python
data-analysis
dbt
data-quality-checks
data-quality
dataquality
open-source-tooling
data-monitoring
data-quality-monitoring
data-testing
dbt-packages
data-observability
data-reliability
DQCS
Data quality control system
✭ 34
java
HTML
scala
CSS
javascript
shell
data
database
etl
dataquality
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
✭ 655
java
HTML
python
scala
shell
Batchfile
identity
data-science
identity-resolution
spark
etl
analytics
dedupe
entity-resolution
data-transformation
ml
fuzzy-matching
deduplication
datalake
masterdata
dataengineering
fuzzymatch
dataquality
analytics-engineering
data-transformations
modern-data-stack