datacleaner / Datacleaner

Licence: lgpl-3.0
The premier open source Data Quality solution

Programming Languages

java

Projects that are alternatives of or similar to Datacleaner

Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (+195.91%)
Mutual labels:  data-science, data-analysis, etl, database
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+1158.06%)
Mutual labels:  data-science, data-analysis, etl, data
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-82.61%)
Mutual labels:  data-science, etl, database
Gopup
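Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "Qianlima" and unicorn companies; Xinwen Lianbo transcripts; film and TV box-office data; university lists; epidemic data…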
Stars: ✭ 1,229 (+214.32%)
Mutual labels:  data-science, data-analysis, data
Flyte
Accelerate your ML and Data workflows to production. Flyte is a production grade orchestration system for your Data and ML workloads. It has been battle tested at Lyft, Spotify, freenome and others and truly open-source.
Stars: ✭ 1,242 (+217.65%)
Mutual labels:  data-science, data-analysis, data
Datacomparer
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-85.17%)
Mutual labels:  data-science, data-analysis, data
Graphia
A visualisation tool for the creation and analysis of graphs
Stars: ✭ 67 (-82.86%)
Mutual labels:  data-science, data-analysis, data
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-79.8%)
Mutual labels:  data-science, data-analysis, etl
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (-56.27%)
Mutual labels:  data-science, data-analysis, data
Awesome Bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+2579.8%)
Mutual labels:  data-science, database, data
Openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+2081.84%)
Mutual labels:  data-science, data-analysis, data
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-30.18%)
Mutual labels:  data-science, data-analysis, data
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+175.19%)
Mutual labels:  data-science, data-analysis, data
Akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
Stars: ✭ 4,334 (+1008.44%)
Mutual labels:  data-science, data-analysis, data
Skdata
Python tools for data analysis
Stars: ✭ 16 (-95.91%)
Mutual labels:  data-science, data-analysis, data
Knowledge Repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+1167.52%)
Mutual labels:  data-science, data-analysis, data
Reddit Detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Stars: ✭ 129 (-67.01%)
Mutual labels:  etl, database, data
Metabase
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
Stars: ✭ 26,803 (+6754.99%)
Mutual labels:  data-analysis, database, data
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-72.63%)
Mutual labels:  data-science, data-analysis, database
Elastic
R client for the Elasticsearch HTTP API
Stars: ✭ 227 (-41.94%)
Mutual labels:  data-science, etl, database

DataCleaner

The premier Open Source Data Quality solution.

DataCleaner is a Data Quality toolkit that allows you to profile, correct and enrich your data. People use it for ad-hoc analysis, for recurring cleansing, and as a Swiss Army knife in matching and Master Data Management solutions.

Where to go for end-user information?

Please visit the DataCleaner community website https://datacleaner.github.io for downloads, news, documentation and more.

Visit our Gitter chat channel https://gitter.im/datacleaner/community to ask questions or join discussions.

GitHub markdown pages and issues are intended for developers and technical topics only.

Module structure

The main application modules are:

  • api - The public API of DataCleaner. Mostly interfaces and annotations that you should use to build your own extensions.
  • resources - Static resources in DataCleaner.
  • oss-branding - Icons and colors
  • testware - Useful classes for unit testing of DataCleaner and extension code.
  • engine
    • core - The core engine piece which allows execution of jobs and components as per the API.
    • xml-config - Contains utilities for reading and writing job files and configuration files of DataCleaner.
    • env - Different/alternative environments that DataCleaner can run in, for instance Apache Spark or webapp-cluster
  • components
    • ... - many submodules containing built-in as well as additional components/extensions to use with DataCleaner.
    • standard-components - a container project that depends on all components that are normally bundled in DataCleaner community edition.
  • desktop
    • api - The public API for the DataCleaner desktop application.
    • ui - The Swing-based user interface for desktop users
  • monitor
    • api - the API classes and interfaces of DataCleaner monitor
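
To illustrate the general idea behind the api module ("interfaces and annotations that you should use to build your own extensions"), here is a minimal, self-contained sketch of an annotation-plus-interface extension pattern. Every name in it (ComponentName, ValueTransformer, TrimUpperCaseTransformer, describe) is hypothetical and made up for this illustration; it is not DataCleaner's actual API, so consult the api module for the real interfaces and annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Locale;

// Illustrative sketch only: none of these types come from DataCleaner.
public class ExtensionSketch {

    // Hypothetical annotation carrying a display name for a component.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ComponentName {
        String value();
    }

    // Hypothetical extension point: a component that transforms one value.
    interface ValueTransformer {
        String transform(String input);
    }

    // An "extension" implements the interface and describes itself
    // through the annotation.
    @ComponentName("Trim and upper-case")
    static class TrimUpperCaseTransformer implements ValueTransformer {
        @Override
        public String transform(String input) {
            return input == null ? null : input.trim().toUpperCase(Locale.ROOT);
        }
    }

    // A host application can discover the display name via reflection,
    // falling back to the class name when the annotation is absent.
    static String describe(Class<? extends ValueTransformer> type) {
        ComponentName name = type.getAnnotation(ComponentName.class);
        return name == null ? type.getSimpleName() : name.value();
    }

    public static void main(String[] args) {
        ValueTransformer transformer = new TrimUpperCaseTransformer();
        System.out.println(describe(TrimUpperCaseTransformer.class)
                + ": " + transformer.transform("  data quality "));
    }
}
```

The point of the pattern is that the host only depends on the interface and the annotation, so extensions can be added without changing the engine; how DataCleaner itself discovers and wires components is not shown here.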

Code style and formatting

In the root of the project you can find 'Formatter-[IDE].xml' files which enable you to import the code formatting rules of the project into your IDE.

Continuous Integration

There's a public build of DataCleaner that can be found on Travis CI:

https://travis-ci.org/datacleaner/DataCleaner

License

Licensed under the GNU Lesser General Public License (LGPL); see http://www.gnu.org/licenses/lgpl.txt
