
psebenick / data-profiling

Licence: other
A set of scripts to pull metadata and data profiling metrics from relational database systems.

Programming Languages

python

Projects that are alternatives to or similar to data-profiling

DataX-src
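DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and more.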
Stars: ✭ 21 (-63.16%)
Mutual labels:  hive, oracle, sqlserver
Pyetl
python ETL framework
Stars: ✭ 33 (-42.11%)
Mutual labels:  hive, oracle, sqlserver
Addax
Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+978.95%)
Mutual labels:  hive, oracle, sqlserver
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (+103.51%)
Mutual labels:  hive, oracle, sqlserver
Apijson
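🚀 A zero-code, hot-reloading, fully automatic ORM library: zero code for backend APIs and docs, while the frontend (client) customizes the data and structure of the returned JSON. 🚀 A JSON Transmission Protocol and an ORM Library for automatically providing APIs and Docs.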
Stars: ✭ 12,559 (+21933.33%)
Mutual labels:  oracle, sqlserver
Servicestack.ormlite
Fast, Simple, Typed ORM for .NET
Stars: ✭ 1,532 (+2587.72%)
Mutual labels:  oracle, sqlserver
Qxorm
QxOrm library - C++ Qt ORM (Object Relational Mapping) and ODM (Object Document Mapper) library - Official repository
Stars: ✭ 176 (+208.77%)
Mutual labels:  oracle, sqlserver
Liquibase
Main Liquibase Source
Stars: ✭ 2,910 (+5005.26%)
Mutual labels:  oracle, sqlserver
Ezsql
PHP class to make interacting with a database ridiculously easy
Stars: ✭ 804 (+1310.53%)
Mutual labels:  oracle, sqlserver
Sharding Method
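A new approach to table and database sharding: a service-layer sharding framework, compatible with all SQL and all databases, with ACID properties identical to the native database. It supports read/write splitting at the RR (repeatable read) level and, because it does no SQL parsing, achieves higher performance.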
Stars: ✭ 188 (+229.82%)
Mutual labels:  oracle, sqlserver
Freesql
🦄 .NET orm, Mysql orm, Postgresql orm, SqlServer orm, Oracle orm, Sqlite orm, Firebird orm, 达梦 orm, 人大金仓 orm, 神通 orm, 翰高 orm, 南大通用 orm, Click house orm, MsAccess orm.
Stars: ✭ 3,077 (+5298.25%)
Mutual labels:  oracle, sqlserver
Sqlfaker
A lightweight, easily extensible open-source Java library for intelligently populating databases
Stars: ✭ 109 (+91.23%)
Mutual labels:  oracle, sqlserver
Csv2db
The CSV to database command line loader
Stars: ✭ 102 (+78.95%)
Mutual labels:  oracle, sqlserver
Kangaroo
SQL client and admin tool for popular databases
Stars: ✭ 127 (+122.81%)
Mutual labels:  oracle, sqlserver
Ebean
Ebean ORM
Stars: ✭ 1,172 (+1956.14%)
Mutual labels:  oracle, sqlserver
Koolreport
This is an Open Source PHP Reporting Framework which you can use to write perfect data reports or to construct awesome dashboards using PHP
Stars: ✭ 204 (+257.89%)
Mutual labels:  oracle, sqlserver
Zxw.framework.netcore
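A rapid-development framework for DotNetCore based on the EF Core Code First pattern. It includes DBContext, the IoC components autofac and AspectCore.Injector, a code generator (which also supports DB First), memcache and Redis caching components based on AspectCore, a payment library based on ICanPay, and various everyday methods and extensions such as bulk insert, update, and delete plus trigger support, and of course a demo. Suggestions, feedback, and PRs are all welcome!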
Stars: ✭ 691 (+1112.28%)
Mutual labels:  oracle, sqlserver
Smartsql
SmartSql = MyBatis in C# + .NET Core+ Cache(Memory | Redis) + R/W Splitting + PropertyChangedTrack +Dynamic Repository + InvokeSync + Diagnostics
Stars: ✭ 775 (+1259.65%)
Mutual labels:  oracle, sqlserver
Datafaker
Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into varied data sources. (Test data generation tool.)
Stars: ✭ 327 (+473.68%)
Mutual labels:  hive, oracle
Maha
A framework for rapid reporting API development, with out-of-the-box support for high cardinality dimension lookups with druid.
Stars: ✭ 101 (+77.19%)
Mutual labels:  hive, oracle

data-profiling


This repository is a collection of scripts and SQL code that can be tailored to collect specific information about the tables and columns within databases. It enables bulk, rapid collection of common high-level metadata and provides a good starting point for identifying and inventorying your database objects.

The scripts are intended to be run from your own client user interface and assume that you can already connect to the database you want to collect data from. It is also up to you to export the results to whatever output medium you prefer, typically by saving them as a spreadsheet or CSV file, or by copying and pasting them from your user interface. This allows the greatest degree of flexibility.
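If you run the queries from Python instead of a GUI client, exporting results to CSV takes only a few lines with any DB-API connection. The sketch below is an assumption, not part of this repository: export_query_to_csv and the customers table are hypothetical, and an in-memory SQLite database (not one of the supported platforms) stands in for your real connection.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, sql, out):
    """Hypothetical helper: run a query and write a header row plus all result rows as CSV."""
    cur = conn.execute(sql)
    # cursor.description has one tuple per result column; item 0 is the column name
    header = [col[0] for col in cur.description]
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(cur.fetchall())


# In-memory SQLite stands in for a connection to your real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])

buf = io.StringIO()
export_query_to_csv(conn, "SELECT id, name FROM customers ORDER BY id", buf)
print(buf.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline=""), which is what the csv module expects.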

This repository is not meant to provide deep data profiling capabilities; other tools such as Informatica, Talend, InfoSphere, and others do that much better.

We get most of the metadata from the database system's internal data dictionary. Some metadata can only be obtained by querying the tables themselves. Because performance is often a problem when profiling large tables, we don't try to do this in bulk for an entire database. This is where the more expensive and sophisticated tools are most useful.
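Metrics such as null and distinct counts are examples of metadata that only a query against the table itself can provide, and each such query scans the whole table, which is why it is not run for every column of a database at once. A minimal sketch, assuming an in-memory SQLite database as a stand-in; profile_column and the orders table are hypothetical and not taken from the repository's scripts.

```python
import sqlite3


def profile_column(conn, table, column):
    """Hypothetical helper: compute basic metrics for one column by scanning the table.

    table and column are interpolated into the SQL text, so they must come from a
    trusted source such as the data dictionary, never from user input.
    """
    sql = (
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}), "
        f"MIN({column}), MAX({column}) FROM {table}"
    )
    total, non_null, distinct, low, high = conn.execute(sql).fetchone()
    return {
        "rows": total,
        "nulls": total - non_null,  # COUNT(column) skips NULLs, COUNT(*) does not
        "distinct": distinct,  # COUNT(DISTINCT ...) also ignores NULLs
        "min": low,
        "max": high,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "open"), (2, "shipped"), (3, None), (4, "open")],
)
print(profile_column(conn, "orders", "status"))
```

All five metrics come from a single pass, so profiling one column costs one scan of the table rather than five.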

Data profiling is a process

Data profiling generally consists of a series of steps that dig progressively deeper into the details of the data sets. The high-level steps addressed by this repository are:

  1. Inventory of databases (schemas)
  2. Inventory and metadata of tables within the databases
  3. Inventory and metadata of columns within the tables
  4. Detailed analysis of each column (e.g. frequency distributions or lists of values)
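The four steps above can be sketched end to end in Python. This is an illustration under stated assumptions, not the repository's code: it uses SQLite (which ships with Python) in place of the supported platforms, so PRAGMA database_list, sqlite_master, and PRAGMA table_info play the role of the data dictionary, and all function and table names are hypothetical.

```python
import sqlite3


def list_schemas(conn):
    # Step 1: in SQLite each attached database plays the role of a schema.
    return [row[1] for row in conn.execute("PRAGMA database_list")]


def list_tables(conn):
    # Step 2: table inventory from the data dictionary, skipping SQLite's internal tables.
    sql = (
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql)]


def list_columns(conn, table):
    # Step 3: column inventory; each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def frequency_distribution(conn, table, column, limit=10):
    # Step 4: drill into one column by querying the table itself (a full scan).
    sql = (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} ORDER BY n DESC, {column} LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "shipped"), (3, "open")]
)

for schema in list_schemas(conn):
    print("schema:", schema)
for table in list_tables(conn):
    print("table:", table, list_columns(conn, table))
print(frequency_distribution(conn, "orders", "status"))
```

Steps 1 to 3 only read the dictionary and are cheap even for a whole database; step 4 touches the data and is best run selectively on the columns you care about.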

Database Platforms

Since each database platform has its own proprietary data dictionary, each platform has its own set of scripts. The following databases have scripts in this repository:

* Oracle
* SQLServer
* MySQL
* Sybase IQ
* Netezza
* Hive
* AWS Redshift
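Because the data dictionaries differ, one way to organize per-platform logic in Python is a lookup table keyed by platform name. This sketch is hypothetical (COLUMN_INVENTORY_SQL and column_inventory_sql are illustrative names) and covers only four of the listed platforms, using dictionary views known to exist: Oracle's ALL_TAB_COLUMNS, INFORMATION_SCHEMA.COLUMNS on SQL Server and MySQL, and Redshift's SVV_COLUMNS. These are not necessarily the queries the repository's own scripts use, and Sybase IQ, Netezza, and Hive are deliberately left out.

```python
# Hypothetical mapping from platform name to a column-inventory query.
_INFO_SCHEMA_SQL = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE "
    "FROM INFORMATION_SCHEMA.COLUMNS "
    "ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION"
)

COLUMN_INVENTORY_SQL = {
    "oracle": (
        "SELECT owner, table_name, column_name, data_type, nullable "
        "FROM all_tab_columns ORDER BY owner, table_name, column_id"
    ),
    "sqlserver": _INFO_SCHEMA_SQL,
    "mysql": _INFO_SCHEMA_SQL,
    "redshift": (
        "SELECT table_schema, table_name, column_name, data_type, is_nullable "
        "FROM svv_columns ORDER BY table_schema, table_name, ordinal_position"
    ),
}


def column_inventory_sql(platform):
    """Return the column-inventory query for a platform, ignoring case and surrounding spaces."""
    key = platform.strip().lower()
    try:
        return COLUMN_INVENTORY_SQL[key]
    except KeyError:
        raise ValueError(f"no column-inventory query for platform {platform!r}") from None


print(column_inventory_sql("Oracle"))
```

The resulting string can be executed with whatever driver you already use for that platform. On MySQL the result also includes system schemas such as mysql and performance_schema, which you may want to filter out.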