All Projects → janmotl → linkifier

janmotl / linkifier

Licence: BSD-2-Clause license
Database reverse engineering

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to linkifier

Tadpolefordbtools
Stars: ✭ 523 (+1534.38%)
Mutual labels:  oracle, mssql
Directus
Open-Source Data Platform 🐰 — Directus wraps any SQL database with a real-time GraphQL+REST API and an intuitive app for non-technical users.
Stars: ✭ 13,190 (+41118.75%)
Mutual labels:  oracle, mssql
Sqlinjectionwiki
A wiki focusing on aggregating and documenting various SQL injection methods
Stars: ✭ 623 (+1846.88%)
Mutual labels:  oracle, mssql
sqb
Extensible, multi-dialect SQL query builder and Database connection framework for NodeJS
Stars: ✭ 14 (-56.25%)
Mutual labels:  oracle, mssql
Schemazen
Script and create SQL Server objects quickly
Stars: ✭ 273 (+753.13%)
Mutual labels:  schema, mssql
Sqlinjectionwiki
一个专注于聚合和记录各种SQL注入方法的wiki
Stars: ✭ 402 (+1156.25%)
Mutual labels:  oracle, mssql
Experdb Db2pg
eXperDB-DB2PG is a data migration solution that transfers data extracted from various DBMSs to eXperDB or PostgreSQL. Currently, Oracle and Oracle Spatial, MySQL, SQL Server(MS-SQL) and Sybase data can be transferred.
Stars: ✭ 24 (-25%)
Mutual labels:  oracle, mssql
eReports-open-source
Sistema de envio e agendamento de relatórios
Stars: ✭ 30 (-6.25%)
Mutual labels:  oracle, mssql
transferdb
TransferDB 支持异构数据库 schema 转换、全量数据导出导入以及增量数据同步功能( Oracle 数据库 -> MySQL/TiDB 数据库)
Stars: ✭ 30 (-6.25%)
Mutual labels:  schema, oracle
Linq2db
Linq to database provider.
Stars: ✭ 2,211 (+6809.38%)
Mutual labels:  oracle, mssql
json-sql-builder2
Level Up Your SQL-Queries
Stars: ✭ 59 (+84.38%)
Mutual labels:  oracle, mssql
exodus
Migration tools for Tabular Data to Oracle JSON/Tabular Data
Stars: ✭ 19 (-40.62%)
Mutual labels:  schema, oracle
database connections
⚙️Demonstration code to connect R on MacOS to various database flavors.
Stars: ✭ 18 (-43.75%)
Mutual labels:  oracle, mssql
sqle
SQLE is a SQL audit platform | SQLE 是一个支持多场景,原生支持 MySQL 审核且数据库类型可扩展的 SQL 审核工具
Stars: ✭ 731 (+2184.38%)
Mutual labels:  oracle, mssql
dbclient
데이터배이스 관리 / 자동 메일링 / Admin 자동화 / Database IDE Tool. SQL Development Helper. Support DBMS Oracle/Mysql/MS-SQL
Stars: ✭ 35 (+9.38%)
Mutual labels:  oracle, mssql
K8cscan
K8Cscan大型内网渗透自定义插件化扫描神器,包含信息收集、网络资产、漏洞扫描、密码爆破、漏洞利用,程序采用多线程批量扫描大型内网多个IP段C段主机,目前插件包含: C段旁注扫描、子域名扫描、Ftp密码爆破、Mysql密码爆破、Oracle密码爆破、MSSQL密码爆破、Windows/Linux系统密码爆破、存活主机扫描、端口扫描、Web信息探测、操作系统版本探测、Cisco思科设备扫描等,支持调用任意外部程序或脚本,支持Cobalt Strike联动
Stars: ✭ 693 (+2065.63%)
Mutual labels:  oracle, mssql
fapro
Fake Protocol Server
Stars: ✭ 1,338 (+4081.25%)
Mutual labels:  oracle, mssql
Rom Sql
SQL support for rom-rb
Stars: ✭ 169 (+428.13%)
Mutual labels:  oracle, mssql
E Commerce Db
Database schema for e-commerce (webstores) sites.
Stars: ✭ 245 (+665.63%)
Mutual labels:  schema, oracle
laravel-adminer
Adminer database management tool for your Laravel application.
Stars: ✭ 45 (+40.63%)
Mutual labels:  oracle, mssql

Estimate Primary & Foreign Key Constraints in a Database

Do you need to import CSV files into a database but no one gave you the entity–relationship model to set up the primary and foreign keys?

Say no more.

screenshot

Usage

  1. Download the latest release
  2. Execute the jar file (requieres Oracle JRE 8 because of JavaFX)
  3. Connect to a database (Microsoft SQL Server, MySQL, Oracle or PostgreSQL)
  4. Estimate the primary and foreign key constraints
  5. Export the estimates (multiple formats are supported including SQL alter queries and graphml for ER-diagram visualization)

Algorithm

The algorithm works on the metadata about tables and columns, which are accessible over JDBC, and column and table statistics, which the database uses for the execution plan optimization. That means the algorithm is blazing fast as it does not look at the actual data.

Primary Key

Primary keys are identified by:

  1. Column position in the table (PKs are commonly at the beginning)
  2. Data type (e.g. integers are preferred over doubles)
  3. Presence of a keyword like "id" in the column name
  4. Similarity of the column name with the table name
  5. Ratio of nulls in the column from the column statistics
  6. Average column width in bytes from the column statistics (PKs are commonly short)

Once these features are collected, they are passed to logistic regression to estimate the probability that the column is in a primary key (compound PKs are supported). Since each table can have at most a single PK, the column with the highest probability in the table is declared to be the primary key of the table. If the estimated count of unique values in the estimated PK is smaller than the count of the rows in the table, it is a sign of a compound PK. In that case, columns are incrementally added into the PK by their descending probability of being in the PK, until the estimated PK unique row count ≥ estimated table row count. The estimated PK unique row count is optimistic and is defined as a product of the individual unique row counts in the PK. For example, if we have a table with 100 rows and the best candidate for a PK has only 11 unique rows, we will have to create a compound PK. The second best candidate for a PK has 13 unique values. Since 11*13 ≥ 100, we stop here and return a compound PK with these 2 columns.

Foreign keys

Foreign keys are identified by:

  1. Known PKs (FK should reference a PK)
  2. Data types (FK should reference a PK of a compatible data type)
  3. Data type properties (e.g. data type sizes should agree)
  4. Similarity of the FK name with the PK name (should be high)
  5. Similarity of the FK name with the PK table name (should be high)
  6. Similarity of the FK name with the FK table name (should be low)
  7. Average column width of the FK should be similar to the average PK width
  8. Minimum and maximum values of the FK is in the range of the PK
  9. Minimum and maximum values of the FK cover a significant range of the PK values

Once again, probabilities are estimated with logistic regression. And the most likely FK constraints are returned (a compound FK is returned if the PK is compound).

How to improve the quality of the estimates

  1. Calculate (sample) statistics on all tables in the schema. Example queries for calculating the statistics for a table "molecule" in "mutagenesis" schema:
    • Microsoft SQL Server: CREATE STATISTICS mutagenesis.molecule
    • MySQL: ANALYZE TABLE mutagenesis.molecule
    • Oracle: EXEC DBMS_STATS.gather_table_stats('mutagenesis', 'molecule')
    • PostgreSQL: ANALYZE mutagenesis.molecule
  2. Set primary keys in the schema. Once Linkifier knows for sure what are the PKs, the accuracy of the foreign key constraint estimates increases.
  3. Set known foreign key constraints in the schema. Linkifier will then attempt to find only remaining missing foreign key constraints.
  4. Limit the search space with "Table blacklist regex" option to a subset of tables that are actually interesting for you.
  5. Inspect "Decision justification for FK" export, which contains the estimated probabilites for the top 2000 foreign key candidates (the 3rd column from the right).

Limitations

If the schema quality is extremely low (e.g. all columns are typed as text and have irrelevant names), the PK and particularly FK estimates are going to be off. First, set correct data types and names to the columns. Then rerun Linkifier.

Known issues

If you are using MySQL and get Access to data dictionary table 'mysql.table_stats' is rejected then it is because MySQL, contrary to MariaDB, prevents access to internal tables. To be able to run Linkifier, start the db in the debug mode.

If you have problems to connect to MSSQL, make sure login with username-password combination is permitted and TCP/IP protocol is permitted. If you want to use Windows Authentication instead of username-password combination, a correct version of sqljdbc_auth.dll (32/64bit) must be in the Java path. Possibly the easisest solution is to copy sqljdbc_auth.dll from source directory lib\mssql\auth\ into bin directory in Java (e.g. C:\Program Files\Java\jre8\bin\). Alternatively, start Linkifier from the command line with arguments like java -Djava.library.path=C:\path\to\sqljdbc_auth_directory -jar linkifier.jar.

If you have any question or suggestion, let me know.

How to cite

Foreign Key Constraint Identification in Relational Databases, Jan Motl and Pavel Kordík, 2017, ITAT 2017 Proceedings.

Acknowledgement

I would like to thank Aleš Fišer, Oliver Kerul’-Kmec, Jan Kukačka, Jiří Kukačka, Manuel Muñoz, George Papaloukopoulos and Batal Thibaut for their help. The code is using Simmetrics for text similarity calculations.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].