All Projects → datanymizer → Datanymizer

datanymizer / Datanymizer

Licence: mit
Powerful database anonymizer with flexible rules. Written in Rust.

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to Datanymizer

Pgdiff
Compares the PostgreSQL schema between two databases and generates SQL statements that can be run manually against the second database to make their schemas match.
Stars: ✭ 333 (+126.53%)
Mutual labels:  database, postgresql-database
Binexport
Export disassemblies into Protocol Buffers
Stars: ✭ 586 (+298.64%)
Mutual labels:  database, postgresql-database
Niklick
Rails Versioned API solution template for hipsters! (Ruby, Ruby on Rails, REST API, GraphQL, Docker, RSpec, Devise, Postgress DB)
Stars: ✭ 39 (-73.47%)
Mutual labels:  database, postgresql-database
Js Search
JS Search is an efficient, client-side search library for JavaScript and JSON objects
Stars: ✭ 1,920 (+1206.12%)
Mutual labels:  database
Nyan
Modding API with a typesafe hierarchical key-value database with inheritance and dynamic patching 😺
Stars: ✭ 141 (-4.08%)
Mutual labels:  database
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (-2.72%)
Mutual labels:  database
Polluter
The easiest solution to seed database with Go
Stars: ✭ 146 (-0.68%)
Mutual labels:  database
Laravel Db Profiler
Database Profiler for Laravel Web and Console Applications.
Stars: ✭ 141 (-4.08%)
Mutual labels:  database
Jnosql
Eclipse JNoSQL is a framework which has the goal to help Java developers to create Jakarta EE applications with NoSQL.
Stars: ✭ 145 (-1.36%)
Mutual labels:  database
Influxdata.net
InfluxData TICK stack .net library.
Stars: ✭ 142 (-3.4%)
Mutual labels:  database
Clickhouse Net
Yandex ClickHouse fully managed .NET client
Stars: ✭ 142 (-3.4%)
Mutual labels:  database
R2dbc Pool
Connection Pooling for Reactive Relational Database Connectivity
Stars: ✭ 141 (-4.08%)
Mutual labels:  database
Dapper.fsharp
Lightweight F# extension for StackOverflow Dapper with support for MSSQL, MySQL and PostgreSQL
Stars: ✭ 145 (-1.36%)
Mutual labels:  database
Vaadin On Kotlin
Writing full-stack statically-typed web apps on JVM at its simplest
Stars: ✭ 141 (-4.08%)
Mutual labels:  database
Realm Java
Realm is a mobile database: a replacement for SQLite & ORMs
Stars: ✭ 11,232 (+7540.82%)
Mutual labels:  database
Sqlite Jdbc
SQLite JDBC Driver
Stars: ✭ 1,961 (+1234.01%)
Mutual labels:  database
Statecraft
Manage state with finesse
Stars: ✭ 145 (-1.36%)
Mutual labels:  database
Mongodb For Python Developers
MongoDB for Python developers course handouts from Talk Python Training
Stars: ✭ 141 (-4.08%)
Mutual labels:  database
Sqlingo
💥 A lightweight DSL & ORM which helps you to write SQL in Go.
Stars: ✭ 142 (-3.4%)
Mutual labels:  database
Sequelize Ui
Browser-based GUI for previewing and generating Sequelize project files.
Stars: ✭ 142 (-3.4%)
Mutual labels:  database

[Data]nymizer

datanymizer

Powerful database anonymizer with flexible rules. Written in Rust.

Datanymizer is created & supported by Evrone. What else we develop with Rust.

More information you can find in articles in English and Russian.

How it works

Database -> Dumper (+Faker) -> Dump.sql

You can import or process you dump with supported database without 3rd-party importers.

Datanymizer generates database-native dump.

Installation

There are several ways to install pg_datanymizer. Choose a more convenient option for you.

Pre-compiled binary

# Linux / macOS / Windows (MINGW and etc). Installs it into ./bin/ by default
$ curl -sSfL https://raw.githubusercontent.com/datanymizer/datanymizer/main/cli/pg_datanymizer/install.sh | sh -s

# Or more shorter way
$ curl -sSfL https://git.io/pg_datanymizer | sh -s

# Specify installation directory and version
$ curl -sSfL https://git.io/pg_datanymizer | sh -s -- -b usr/local/bin v0.1.0

# Alpine Linux (wget)
$ wget -q -O - https://git.io/pg_datanymizer | sh -s

Homebrew / Linuxbrew

# Installs the latest stable release
$ brew install datanymizer/tap/pg_datanymizer

# Builds the latest version from the repository
$ brew install --HEAD datanymizer/tap/pg_datanymizer

Docker

$ docker run --rm -v `pwd`:/app -w /app datanymizer/pg_datanymizer

Getting started with CLI dumper

Inspect your database schema, choose fields with sensitive data and create config, based on it.

# config.yml
tables:
  - name: markets
    rules:
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 1
                max: 2
            - words:
                min: 1
                max: 2
  - name: franchisees
    rules:
      operator_mail:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
      operator_name:
        first_name: {}
      operator_phone:
        phone:
          format: +###########
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 2
                max: 3
            - words:
                min: 2
                max: 3
  - name: users
    rules:
      first_name:
        first_name: {}
      last_name:
        last_name: {}
  - name: customers
    rules:
      email:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
                uniq:  
                  required: true
                  try_count: 5
      phone:
        phone:
          format: +7##########
          uniq: true
      city:
        city: {}
      age:
        random_num:
          min: 10
          max: 99
      first_name:
        first_name: {}
      last_name:
        last_name: {}
      birth_date:
        datetime:
          from: 1990-01-01T00:00:00+00:00
          to: 2010-12-31T00:00:00+00:00

And then start to make dump from your database instance:

pg_datanymizer -f /tmp/dump.sql -c ./config.yml postgres://postgres:[email protected]/test_database

Uniqueness It creates new dump file /tmp/dump.sql with native SQL dump for Postgresql database. You can import fake data from this dump into new Postgresql database with command:

psql -Upostgres -d new_database < /tmp/dump.sql

Dumper can stream dump to STDOUT like pg_dump and you can use it in other pipelines:

pg_datanymizer -c ./config.yml postgres://postgres:[email protected]/test_database > /tmp/dump.sql

Additional options

Tables filter

You can specify which tables you choose or ignore for making dump.

For dumping only public.markets and public.users data.

# config.yml
#...
filter:
  only:
    - public.markets
    - public.users

For ignoring those tables and dump data from others.

# config.yml
#...
filter:
  except:
    - public.markets
    - public.users

You can also specify data and schema filters separately.

This is equivalent to the previous example.

# config.yml
#...
filter:
  data:
    except:
      - public.markets
      - public.users

For skipping schema and data from other tables.

# config.yml
#...
filter:
  schema:
    only:
      - public.markets
      - public.users

For skipping schema for markets table and dumping data only from users table.

# config.yml
#...
filter:
  data:
    only:
      - public.users
  schema:
    except:
      - public.markets

Global variables

You can specify global variables available from any template rule.

# config.yml
tables:
  users:
    bio:
      template:
        format: "User bio is {{var_a}}"
    age:
      template:
        format: {{_0 * global_multiplicator}}
#...
globals:
  var_a: Global variable 1
  global_multiplicator: 6

Available rules

Rule Description
email Emails with different options
ip IP addresses. Supports IPv4 and IPv6
words Lorem words with different length
first_name First name generator
last_name Last name generator
city City names generator
phone Generate random phone with different format
pipeline Use pipeline to generate more complicated values
capitalize Like filter, it capitalizes input value
template Template engine for generate random text with included rules
digit Random digit (in range 0..9)
random_number Random number with min and max options
password Password with different
length options (support max and min options)
datetime Make DateTime strings with options (from and to)
more than 70 rules in total...

Uniqueness

You can specify that result values must be unique (they are not unique by default). You can use short or full syntax.

Short:

uniq: true

Full:

uniq:
  required: true
  try_count: 5

Uniqueness is ensured by re-generating values when they are same. You can customize the number of attempts with try_count (this is an optional field, the default number of tries depends on the rule).

Currently, uniqueness is supported by: email, ip, phone, random_number.

Locales

You can specify the locale for individual rules:

first_name:
  locale: RU

The default locale is EN but you can specify a different default locale:

tables:
  # ........  
default:
  locale: RU

We also support ZH_TW (traditional chinese) and RU (translation in progress).

Referencing row values from templates

You can reference values of other row fields in templates. Use prev for original values and final - for anonymized:

tables:
  - name: some_table
    # You must specify the order of rule execution when using `final`
    rule_order:
      - greeting
      - options
    rules:
      first_name:
        first_name: {}
      greeting:
        template:
          # Keeping the first name, but anonymizing the last name   
          format: "Hello, {{ prev.first_name }} {{ final.last_name }}!"
      options:
        template:
          # Using the anonymized value again   
          format: "{greeting: \"{{ final.greeting }}\"}"

You must specify the order of rule execution when using final with rule_order. All rules not listed will be placed at the beginning (i.e. you must list only rules with final).

Supported databases

  • [x] Postgresql
  • [ ] MySQL or MariaDB (TODO)

Sponsors

Sponsored by Evrone

License

MIT

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].