asgeirrr / pgantomizer

License: BSD-3-Clause
Anonymize data in your PostgreSQL database with ease


pgantomizer

Anonymize data in your PostgreSQL database with ease. Anonymization is handy if you need to provide data to people who should not have access to the personal information of the users. Importing the data into third-party tools, where you cannot guarantee what will happen to it, is another common use case. This tool will come in handy when GDPR takes effect in EU countries.

Anonymization Process

The anonymization rules are written in a single YAML file. Columns that should be kept in raw form, without anonymization, must be explicitly marked in the schema. This ensures that a new column added to the DB without considering its sensitivity does not leak data. The default name of the primary key is id, but a custom one can be specified for each table in the schema. The primary key is NOT anonymized by default.

A sample YAML schema can be examined below.

customer:
    raw: [language, currency]
    pk: customer_id
customer_address:
    raw: [country, customer_id]
    custom_rules:
        address_line: aggregate_length

Sometimes a particular column needs a different anonymization function. It can be specified in the custom_rules directive (see the example above). There is a limited set of functions to choose from. So far:

  • aggregate_length - replaces content of the column with its length (can be used on any type that supports length function)
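The schema rules described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not pgantomizer's actual implementation: the function names classify_columns and aggregate_length, the label strings, and the choice to let a custom rule take precedence over a raw marking are all assumptions made for this sketch. The SCHEMA dict mirrors what a YAML parser would produce from the sample schema.

```python
# Illustrative sketch only; names and labels here are hypothetical,
# not pgantomizer's internal API.

# The sample schema, as the dict a YAML parser would produce.
SCHEMA = {
    "customer": {"raw": ["language", "currency"], "pk": "customer_id"},
    "customer_address": {
        "raw": ["country", "customer_id"],
        "custom_rules": {"address_line": "aggregate_length"},
    },
}


def classify_columns(table_schema, columns):
    """Map each column name to the treatment the schema asks for."""
    pk = table_schema.get("pk", "id")  # primary key name defaults to "id"
    raw = set(table_schema.get("raw", []))
    custom = table_schema.get("custom_rules", {})
    plan = {}
    for column in columns:
        if column == pk:
            plan[column] = "keep (primary key)"
        elif column in custom:
            # Assumption: a custom rule wins over a raw marking.
            plan[column] = "custom: " + custom[column]
        elif column in raw:
            plan[column] = "keep (raw)"
        else:
            # Anything not listed is anonymized, so a newly added
            # column cannot leak by omission.
            plan[column] = "anonymize by type"
    return plan


def aggregate_length(value):
    """Python analogue of the aggregate_length rule: a value becomes its length."""
    return len(value)


plan = classify_columns(
    SCHEMA["customer_address"],
    ["id", "customer_id", "country", "address_line", "phone"],
)
for column, action in plan.items():
    print(column, "->", action)
```

Note how the hypothetical phone column, which the schema never mentions, falls through to anonymization. That is the safety property the explicit raw list is meant to provide.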

Calling pgantomizer from the Command Line

pgantomizer_dump is a helper script that uses pg_dump to dump the tables specified in the YAML schema file to a compressed file. Just pass the path to the schema and the DB connection details. A minimal working example that relies on the default values of some of the required parameters:

pgantomizer_dump --schema my_schema.yaml --dbname original_postgres --user alaric

To see a list of all parameters, run:

pgantomizer_dump -h

The script can take the DB connection details from environment variables, following the conventions of running Django in Docker. The presumed variable names are: DB_DEFAULT_NAME, DB_DEFAULT_USER, DB_DEFAULT_PASS, DB_DEFAULT_SERVICE, DB_DEFAULT_PORT.
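As a rough picture of how such variables could be turned into connection arguments, here is a minimal sketch. The helper read_dump_db_args is hypothetical and is not the function pgantomizer ships. The target key names follow the connection dict used elsewhere in this README, and treating DB_DEFAULT_SERVICE as the host name is an assumption.

```python
import os

# Assumed mapping from the variable names listed above to connection keys.
# Treating DB_DEFAULT_SERVICE as the host name is an assumption.
ENV_TO_ARG = {
    "DB_DEFAULT_NAME": "dbname",
    "DB_DEFAULT_USER": "user",
    "DB_DEFAULT_PASS": "password",
    "DB_DEFAULT_SERVICE": "host",
    "DB_DEFAULT_PORT": "port",
}


def read_dump_db_args(environ=os.environ):
    """Collect connection details for whichever variables are set."""
    return {
        arg: environ[name]
        for name, arg in ENV_TO_ARG.items()
        if name in environ
    }


# A plain dict stands in for the real environment in this demonstration.
example_env = {"DB_DEFAULT_NAME": "original_postgres", "DB_DEFAULT_USER": "alaric"}
print(read_dump_db_args(example_env))
```

Passing the environment in as a parameter keeps the helper easy to try out without touching the real process environment.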

pgantomizer is the main script. It loads the PostgreSQL dump into a specified instance. Then all columns except primary keys and those marked as raw in the schema are anonymized according to their data type. Finally, the dump file is deleted by default to reduce the risk of leaking unanonymized data. The connection details of the Postgres instance where the anonymized data should be loaded can be passed as arguments

pgantomizer --schema my_schema.yaml --dump-file ./to_anonymize.sql --dbname anonymized_postgres --user alaric --password anonymized_pass --host localhost --port 5432

or through environment variables with the following names: ANONYMIZED_DB_NAME, ANONYMIZED_DB_USER, ANONYMIZED_DB_PASS, ANONYMIZED_DB_HOST, ANONYMIZED_DB_PORT.
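The load, anonymize, delete sequence that the main script performs can be pictured with the sketch below. It is not pgantomizer's code: load_anonymize_then_remove, its keep_dump flag, and the stand-in load and anonymize callables are hypothetical, and deleting the dump even when a step fails is a design choice made for this sketch rather than documented behaviour.

```python
import os
import tempfile


def load_anonymize_then_remove(dump_path, load, anonymize, keep_dump=False):
    """Run the load and anonymize steps, then delete the dump unless told to keep it."""
    try:
        load(dump_path)
        anonymize()
    finally:
        # Removing in `finally` means a failed run does not leave
        # unanonymized data on disk (a choice made for this sketch).
        if not keep_dump and os.path.exists(dump_path):
            os.remove(dump_path)


# Demonstration with stand-in steps that only record what happened.
steps = []
with tempfile.NamedTemporaryFile(suffix=".sql", delete=False) as handle:
    handle.write(b"-- pretend dump\n")
    demo_path = handle.name

load_anonymize_then_remove(
    demo_path,
    load=lambda path: steps.append("load"),
    anonymize=lambda: steps.append("anonymize"),
)
print(steps, os.path.exists(demo_path))
```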

Calling pgantomizer from Python

Use the dump_db and load_anonymize_remove functions to dump and anonymize the data from Python. In the following example, the DB connections for the original and anonymized instances are specified via the ENV variables described above.

from pgantomizer import dump_db, load_anonymize_remove

dump_db('to_anonymize.sql', 'anonymization_schema.yaml')
load_anonymize_remove('to_anonymize.sql', 'anonymization_schema.yaml')

Both functions take an optional db_args argument for passing the connection arguments explicitly in a dict. See the example below for how the dict should look.

If you only need to anonymize an existing database, the anonymize_db function does that. The only extra work is parsing the YAML schema yourself.

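import yaml

from pgantomizer import anonymize_db

# safe_load parses plain YAML without constructing arbitrary Python objects;
# the with block closes the schema file once it has been read.
with open('anonymization_schema.yaml') as schema_file:
    schema = yaml.safe_load(schema_file)

anonymize_db(schema, {
    'dbname': 'anonymized_postgres',
    'user': 'alaric',
    'password': 'anonymized_pass',
    'host': 'localhost',
    'port': '5432',
})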

If you would like to use environment variables instead, use the function anonymize.get_db_args_from_env to construct the dict from ENV.

TODO

  • expand this README
  • submit package automatically to PyPI
  • add --dry-run argument that will check the schema and output the operations to be performed
  • remove password argument and use getpass instead for better security