anonympy 🕶️

With ❤️ by ArtLabs

Overview

A general-purpose Python library for anonymizing tabular, text, image and sound data. See ArtLabs/projects for similar projects.

Main Features

Tabular

Ease of use
Efficient anonymization (based on pandas DataFrame)
Numerous anonymization techniques

Numeric

Generalization - Binning
Perturbation
PCA Masking
Generalization - Rounding
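
To make the numeric techniques above concrete, here is a minimal plain-Python sketch of binning, rounding and perturbation. It is not anonympy's implementation; the helper names (`bin_value`, `round_value`, `perturb`) and their parameters are hypothetical and chosen only for illustration.

```python
import random

def bin_value(value, bin_width):
    """Generalization by binning: replace a number with the interval it falls in."""
    lower = (value // bin_width) * bin_width
    return (lower, lower + bin_width)

def round_value(value, base):
    """Generalization by rounding: snap a number to the nearest multiple of `base`."""
    return base * round(value / base)

def perturb(value, max_relative_noise, rng):
    """Perturbation: scale a number by a random factor within +/- max_relative_noise."""
    return value * (1 + rng.uniform(-max_relative_noise, max_relative_noise))

ages = [33, 48]
salaries = [59234.32, 49324.53]
rng = random.Random(0)  # seeded only so the sketch is repeatable

binned_ages = [bin_value(a, 10) for a in ages]                  # [(30, 40), (40, 50)]
rounded_salaries = [round_value(s, 10000) for s in salaries]    # [60000, 50000]
noisy_ages = [perturb(a, 0.1, rng) for a in ages]               # each within 10% of the original
```

Binning and rounding trade precision for privacy in a deterministic way, while perturbation keeps values realistic but no longer exact.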

Categorical

Synthetic Data
Resampling
Tokenization
Partial Email Masking
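
As a rough illustration of two of the categorical techniques above, the sketch below tokenizes a value with a keyed hash and partially masks an email address. It uses only the standard library and is an assumption about how such techniques can work, not a copy of anonympy's code; `tokenize`, `mask_email` and the demo key are hypothetical.

```python
import hashlib
import hmac

def tokenize(value, key, length=10):
    """Tokenization: replace a value with a truncated keyed SHA-256 hex digest."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

def mask_email(email, visible=1, mask_char="*"):
    """Partial masking: keep the first `visible` characters of the local part
    and the full domain, and mask everything in between."""
    local, sep, domain = email.partition("@")
    if not sep:
        return email  # no "@" found, so leave the value untouched
    masked = local[:visible] + mask_char * max(len(local) - visible, 0)
    return masked + sep + domain

key = b"demo-secret-key"  # hypothetical key; a real key should be kept private
token = tokenize("http://www.example.co.uk", key)
masked = mask_email("jane.doe@example.com")  # 'j*******@example.com'
```

The same input always maps to the same token under a given key, so joins across tables still work after tokenization.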

DateTime

Synthetic Date
Perturbation
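
Datetime perturbation can be sketched as shifting each date by a random number of days. The helper `perturb_date` and the 30-day window below are illustrative assumptions, not anonympy's API or defaults.

```python
import random
from datetime import date, timedelta

def perturb_date(d, max_days, rng):
    """Shift a date by a random whole number of days in [-max_days, max_days]."""
    offset = rng.randint(-max_days, max_days)
    return d + timedelta(days=offset)

rng = random.Random(42)  # seeded only so the sketch is repeatable
birthdates = [date(1915, 4, 17), date(1970, 5, 29)]
shifted = [perturb_date(d, 30, rng) for d in birthdates]
```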

Images

Anonymization Techniques

Personal Images (faces)

Blurring
Pixelated Face Blurring
Salt and Pepper Noise
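
The idea behind pixelation and salt and pepper noise can be shown on a tiny grayscale image stored as nested lists. This is a plain-Python sketch under simplifying assumptions: face detection is out of scope, the whole image is treated as the face region, and `pixelate` and `salt_and_pepper` are hypothetical helpers rather than anonympy functions.

```python
import random

def pixelate(pixels, block):
    """Replace every block x block tile with the integer mean of its pixels."""
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [(y, x)
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            mean = sum(pixels[y][x] for y, x in tile) // len(tile)
            for y, x in tile:
                out[y][x] = mean
    return out

def salt_and_pepper(pixels, amount, rng):
    """Force a fraction `amount` of pixels to black (0) or white (255)."""
    noisy = [row[:] for row in pixels]
    height, width = len(noisy), len(noisy[0])
    coords = [(y, x) for y in range(height) for x in range(width)]
    for y, x in rng.sample(coords, int(amount * height * width)):
        noisy[y][x] = rng.choice((0, 255))
    return noisy

face = [[10, 20, 30, 40],
        [50, 60, 70, 80],
        [90, 100, 110, 120],
        [130, 140, 150, 160]]

pixelated = pixelate(face, 2)
noisy = salt_and_pepper(face, 0.25, random.Random(1))  # 4 of 16 pixels replaced
```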

General Images

Blurring

Text, Sound

In Development

Installation

Dependencies

Python (>= 3.7)
cape-privacy
faker
pandas
OpenCV
. . .

Install with pip

The easiest way to install anonympy is with pip:

pip install anonympy

Because cape-privacy conflicts with the pandas/numpy versions used here, it's recommended to install cape-privacy separately:

pip install cape-privacy==0.3.0 --no-deps
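
After installing, it can help to confirm which versions actually ended up in the environment. The snippet below is a small sketch using `importlib.metadata` from the standard library (which assumes Python 3.8 or newer); `installed_version` is a hypothetical helper, not part of anonympy.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Report the packages involved in the version conflict mentioned above.
for name in ("anonympy", "cape-privacy", "pandas", "numpy"):
    print(name, installed_version(name) or "not installed")
```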

Install from source

Installing the library from source code is also possible

git clone https://github.com/ArtLabss/open-data-anonimizer.git
cd open-data-anonimizer
pip install -r requirements.txt
make bootstrap
pip install cape-privacy==0.3.0 --no-deps

Downloading Repository

Alternatively, download this repository from PyPI and run the following:

cd open-data-anonimizer
python setup.py install

Usage Example

You can find more examples here

Tabular

from anonympy.pandas import dfAnonymizer
from anonympy.pandas.utils import load_dataset

df = load_dataset() 
print(df)

	name	age	birthdate	salary	web	email	ssn
0	Bruce	33	1915-04-17	59234.32	http://www.alandrosenburgcpapc.co.uk	[email protected]	343554334
1	Tony	48	1970-05-29	49324.53	http://www.capgeminiamerica.co.uk	[email protected]	656564664

# Calling the generic Function
anonym = dfAnonymizer(df)
anonym.anonymize(inplace = False) # changes will be returned, not applied

	name	age	birthdate	salary	web	email	ssn
0	Stephanie Patel	30	1915-05-10	60000.0	5968b7880f	[email protected]	391-77-9210
1	Daniel Matthews	50	1971-01-21	50000.0	2ae31d40d4	[email protected]	872-80-9114

# Or applying a specific anonymization technique to a column
from anonympy.pandas.utils import available_methods

anonym.categorical_columns
... ['name', 'web', 'email', 'ssn']
available_methods('categorical') 
... categorical_fake	categorical_fake_auto	categorical_resampling	categorical_tokenization	categorical_email_masking
  
anonym.anonymize({'name': 'categorical_fake', 
                  'age': 'numeric_noise',
                  'birthdate': 'datetime_noise',
                  'salary': 'numeric_rounding',
                  'web': 'categorical_tokenization', 
                  'email':'categorical_email_masking', 
                  'ssn': 'column_suppression'})
print(anonym.to_df())

	name	age	birthdate	salary	web	email
0	Paul Lang	31	1915-04-17	60000.0	8ee92fb1bd	j*****[email protected]
1	Michael Gillespie	42	1970-05-29	50000.0	51b615c92e	e*****[email protected]

Images

# Passing an Image
import cv2
from anonympy.images import imAnonymizer

img = cv2.imread('sulking_boy.jpg')
anonym = imAnonymizer(img)

blurred = anonym.face_blur((31, 31), shape='r', box = 'r')  # blurring shape and bounding box ('r' / 'c')
cv2.imshow('Blurred', blurred)
cv2.waitKey(0)  # keep the window open until a key is pressed
cv2.destroyAllWindows()

`anonym.face_blur()`	`anonym.face_pixel()`	`anonym.face_SaP()`

# Passing a Folder 
path = 'C:/Users/shakhansho.sabzaliev/Downloads/Data' # images are inside `Data` folder
dst = 'D:/' # destination folder
anonym = imAnonymizer(path, dst)

anonym.blur(method = 'median', kernel = 11)

This creates a folder named Output in the dst directory.

The Data folder had the following structure

|   1.jpg
|   2.jpg
|   3.jpeg
|   
\---test
    |   4.png
    |   5.jpeg
    |   
    \---test2
            6.png

The Output folder will have the same structure and file names, but with blurred images.
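
How a source tree can be mirrored under an Output folder is sketched below with `pathlib`. This only plans the output paths and does not blur anything; `plan_output_paths`, the set of image suffixes, and the temporary demo tree are assumptions made for illustration, not anonympy internals.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions

def plan_output_paths(src, dst, output_name="Output"):
    """Map every image under `src` to the same relative path under dst/Output."""
    src, out_root = Path(src), Path(dst) / output_name
    plan = {}
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            plan[path] = out_root / path.relative_to(src)
    return plan

# Demo on a throwaway tree shaped like part of the Data folder above.
with tempfile.TemporaryDirectory() as tmp:
    data, dst = Path(tmp) / "Data", Path(tmp) / "dst"
    for rel in ("1.jpg", "test/4.png", "test/test2/6.png"):
        image = data / rel
        image.parent.mkdir(parents=True, exist_ok=True)
        image.touch()  # empty placeholder file standing in for an image
    plan = plan_output_paths(data, dst)
    relative_outputs = sorted(p.relative_to(dst).as_posix() for p in plan.values())
```

Keeping the relative path of each file is what preserves both the nesting and the file names in the destination.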

Development

Contributions

The Contributing Guide has detailed information about contributing code and documentation.

Important Links

Official source code repo: https://github.com/ArtLabss/open-data-anonimizer
Download releases: https://pypi.org/project/anonympy/
Issue tracker: https://github.com/ArtLabss/open-data-anonimizer/issues

License

BSD 3

Code of Conduct

Please see Code of Conduct. All community members are expected to follow it.
