🕶️
anonympy Overview
A general Python library for data anonymization of tabular, text, image and sound data. See ArtLabs/projects for more or similar projects.
Main Features
Tabular
- Ease of use
- Efficient anonymization (based on pandas DataFrame)
- Numerous anonymization techniques
  - Numeric
    - Generalization - Binning
    - Perturbation
    - PCA Masking
    - Generalization - Rounding
  - Categorical
    - Synthetic Data
    - Resampling
    - Tokenization
    - Partial Email Masking
  - DateTime
    - Synthetic Date
    - Perturbation
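To make the numeric techniques above concrete, here is a minimal, library-independent sketch of rounding, binning and perturbation. The function names, the `base`, `width` and `spread` parameters, and the sample values are assumptions chosen for illustration; this is not how anonympy implements these methods internally.

```python
import random

def round_generalize(value, base=10_000):
    """Generalization by rounding: snap a value to the nearest multiple of `base`."""
    return round(value / base) * base

def bin_generalize(value, width=10):
    """Generalization by binning: replace an integer with its interval label."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

def perturb(value, spread=5, rng=random):
    """Perturbation: add bounded uniform integer noise to a value."""
    return value + rng.randint(-spread, spread)

salaries = [59234.32, 49324.53]
ages = [33, 48]

rounded = [round_generalize(s) for s in salaries]  # [60000, 50000]
binned = [bin_generalize(a) for a in ages]         # ['30-39', '40-49']

rng = random.Random(0)  # seeded only so the sketch is repeatable
noisy = [perturb(a, rng=rng) for a in ages]        # each within 5 of the original
```

Rounding and binning trade precision for privacy in a predictable way, while perturbation keeps values individually plausible but no longer exact.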
Images
- Anonymization Techniques
  - Personal Images (faces)
    - Blurring
    - Pixelated Face Blurring
    - Salt and Pepper Noise
  - General Images
    - Blurring
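As an illustration of the salt and pepper technique listed above, the sketch below applies it to a tiny grayscale image stored as nested lists. It uses only the standard library; the function name and the `amount` and `seed` parameters are hypothetical and do not reflect anonympy's or OpenCV's API.

```python
import random

def salt_and_pepper(pixels, amount=0.1, seed=None):
    """Return a copy of a grayscale image (list of rows of 0-255 ints) in which
    each pixel is, with probability `amount`, forced to black (0) or white (255)."""
    rng = random.Random(seed)
    noisy = [row[:] for row in pixels]  # copy so the input stays untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return noisy

image = [[128] * 8 for _ in range(8)]  # uniform gray 8x8 test image
noisy_image = salt_and_pepper(image, amount=0.25, seed=42)
```

In practice the same idea would be applied only to the pixels inside a detected face region rather than to the whole image.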
Text, Sound
- In Development
Installation
Dependencies
- Python (>= 3.7)
- cape-privacy
- faker
- pandas
- OpenCV
- . . .
Install with pip
The easiest way to install anonympy is with pip:
pip install anonympy
Because cape-privacy pins pandas/numpy versions that conflict with anonympy's, it is recommended to install it separately, without its dependencies:
pip install cape-privacy==0.3.0 --no-deps
Install from source
Installing the library from source code is also possible
git clone https://github.com/ArtLabss/open-data-anonimizer.git
cd open-data-anonimizer
pip install -r requirements.txt
make bootstrap
pip install cape-privacy==0.3.0 --no-deps
Downloading Repository
Alternatively, download this repository from PyPI and run the following:
cd open-data-anonimizer
python setup.py install
Usage Example
You can find more examples here
Tabular
from anonympy.pandas import dfAnonymizer
from anonympy.pandas.utils import load_dataset
df = load_dataset()
print(df)
| | name | age | birthdate | salary | web | email | ssn |
|---|---|---|---|---|---|---|---|
| 0 | Bruce | 33 | 1915-04-17 | 59234.32 | http://www.alandrosenburgcpapc.co.uk | [email protected] | 343554334 |
| 1 | Tony | 48 | 1970-05-29 | 49324.53 | http://www.capgeminiamerica.co.uk | [email protected] | 656564664 |
# Calling the generic function, which picks a method for every column
anonym = dfAnonymizer(df)
anonym.anonymize(inplace = False) # changes will be returned, not applied
| | name | age | birthdate | salary | web | email | ssn |
|---|---|---|---|---|---|---|---|
| 0 | Stephanie Patel | 30 | 1915-05-10 | 60000.0 | 5968b7880f | [email protected] | 391-77-9210 |
| 1 | Daniel Matthews | 50 | 1971-01-21 | 50000.0 | 2ae31d40d4 | [email protected] | 872-80-9114 |
# Or applying a specific anonymization technique to a column
from anonympy.pandas.utils import available_methods
anonym.categorical_columns
... ['name', 'web', 'email', 'ssn']
available_methods('categorical')
... categorical_fake categorical_fake_auto categorical_resampling categorical_tokenization categorical_email_masking
anonym.anonymize({'name': 'categorical_fake',
                  'age': 'numeric_noise',
                  'birthdate': 'datetime_noise',
                  'salary': 'numeric_rounding',
                  'web': 'categorical_tokenization',
                  'email': 'categorical_email_masking',
                  'ssn': 'column_suppression'})
print(anonym.to_df())
| | name | age | birthdate | salary | web | email |
|---|---|---|---|---|---|---|
| 0 | Paul Lang | 31 | 1915-04-17 | 60000.0 | 8ee92fb1bd | j*****[email protected] |
| 1 | Michael Gillespie | 42 | 1970-05-29 | 50000.0 | 51b615c92e | e*****[email protected] |
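The `web` and `email` columns in the output above show tokenization and partial masking. The following standard-library sketch shows one plausible way such values could be produced. It is an assumption for illustration, not anonympy's or cape-privacy's actual implementation: the helper names, the HMAC-SHA256 construction, the 10-character token length, the fixed five-star mask and the sample address are all hypothetical.

```python
import hashlib
import hmac
import secrets

def tokenize(value, key, length=10):
    """Replace a value with a truncated HMAC-SHA256 hex digest.
    The same key and value always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

def mask_email(email, mask="*****"):
    """Keep the first and last character of the local part and the whole domain."""
    local, sep, domain = email.partition("@")
    if not sep or len(local) < 2:
        return mask + sep + domain  # too short to keep any characters safely
    return local[0] + mask + local[-1] + sep + domain

key = secrets.token_bytes(16)  # hypothetical secret, generated once per run
token = tokenize("http://www.capgeminiamerica.co.uk", key)
masked = mask_email("jane.doe@example.com")  # made-up address
# masked == 'j*****e@example.com'
```

A keyed digest keeps equal inputs mapped to equal tokens, so joins and group-bys still work, while anyone without the key cannot recompute tokens from guessed values.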
Images
# Passing an Image
import cv2
from anonympy.images import imAnonymizer
img = cv2.imread('sulking_boy.jpg')
anonym = imAnonymizer(img)
blurred = anonym.face_blur((31, 31), shape='r', box = 'r') # blurring shape and bounding box ('r' / 'c')
cv2.imshow('Blurred', blurred)
cv2.waitKey(0)  # keep the window open until a key is pressed
cv2.destroyAllWindows()
The three face anonymization methods are anonym.face_blur(), anonym.face_pixel() and anonym.face_SaP().
# Passing a Folder
path = 'C:/Users/shakhansho.sabzaliev/Downloads/Data' # images are inside `Data` folder
dst = 'D:/' # destination folder
anonym = imAnonymizer(path, dst)
anonym.blur(method = 'median', kernel = 11)
This will create a folder Output in the dst directory.
The Data folder had the following structure
|   1.jpg
|   2.jpg
|   3.jpeg
|
\---test
    |   4.png
    |   5.jpeg
    |
    \---test2
            6.png
The Output folder will have the same structure and file names but blurred images.
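The mirroring behaviour described above can be sketched with `pathlib`. This is only an illustration of mapping input paths to output paths under an `Output` folder; `plan_outputs`, `IMAGE_SUFFIXES` and the temporary demo tree are hypothetical and are not part of anonympy. The demo assumes `test2` is nested inside `test`.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions

def plan_outputs(src, dst, out_name="Output"):
    """Map every image under `src` to the path it would get under `dst/out_name`,
    preserving sub-folders and file names."""
    src, out_root = Path(src), Path(dst) / out_name
    return {
        path: out_root / path.relative_to(src)
        for path in sorted(src.rglob("*"))
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    }

# Demo on a throwaway tree shaped like part of the Data folder.
with tempfile.TemporaryDirectory() as tmp:
    data, dst = Path(tmp) / "Data", Path(tmp) / "dst"
    (data / "test" / "test2").mkdir(parents=True)
    for name in ("1.jpg", "test/4.png", "test/test2/6.png"):
        (data / name).touch()
    plan = plan_outputs(data, dst)
    relative = sorted(p.relative_to(dst).as_posix() for p in plan.values())
    # relative == ['Output/1.jpg', 'Output/test/4.png', 'Output/test/test2/6.png']
```

Computing the full mapping before writing anything makes it easy to create the needed sub-folders up front and to check that no input file would be overwritten.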
Development
Contributions
The Contributing Guide has detailed information about contributing code and documentation.
Important Links
- Official source code repo: https://github.com/ArtLabss/open-data-anonimizer
- Download releases: https://pypi.org/project/anonympy/
- Issue tracker: https://github.com/ArtLabss/open-data-anonimizer/issues
License
Code of Conduct
Please see Code of Conduct. All community members are expected to follow it.