ArtLabss / open-data-anonimizer

Licence: BSD-3-Clause License
Python Data Anonymization & Masking Library For Data Science Tasks

Programming Languages

python
139335 projects - #7 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to open-data-anonimizer

spreadsheets-to-dataframes
Pycon 2021 Tutorial to help Spreadsheet (Excel) Users learn Python
Stars: ✭ 30 (-16.67%)
Mutual labels:  pandas
neworder
A dynamic microsimulation framework for python
Stars: ✭ 15 (-58.33%)
Mutual labels:  pandas
OFFLINE-ERP
A desktop application which helps students to choose Disciplinary and Open Electives wisely.
Stars: ✭ 16 (-55.56%)
Mutual labels:  pandas
Data-Science-Tutorials
Python Tutorials for Data Science
Stars: ✭ 104 (+188.89%)
Mutual labels:  pandas
monthly-returns-heatmap
Python Monthly Returns Heatmap (DEPRECATED! Use QuantStats instead)
Stars: ✭ 23 (-36.11%)
Mutual labels:  pandas
Python-camp
No description or website provided.
Stars: ✭ 34 (-5.56%)
Mutual labels:  pandas
Interactive-Data-Visualization-with-Python
Present your data as an effective and compelling story
Stars: ✭ 71 (+97.22%)
Mutual labels:  pandas
Arch-Data-Science
Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision
Stars: ✭ 92 (+155.56%)
Mutual labels:  pandas
GitHub-Stalker
track your GitHub statistics with Pandas
Stars: ✭ 31 (-13.89%)
Mutual labels:  pandas
skippa
SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (-8.33%)
Mutual labels:  pandas
raccoon
Python DataFrame with fast insert and appends
Stars: ✭ 64 (+77.78%)
Mutual labels:  pandas
pyfinmod
Financial modeling with Python and Pandas
Stars: ✭ 39 (+8.33%)
Mutual labels:  pandas
pandas-cheat-sheet-ja
Unofficial translation of the official pandas cheat sheet
Stars: ✭ 74 (+105.56%)
Mutual labels:  pandas
ESA
Easy SimAuto (ESA): An easy-to-use Power System Analysis Automation Environment atop PowerWorld Simulator Automation Server (SimAuto)
Stars: ✭ 26 (-27.78%)
Mutual labels:  pandas
DataSciPy
Data Science with Python
Stars: ✭ 15 (-58.33%)
Mutual labels:  pandas
veridical-flow
Making it easier to build stable, trustworthy data-science pipelines.
Stars: ✭ 28 (-22.22%)
Mutual labels:  pandas
stream2segment
A Python project to download, process and visualize medium-to-massive amount of seismic waveforms and metadata
Stars: ✭ 18 (-50%)
Mutual labels:  pandas
Dominando-Pandas
This repository is intended for learning the Pandas library.
Stars: ✭ 22 (-38.89%)
Mutual labels:  pandas
Machine-Learning
This repository contains notebooks that will help you understand basic ML algorithms, as well as basic NumPy exercises. 💥 🌈 🌈
Stars: ✭ 15 (-58.33%)
Mutual labels:  pandas
grizzly
A Python-to-SQL transpiler as replacement for Python Pandas
Stars: ✭ 27 (-25%)
Mutual labels:  pandas

anonympy 🕶️



With ❤️ by ArtLabs

Overview

A general Python library for data anonymization of tabular, text, image and sound data. See ArtLabs/projects for more or similar projects.


Main Features

Tabular

  • Ease of use
  • Efficient anonymization (based on pandas DataFrame)
  • Numerous anonymization techniques
    • Numeric
      • Generalization - Binning
      • Perturbation
      • PCA Masking
      • Generalization - Rounding
    • Categorical
      • Synthetic Data
      • Resampling
      • Tokenization
      • Partial Email Masking
    • DateTime
      • Synthetic Date
      • Perturbation
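
To make the numeric techniques listed above more concrete, here is a minimal sketch in plain pandas and NumPy. It does not call anonympy and is not the library's implementation; the toy data, bin edges, noise range and seed are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; the values are made up for illustration.
df = pd.DataFrame({"age": [33, 48, 27, 61],
                   "salary": [59234.32, 49324.53, 72110.10, 38450.75]})

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Generalization by binning: replace exact ages with 20-year intervals.
age_binned = pd.cut(df["age"], bins=[20, 40, 60, 80])

# Perturbation: add random integer noise in [-5, 5] to each age.
age_noisy = df["age"] + rng.integers(-5, 6, size=len(df))

# Generalization by rounding: round salaries to the nearest 10,000.
salary_rounded = (df["salary"] / 10_000).round() * 10_000

print(pd.DataFrame({"age_binned": age_binned,
                    "age_noisy": age_noisy,
                    "salary_rounded": salary_rounded}))
```

Binning and rounding trade precision for privacy in a predictable way, while perturbation keeps values plausible but no longer exact; which one fits depends on how the column is used downstream.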

Images

  • Anonymization Techniques
    • Personal Images (faces)
      • Blurring
      • Pixelated Face Blurring
      • Salt and Pepper Noise
    • General Images
      • Blurring
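
As a rough illustration of two of the image techniques named above, the following sketch implements pixelation and salt-and-pepper noise with NumPy only. It is not anonympy's code and does no face detection; the function names, block size, noise amount and the synthetic input are assumptions.

```python
import numpy as np


def pixelate(img: np.ndarray, block: int = 8) -> np.ndarray:
    """Replace each block x block tile with its mean colour."""
    out = img.copy()
    h, w = img.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = img[y:y + block, x:x + block]
            # Averaging over the two spatial axes keeps the channel axis, if any.
            out[y:y + block, x:x + block] = tile.mean(axis=(0, 1)).astype(img.dtype)
    return out


def salt_and_pepper(img: np.ndarray, amount: float = 0.05, seed: int = 0) -> np.ndarray:
    """Set roughly `amount` of the pixels to pure black or pure white."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    r = rng.random(img.shape[:2])
    out[r < amount / 2] = 0          # pepper
    out[r > 1 - amount / 2] = 255    # salt
    return out


# Synthetic 32x32 RGB array standing in for a detected face region.
region = np.random.default_rng(1).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
pixelated = pixelate(region)
noisy = salt_and_pepper(region, amount=0.1)
```

In practice these transforms would be applied only to the bounding box returned by a face detector, leaving the rest of the image untouched.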

Text, Sound

  • In Development

Installation

Dependencies

  1. Python (>= 3.7)
  2. cape-privacy
  3. faker
  4. pandas
  5. OpenCV
  6. . . .

Install with pip

The easiest way to install anonympy is with pip:

pip install anonympy

Because cape-privacy pins pandas/numpy versions that conflict with those of anonympy, it is recommended to install cape-privacy separately, without its dependencies:

pip install cape-privacy==0.3.0 --no-deps 
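
After installing, one way to see which versions actually ended up in the environment is to query the package metadata. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` and the list of distribution names are assumptions based on the dependency list above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names are assumed from the dependency list above.
for name in ("anonympy", "cape-privacy", "faker", "pandas", "numpy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```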

Install from source

The library can also be installed from source:

git clone https://github.com/ArtLabss/open-data-anonimizer.git
cd open-data-anonimizer
pip install -r requirements.txt
make bootstrap
pip install cape-privacy==0.3.0 --no-deps 

Downloading Repository

Alternatively, download the repository from PyPI and run the following:

cd open-data-anonimizer
python setup.py install

Usage Example

Google Colab

You can find more examples here

Tabular

from anonympy.pandas import dfAnonymizer
from anonympy.pandas.utils import load_dataset

df = load_dataset() 
print(df)
   name   age  birthdate   salary    web                                   email              ssn
0  Bruce  33   1915-04-17  59234.32  http://www.alandrosenburgcpapc.co.uk  [email protected]  343554334
1  Tony   48   1970-05-29  49324.53  http://www.capgeminiamerica.co.uk     [email protected]  656564664
# Calling the generic function
anonym = dfAnonymizer(df)
anonym.anonymize(inplace=False)  # changes will be returned, not applied
   name             age  birthdate   salary   web         email              ssn
0  Stephanie Patel  30   1915-05-10  60000.0  5968b7880f  [email protected]  391-77-9210
1  Daniel Matthews  50   1971-01-21  50000.0  2ae31d40d4  [email protected]  872-80-9114
# Or applying a specific anonymization technique to a column
from anonympy.pandas.utils import available_methods

anonym.categorical_columns
... ['name', 'web', 'email', 'ssn']
available_methods('categorical') 
... categorical_fake	categorical_fake_auto	categorical_resampling	categorical_tokenization	categorical_email_masking
  
anonym.anonymize({'name': 'categorical_fake', 
                  'age': 'numeric_noise',
                  'birthdate': 'datetime_noise',
                  'salary': 'numeric_rounding',
                  'web': 'categorical_tokenization', 
                  'email':'categorical_email_masking', 
                  'ssn': 'column_suppression'})
print(anonym.to_df())
   name               age  birthdate   salary   web         email
0  Paul Lang          31   1915-04-17  60000.0  8ee92fb1bd  j*****[email protected]
1  Michael Gillespie  42   1970-05-29  50000.0  51b615c92e  e*****[email protected]
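
The general idea behind two of the categorical methods used above, partial email masking and tokenization, can be sketched in pure Python. This is not anonympy's implementation: the masking pattern (first character of the local part plus a fixed run of asterisks), the salted SHA-256 hash, the default salt and the token length are all assumptions made for illustration.

```python
import hashlib


def mask_email(email: str, stars: int = 5) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return email  # not an email-like string; leave it untouched
    return f"{local[0]}{'*' * stars}@{domain}"


def tokenize(value: str, salt: str = "change-me", length: int = 10) -> str:
    """Map a value to a short, repeatable hex token via a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:length]


print(mask_email("jane.doe@example.com"))   # j*****@example.com
token = tokenize("http://www.example.co.uk")  # same input and salt give the same token
```

Because the token is deterministic for a given salt, equal values in a column still map to equal tokens, which keeps joins and group-bys usable after anonymization.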

Images

# Passing an Image
import cv2
from anonympy.images import imAnonymizer

img = cv2.imread('sulking_boy.jpg')
anonym = imAnonymizer(img)

blurred = anonym.face_blur((31, 31), shape='r', box = 'r')  # blurring shape and bounding box ('r' / 'c')
cv2.imshow('Blurred', blurred)
cv2.waitKey(0)  # keep the window open until a key is pressed
Example outputs of anonym.face_blur(), anonym.face_pixel() and anonym.face_SaP() (images not shown).
# Passing a Folder
path = 'C:/Users/shakhansho.sabzaliev/Downloads/Data' # images are inside `Data` folder
dst = 'D:/' # destination folder
anonym = imAnonymizer(path, dst)

anonym.blur(method = 'median', kernel = 11) 

This will create a folder named Output in the dst directory.

The Data folder had the following structure:

|   1.jpg
|   2.jpg
|   3.jpeg
|   
\---test
    |   4.png
    |   5.jpeg
    |   
    \---test2
            6.png

The Output folder will have the same structure and file names, but with blurred images.
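
The folder behaviour described above (walk the input tree, then write each result under an Output folder at the same relative path) can be sketched with pathlib. This is a guess at the general pattern, not anonympy's code; `mirror_tree`, `transform` and the set of image extensions are hypothetical, and the transform works on raw bytes to keep the sketch free of dependencies.

```python
from pathlib import Path
from typing import Callable, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def mirror_tree(src: Path, dst: Path,
                transform: Callable[[bytes], bytes]) -> List[Path]:
    """Apply `transform` to every image under `src` and write the results
    to dst/Output, keeping relative paths and file names unchanged."""
    out_root = dst / "Output"
    written = []
    # sorted() materializes the listing before anything is written.
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            target = out_root / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(transform(path.read_bytes()))
            written.append(target)
    return written
```

A call such as `mirror_tree(Path("Data"), Path("out"), lambda b: b)` would simply copy the images; a real transform would decode each file, blur it, and re-encode it.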


Development

Contributions

The Contributing Guide has detailed information about contributing code and documentation.

Important Links

License

BSD 3

Code of Conduct

Please see Code of Conduct. All community members are expected to follow it.
