
Veridax / privapi

License: Apache-2.0
Detect Sensitive REST API communication using Deep Neural Networks

Programming Languages

python
shell
HCL

Projects that are alternatives of or similar to privapi

havengrc
☁️Haven GRC - easier governance, risk, and compliance 👨‍⚕️👮‍♀️🦸‍♀️🕵️‍♀️👩‍🔬
Stars: ✭ 83 (+97.62%)
Mutual labels:  gdpr, devsecops
Helib
HElib is an open-source software library that implements homomorphic encryption. It supports the BGV scheme with bootstrapping and the Approximate Number CKKS scheme. HElib also includes optimizations for efficient homomorphic evaluation, focusing on effective use of ciphertext packing techniques and on the Gentry-Halevi-Smart optimizations.
Stars: ✭ 2,749 (+6445.24%)
Mutual labels:  privacy-enhancing-technologies, privacy-by-design
HElib
HElib is an open-source software library that implements homomorphic encryption. It supports the BGV scheme with bootstrapping and the Approximate Number CKKS scheme. HElib also includes optimizations for efficient homomorphic evaluation, focusing on effective use of ciphertext packing techniques and on the Gentry-Halevi-Smart optimizations.
Stars: ✭ 2,913 (+6835.71%)
Mutual labels:  privacy-enhancing-technologies, privacy-by-design
Prowler
Prowler is a security tool to perform AWS security best practices assessments, audits, incident response, continuous monitoring, hardening and forensics readiness. It contains more than 200 controls covering CIS, ISO27001, GDPR, HIPAA, SOC2, ENS and other security frameworks.
Stars: ✭ 4,561 (+10759.52%)
Mutual labels:  gdpr, devsecops
lunasec
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunasec/
Stars: ✭ 1,261 (+2902.38%)
Mutual labels:  gdpr, devsecops
Hemmelig.app
Keep your sensitive information out of chat logs, emails, and more with encrypted secrets.
Stars: ✭ 183 (+335.71%)
Mutual labels:  gdpr, privacy-enhancing-technologies
swarm-learning
A simplified library for decentralized, privacy preserving machine learning
Stars: ✭ 142 (+238.1%)
Mutual labels:  privacy-enhancing-technologies, privacy-by-design
kodex
A privacy and security engineering toolkit: Discover, understand, pseudonymize, anonymize, encrypt and securely share sensitive and personal data: Privacy and security as code.
Stars: ✭ 70 (+66.67%)
Mutual labels:  gdpr, privacy-enhancing-technologies
GDPRDPIAT
A GDPR Data Protection Impact Assessment (DPIA) tool to assist organisations to evaluate data protection risks with respect to the EU's General Data Protection Regulation. 🇪🇺
Stars: ✭ 28 (-33.33%)
Mutual labels:  gdpr, devsecops
prowler
Prowler is an Open Source Security tool for AWS, Azure and GCP to perform Cloud Security best practices assessments, audits, incident response, compliance, continuous monitoring, hardening and forensics readiness. It contains hundreds of controls covering CIS, PCI-DSS, ISO27001, GDPR, HIPAA, FFIEC, SOC2, AWS FTR, ENS and custom security frameworks.
Stars: ✭ 8,046 (+19057.14%)
Mutual labels:  gdpr, devsecops
DevSecOps
Ultimate DevSecOps library
Stars: ✭ 4,450 (+10495.24%)
Mutual labels:  devsecops
ggshield-action
GitGuardian Shield GitHub Action - Find exposed credentials in your commits
Stars: ✭ 304 (+623.81%)
Mutual labels:  devsecops
parapred
Paratope Prediction using Deep Learning
Stars: ✭ 49 (+16.67%)
Mutual labels:  lstm-neural-networks
tag-manager
Website analytics, JavaScript error tracking + analytics, tag manager, data ingest endpoint creation (tracking pixels). GDPR + CCPA compliant.
Stars: ✭ 279 (+564.29%)
Mutual labels:  gdpr
voco
Privacy friendly voice control for the Candle Controller / WebThings Gateway
Stars: ✭ 18 (-57.14%)
Mutual labels:  privacy-enhancing-technologies
malware api class
Malware dataset for security researchers, data scientists. Public malware dataset generated by Cuckoo Sandbox based on Windows OS API calls analysis for cyber security researchers
Stars: ✭ 134 (+219.05%)
Mutual labels:  lstm-neural-networks
Sentiment-analysis-amazon-Products-Reviews
NLP with NLTK for Sentiment analysis amazon Products Reviews
Stars: ✭ 37 (-11.9%)
Mutual labels:  lstm-neural-networks
data-migrator
A declarative data-migration package
Stars: ✭ 15 (-64.29%)
Mutual labels:  gdpr
oc-gdpr-plugin
October CMS plugin to make websites GDPR and ePrivacy compliant
Stars: ✭ 32 (-23.81%)
Mutual labels:  gdpr
DeepLog
This is the realization of core DeepLog
Stars: ✭ 29 (-30.95%)
Mutual labels:  lstm-neural-networks

PrivAPI is a Python package that allows the classification of sensitive data flows within REST API communication using Deep Neural Networks (DNN). It relies on Google's Keras and TensorFlow.

The approach is explained in the article "PrivAPI - Detecting Personal Data within API Communication Using Deep Learning".


 ____       _        _    ____ ___
|  _ \ _ __(_)_   __/ \  |  _ \_ _|
| |_) | '__| \ \ / / _ \ | |_) | |
|  __/| |  | |\ V / ___ \|  __/| |
|_|   |_|  |_| \_/_/   \_\_|  |___|

Quickstart

Requirements

Make sure you're running Python 3.5 or newer.

Setup environment

It's strongly recommended to create an isolated environment so that your locally installed Python packages don't get in the way. Create a virtual environment in the venv folder inside the project:

python3 -m venv venv

Then set the PYTHONPATH and activate the environment so the Python scripts can be found:

export PYTHONPATH=.
source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Create a Neural Network Model

In order to detect sensitive data flows the system has to learn from confidential and non-confidential REST API request payloads. The project ships with a pre-generated dataset that can be used to train an LSTM neural network. If you want to generate your own, please refer to the next section.

python privapi/train.py

This will generate both the model and the token dictionary files in the out folder. Using a machine with a GPU is strongly recommended to speed up this process, as the model requires 100 epochs to converge.
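
To make the role of the token dictionary more concrete, here is a minimal sketch of how payload text could be mapped to integer sequences before being fed to an LSTM. It is not PrivAPI's actual preprocessing: the names `build_token_dictionary` and `encode`, the character-level tokenization, and the reserved indices are all assumptions made for illustration.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved indices (assumed convention, not taken from PrivAPI)

def build_token_dictionary(payloads):
    """Map each character seen in the payloads to an integer index."""
    counts = Counter(ch for text in payloads for ch in text)
    # Most frequent characters get the lowest indices; ties break alphabetically.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: i + 2 for i, ch in enumerate(ordered)}

def encode(text, token_dict, max_len):
    """Turn a payload into a fixed-length list of token indices."""
    ids = [token_dict.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

# Hypothetical payloads, used only to build a tiny dictionary.
payloads = ['{"email": "a@b.c"}', '{"channel": "general"}']
token_dict = build_token_dictionary(payloads)
encoded = encode('{"email": "x"}', token_dict, max_len=20)
```

Characters that never appeared during training fall back to the `UNK` index, and short payloads are padded so every sequence has the same length.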

Detect sensitive dataflows

Once a model has been trained, it's time to run predictions based on it. There are two examples: one that is sensitive (positive class) and one that is not (negative class). Both live within the predict folder.

python privapi/predict.py

This command will output predictions to the predictions.csv file within the project root.

   payload_file          is_sensitive  probability
0  magento-payload.json  1             0.9999695
1  slack-payload.json    0             0.40711942

As you can see, Magento's sensitive request payload has been classified as confidential with a probability above 0.99. The non-confidential Slack request payload was also classified correctly, even though it contains a first and last name.
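
The relationship between the probability and is_sensitive columns can be illustrated with a short sketch. The 0.5 cut-off and the helper name `to_label` are assumptions; the sample output is consistent with that threshold, but this README does not state which value predict.py actually uses.

```python
THRESHOLD = 0.5  # assumed cut-off; not confirmed by the PrivAPI source

def to_label(probability, threshold=THRESHOLD):
    """Map a predicted probability to the binary is_sensitive flag."""
    return int(probability >= threshold)

# Probabilities copied from the sample output above.
samples = {
    "magento-payload.json": 0.9999695,
    "slack-payload.json": 0.40711942,
}
labels = {name: to_label(p) for name, p in samples.items()}
print(labels)
```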

Drop the request payloads you wish to classify into the predict folder and re-run the predict.py script. Any file with the .json extension will be picked up.
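
The file pick-up rule could look roughly like the following sketch. The helper name `collect_payloads` is hypothetical and the real predict.py may work differently; the only behaviour taken from this README is that files with the .json extension in the predict folder are considered.

```python
import json
from pathlib import Path

def collect_payloads(folder="predict"):
    """Return {file name: parsed JSON} for every *.json file in `folder`."""
    payloads = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            payloads[path.name] = json.load(handle)
    return payloads
```

Calling `collect_payloads()` from the project root would then return one entry per payload file, while files with any other extension are ignored.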

Enjoy!

Generate

To generate your own training dataset, use the following command:

python privapi/generate.py

By default, the dataset will be saved as training.csv within the data folder.

To obtain relevant metrics for the generated dataset, use:

python privapi/analyze.py

Configuration

The accuracy of the predictions depends heavily on the quality of the training dataset. To generate sound request payload examples, the label associated with each one (sensitive or not) must also be correct.

To label an example, the generator checks whether the name of a given OpenAPI operation parameter matches one of the patterns in the name_type_to_gen dictionary defined in config.py. If it does, the generator uses the associated generator function and labels the example as positive (i.e. containing PII).

Here's an example configuration file. Feel free to add your own custom entries in order to consider additional PII fields.

from privapi.fakers import (
    _full_name_, _date_, _id_, _key_, _company_business_id_, _company_, _bank_account_, _first_name_, _last_name_,
    _address_, _bban_, _city_, _country_, _country_code_, _ssn_, _email_, _phone_number_, _gender_,
    _building_number_, _iban_, _postal_code_, _state_, _street_, _province_, _amount_, _credit_score_,
    _credit_card_number_, _alphanumeric_, _location_, _latitude_, _longitude_, _timestamp_, _latitude_str_,
    _longitude_str_, _timestamp_str_, _amount_str_, _credit_score_str_)

name_type_to_gen = {'string':
                        {'[uU]ser': _full_name_,
                         '[fF]ullName': _full_name_,
                         'firstname': _first_name_,
                         'lastname': _last_name_,
                         '[aA]ddress': _address_,
                         '[nN]ationality': _country_,
                         '[dD]ate': _date_,
                         '[tT]axId': _company_business_id_,
                         '[sS]erial': _id_,
                         '[oO]rganization': _company_,
                         '[cC]ompany': _company_,
                         '[dD]ba': _company_,
                         '[dD]oingBusinessAs': _company_,
                         '[bB]usinessName': _company_,
                         '[aA]ccount': _bank_account_,
                         '[uU]UID': _id_,
                         '[sS]hareholder': _full_name_,
                         '[pP]ostalCode': _postal_code_,
                         '[zZ]ip': _postal_code_,
                         '[bB]ic': _bban_,
                         '[bB]ankCity': _city_,
                         '[bB]usinessContact': _full_name_,
                         '[cC]ity': _city_,
                         '[cC]ountryCode': _country_code_,
                         '[cC]ountry': _country_,
                         '[dD]ateOfBirth': _date_,
                         '[dD]ob': _date_,
                         '[dD]ocumentNumber': _ssn_,
                         '[pP]assport': _ssn_,
                         '[iI]dentityDocument': _ssn_,
                         '[iI]dNumber': _ssn_,
                         '[iI]dCard': _ssn_,
                         '[dD]rivingLicense': _ssn_,
                         '[cC]reditCard': _credit_card_number_,
                         '[eE]mail': _email_,
                         '[pP]hone': _phone_number_,
                         '[pP]honeCountryCode': _country_code_,
                         '[gG]ender': _gender_,
                         '[hH]ouse': _building_number_,
                         '[bB]uilding': _building_number_,
                         '[aA]partment': _building_number_,
                         '[aA]pt': _building_number_,
                         '[iI]ban': _iban_,
                         '[sS]tate': _state_,
                         '[pP]rovince': _province_,
                         '[sS]treet': _street_,
                         '[rR]ecordLocator': _alphanumeric_,
                         '[rR]eservationCode': _alphanumeric_,
                         '[lL]ocation': _location_,
                         '[lL]atitude': _latitude_str_,
                         '[lL]ongitude': _longitude_str_,
                         '[lL]at': _latitude_str_,
                         '[lL]on': _longitude_str_,
                         "[tT]imestamp": _timestamp_str_,
                         "[sS]ignature_sha1": _id_,

                         },
                    'number':
                        {
                            "[tT]imestamp": _timestamp_str_,
                            "[dD]ate": _timestamp_str_,
                            "[bB]alance": _amount_str_,
                            "[aA]mount": _amount_str_,
                            "[cC]redit": _amount_str_,
                            "[cC]reditScore": _credit_score_str_,
                            "[sS]core": _credit_score_str_,
                            "[lL]atitude": _latitude_str_,
                            "[lL]ongitude": _longitude_str_
                        }
                    }

exclusions = [".*amazonaws.com"]
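
The lookup described before the configuration can be sketched as follows. This is an illustration only: `label_parameter`, the two stand-in generators, and the use of `re.search` with first-match-wins ordering are assumptions, not PrivAPI's actual generator code.

```python
import re

# Stand-in generators; the real ones are imported from privapi.fakers.
def fake_email():
    return "jane.doe@example.com"

def fake_city():
    return "Springfield"

# Trimmed-down mapping in the same shape as name_type_to_gen.
patterns = {
    "string": {
        "[eE]mail": fake_email,
        "[cC]ity": fake_city,
    },
}

def label_parameter(name, param_type, mapping=patterns):
    """Return (value, is_sensitive) for an OpenAPI parameter.

    The first pattern that matches the parameter name wins, so the
    dictionary order matters (assumed behaviour).
    """
    for pattern, generator in mapping.get(param_type, {}).items():
        if re.search(pattern, name):
            return generator(), 1  # positive class: contains PII
    return None, 0  # no match: treated as non-sensitive

print(label_parameter("customerEmail", "string"))
print(label_parameter("pageSize", "string"))
```

Because the patterns are searched anywhere in the name, a parameter such as `customerEmail` matches `[eE]mail` even though it is not an exact match. Under this assumption, more specific patterns should be listed before more general ones.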

Tests

Run tests:

python -m unittest

Contribute

Please see CONTRIBUTING.

License

PrivAPI is released under the Apache License. See the bundled LICENSE file for details.
