
gretelai / Gretel Synthetics

Licence: apache-2.0
Differentially private learning to create fake, synthetic datasets with enhanced privacy guarantees

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Gretel Synthetics

Adversarial Robustness Toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Stars: ✭ 2,638 (+1694.56%)
Mutual labels:  artificial-intelligence, privacy
Stylegan2 Pytorch
Simplest working implementation of Stylegan2, state of the art generative adversarial network, in Pytorch. Enabling everyone to experience disentanglement
Stars: ✭ 2,656 (+1706.8%)
Mutual labels:  artificial-intelligence, generative-model
Nogo
A cross-platform network-wide ad/site blocker with a simple web control panel.
Stars: ✭ 143 (-2.72%)
Mutual labels:  privacy
Telenav.ai
Telenav.AI competition public repository
Stars: ✭ 146 (-0.68%)
Mutual labels:  artificial-intelligence
Ai Job Info
Interview experiences from major Chinese internet companies
Stars: ✭ 145 (-1.36%)
Mutual labels:  artificial-intelligence
Nd4j
Fast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+1085.03%)
Mutual labels:  artificial-intelligence
Docker Tor Hiddenservice Nginx
Easily setup a hidden service inside the Tor network
Stars: ✭ 145 (-1.36%)
Mutual labels:  privacy
Mpyc
MPyC for Secure Multiparty Computation in Python
Stars: ✭ 142 (-3.4%)
Mutual labels:  privacy
Self Driving Golf Cart
Be Driven 🚘
Stars: ✭ 147 (+0%)
Mutual labels:  artificial-intelligence
Awesome Nlp Resources
This repository contains landmark research papers in Natural Language Processing that came out in this century.
Stars: ✭ 145 (-1.36%)
Mutual labels:  artificial-intelligence
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (-0.68%)
Mutual labels:  artificial-intelligence
Mcts
Board game AI implementations using Monte Carlo Tree Search
Stars: ✭ 144 (-2.04%)
Mutual labels:  artificial-intelligence
Tabnine Sublime
Tabnine Autocomplete AI: JavaScript, Python, TypeScript, PHP, C/C++, HTML/CSS, Go, Java, Ruby, C#, Rust, SQL, Bash, Kotlin, Julia, Lua, OCaml, Perl, Haskell, React
Stars: ✭ 144 (-2.04%)
Mutual labels:  artificial-intelligence
Weekly.manong.io
Coder Weekly - the most comprehensive collection of programming learning resources (continuously updated)
Stars: ✭ 1,796 (+1121.77%)
Mutual labels:  artificial-intelligence
App Privacy Policy Generator
A simple web app to generate a generic privacy policy for your Android/iOS apps
Stars: ✭ 2,278 (+1449.66%)
Mutual labels:  privacy
Awesome Privacy
A curated list of tools and services that respect your privacy.
Stars: ✭ 147 (+0%)
Mutual labels:  privacy
Keras age gender
Easy Real time gender age prediction from webcam video with Keras
Stars: ✭ 143 (-2.72%)
Mutual labels:  artificial-intelligence
Mlkit
A simple machine learning framework written in Swift 🤖
Stars: ✭ 144 (-2.04%)
Mutual labels:  artificial-intelligence
Hostsvn
Hosts file for blocking Vietnamese ads
Stars: ✭ 145 (-1.36%)
Mutual labels:  privacy
Floyd Cli
Command line tool for FloydHub - the fastest way to build, train, and deploy deep learning models
Stars: ✭ 147 (+0%)
Mutual labels:  artificial-intelligence

Gretel Synthetics

Gobs the Gretel.ai cat
An open source synthetic data library from Gretel.ai


Documentation

Try it out now!

If you want to quickly try out gretel-synthetics, simply click the button below and follow the tutorials!

Open in Colab

Check out additional examples here.

Getting Started

By default, we do not install TensorFlow via pip, as many developers and cloud services such as Google Colab run customized versions for their hardware.

pip install -U .                    

or

pip install gretel-synthetics        

then...

$ pip install jupyter
$ jupyter notebook

When the UI launches in your browser, navigate to examples/synthetic_records.ipynb and get generating!

If you want to install gretel-synthetics locally and use a GPU (recommended):

  1. Create a virtual environment (e.g. using conda)
$ conda create --name tf python=3.8
  2. Activate the virtual environment
$ conda activate tf
  3. Run the setup script ./setup-utils/setup-gretel-synthetics-tensorflow24-with-gpu.sh

The last step installs all the necessary software packages for GPU usage, tensorflow=2.4, and gretel-synthetics. Note that this script works only on Ubuntu 18.04; you may need to modify it for other OS versions.

Overview

This package allows developers to quickly get started with synthetic data generation through the use of neural networks. The more complex pieces of working with libraries like TensorFlow and differential privacy are bundled into friendly Python classes and functions. There are two high-level modes that can be used.

Simple Mode

The simple mode trains line-per-line on an input file of text. When generating data, the generator yields a custom object that can be used in a variety of ways based on your use case. This notebook demonstrates this mode.
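
The following is a minimal sketch of the simple-mode workflow, modeled on the example notebooks; the imports and parameter names (train_rnn, generate_text, checkpoint_dir, input_data_path, num_lines) are assumptions based on those notebooks and may differ slightly between library versions.

from pathlib import Path

from gretel_synthetics.config import LocalConfig   # aliased to TensorFlowConfig
from gretel_synthetics.train import train_rnn
from gretel_synthetics.generate import generate_text

# Sketch only: parameter names follow the example notebooks and may vary by version.
config = LocalConfig(
    checkpoint_dir=str(Path.cwd() / "checkpoints"),  # where the trained model is saved
    input_data_path="training_lines.txt",            # one training example per line
)

train_rnn(config)

# Each generated line is yielded as a custom object with the text and validity info.
for line in generate_text(config, num_lines=100):
    print(line)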

DataFrame Mode

This library supports CSV / DataFrames natively using the DataFrame "batch" mode. This module provides a wrapper around our simple mode that is geared toward working with tabular data. Additionally, it is capable of handling a high number of columns by breaking the input DataFrame up into "batches" of columns and training a model on each batch. This notebook shows an overview of using this library with DataFrames natively.
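
Below is a minimal sketch of the batch workflow, again modeled on the example notebooks; the DataFrameBatch class and its method names (create_training_data, train_all_batches, generate_all_batch_lines, batches_to_df) are taken from those examples and should be confirmed against your installed version, and the config values are illustrative only.

import pandas as pd

from gretel_synthetics.batch import DataFrameBatch

source_df = pd.read_csv("my_table.csv")  # illustrative input file

# The config dict is a template applied to every batch of columns.
batcher = DataFrameBatch(
    df=source_df,
    config={
        "checkpoint_dir": "./batch_checkpoints",  # one sub-directory per column batch
        "epochs": 30,                             # illustrative value
    },
)

batcher.create_training_data()          # split the DataFrame into column batches
batcher.train_all_batches()             # train one model per batch
batcher.generate_all_batch_lines()      # generate synthetic rows for each batch
synthetic_df = batcher.batches_to_df()  # reassemble a single synthetic DataFrame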

Components

There are four primary components to be aware of when using this library.

  1. Configurations. Configurations are classes that are specific to an underlying ML engine used to train and generate data. An example would be using TensorFlowConfig to create all the necessary parameters to train a model based on TF. LocalConfig is aliased to TensorFlowConfig for backwards compatibility with older versions of the library. A model is saved to a designated directory, which can optionally be archived and utilized later.

  2. Tokenizers. Tokenizers convert input text into integer based IDs that are used by the underlying ML engine. These tokenizers can be created and sent to the training input. This is optional, and if no specific tokenizer is specified then a default one will be used. You can find an example here that uses a simple char-by-char tokenizer to build a model from an input CSV. When training in a non-differentially private mode, we suggest using the default SentencePiece tokenizer, an unsupervised tokenizer that learns subword units (e.g., byte-pair encoding (BPE) [Sennrich et al.] and unigram language model [Kudo]) for faster training and increased accuracy of the synthetic model.

  3. Training. Training combines the configuration and tokenizer to build a model, which is stored in the designated directory and can be used to generate new records.

  4. Generation. Once a model is trained, any number of new lines or records can be generated. Optionally, a record validator can be provided to ensure that the generated data meets any constraints that are necessary. See our notebooks for examples of validators, and the sketch following this list.
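
As a sketch of the validator hook mentioned in item 4, the example below passes a plain Python callable to generation; the line_validator keyword and the valid/text attributes on the yielded object are assumptions drawn from the example notebooks, and the snippet presumes a model has already been trained into checkpoint_dir.

from gretel_synthetics.config import LocalConfig
from gretel_synthetics.generate import generate_text

# Assumes a model was previously trained into this checkpoint directory.
config = LocalConfig(
    checkpoint_dir="./checkpoints",
    input_data_path="training_lines.txt",
)

def validate_record(line: str) -> None:
    # Reject any generated record that does not have exactly 5 comma-separated fields.
    if len(line.split(",")) != 5:
        raise ValueError("record does not have 5 fields")

for record in generate_text(config, line_validator=validate_record, num_lines=50):
    print(record.valid, record.text)  # attribute names assumed from the example notebooks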

Differential Privacy

Differential privacy support for our TensorFlow mode is built on the great work being done by the Google TF team and their TensorFlow Privacy library.

When utilizing DP, we currently recommend using the character tokenizer, as it creates a vocabulary of single-character tokens only and removes the risk of sensitive data being memorized as whole tokens that can be replayed during generation.

There are also a few notable configuration options (see the sketch after this list):

  • predict_batch_size should be set to 1
  • dp should be enabled
  • learning_rate, dp_noise_multiplier, dp_l2_norm_clip, and dp_microbatches can be adjusted to achieve various epsilon values.
  • reset_states should be disabled
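
As a rough illustration, the sketch below fills these options into a TensorFlowConfig; the numeric values are placeholders, not recommendations, and the right combination depends on the epsilon you are targeting.

from gretel_synthetics.config import TensorFlowConfig

# Sketch of a DP configuration; all numeric values are illustrative placeholders.
dp_config = TensorFlowConfig(
    checkpoint_dir="./dp_checkpoints",
    input_data_path="training_lines.txt",
    dp=True,                   # enable differentially private training (TensorFlow Privacy)
    learning_rate=0.001,       # placeholder; tune together with the dp_* values below
    dp_noise_multiplier=1.0,   # placeholder
    dp_l2_norm_clip=1.0,       # placeholder
    dp_microbatches=1,         # placeholder
    predict_batch_size=1,      # recommended above
    reset_states=False,        # recommended above
)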

Please see our example Notebook for training a DP model based on the Netflix Prize dataset.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].