ElementAI / synbols

License: Apache-2.0
The Synbols dataset generator

ServiceNow completed its acquisition of Element AI on January 8, 2021. All references to Element AI in the materials that are part of this project should refer to ServiceNow.

Synbols

Probing Learning Algorithms with Synthetic Datasets

Progress in the field of machine learning has been fueled by the introduction of benchmark datasets pushing the limits of existing algorithms. Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. To this end, we introduce Synbols (Synthetic Symbols), a tool for rapidly generating new datasets with a rich composition of latent features rendered in low-resolution images. Synbols leverages the large number of symbols available in the Unicode standard and the wide range of artistic fonts provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups including supervised learning, active learning, out-of-distribution generalization, unsupervised representation learning, and object counting.

[paper]

Description

This is the code repository for the Synbols dataset generator. Dataloaders and examples such as image classification can be found in https://github.com/ElementAI/synbols-benchmarks.

Installation

The easiest way to install Synbols is via PyPI. Simply run the following command:

pip install synbols

Software dependencies

Synbols relies on fonts and system packages. To ensure reproducibility, we provide a Docker image with everything preinstalled. Thus, the only dependency is Docker (see the Docker documentation for installation instructions).

Usage

Using predefined generators

$ synbols-datasets --help
$ synbols-datasets --dataset=some-large-occlusion --n_samples=1000 --seed=42

Generating some-large-occlusion dataset. Info: With probability 20%, add a large occlusion over the existing symbol.
Preview generated.
 35%|############################2                                                   | 353/1000 [00:05<00:10, 63.38it/s]
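
The `--seed` flag and the 20% occlusion probability reported in the log above suggest how such a generator behaves: each sample draws from a seeded random generator, so the same seed should reproduce the same dataset. The following is a minimal sketch of that idea using only the Python standard library. The names `has_large_occlusion` and `sample_flags` are hypothetical and the logic is not taken from the Synbols code.

```python
import random


def has_large_occlusion(rng, probability=0.2):
    """Hypothetical helper: decide whether one sample gets a large occlusion."""
    return rng.random() < probability


def sample_flags(seed, n_samples):
    """Draw one occlusion decision per sample from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [has_large_occlusion(rng) for _ in range(n_samples)]


flags_a = sample_flags(seed=42, n_samples=1000)
flags_b = sample_flags(seed=42, n_samples=1000)

# The same seed yields the same sequence of decisions.
print(flags_a == flags_b)
# The observed fraction of occluded samples should be close to 0.2.
print(sum(flags_a) / len(flags_a))
```

Fixing the seed in this way is what makes a generated dataset repeatable across runs and machines.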

Defining your own generator

Examples of how to create new datasets can be found in the examples directory.

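# Assumed import path; check the examples directory for the exact module.
from synbols.generate import basic_attribute_sampler, generate_and_write_dataset

dataset_path = "./my_dataset"  # placeholder output path
n_samples = 1000  # placeholder dataset size


def translation(rng):
    """Sample an (x, y) translation uniformly from (-2, 2), which can push the symbol outside the box."""
    return tuple(rng.uniform(low=-2, high=2, size=2))


# Override the default attribute sampler: fix the scale to a constant and
# draw the (x, y) translation from the custom distribution above.
attr_sampler = basic_attribute_sampler(scale=0.5, translation=translation)

generate_and_write_dataset(dataset_path, attr_sampler, n_samples)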
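
To check what a custom sampler such as `translation` returns before handing it to the generator, it can be called directly with a seeded NumPy random state. This is a minimal sketch that assumes the `rng` passed to samplers behaves like `numpy.random.RandomState`, which the `low`, `high`, and `size` keyword arguments suggest. It does not import Synbols itself.

```python
import numpy as np


def translation(rng):
    """Sample an (x, y) translation uniformly from [-2, 2) on each axis."""
    return tuple(rng.uniform(low=-2, high=2, size=2))


# Assumption: a seeded RandomState stands in for the generator's own rng.
rng = np.random.RandomState(42)
samples = [translation(rng) for _ in range(5)]

# Each sample is an (x, y) pair; print them to inspect the range.
for x, y in samples:
    print(f"x={x:+.3f}, y={y:+.3f}")
```

Every printed value should fall between -2 and 2.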

To generate your dataset, you need to run your code in the Synbols runtime environment. This is done using the synbols command as follows:

synbols mydataset.py --foo bar

Launch the example notebook

We provide an example Jupyter notebook in the examples directory. To run this notebook, first download it locally and run the following command at the notebook's location:

synbols-jupyter

This launches Jupyter Notebook in the Synbols runtime environment, which you can then open in your browser.

Contact

To report a bug or request a feature, please create an issue.
