sparkfish / augraphy

Licence: MIT license

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Programming Languages

python


Projects that are alternatives of or similar to augraphy

mtss-gan

MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)

Stars: ✭ 77 (+57.14%)

Mutual labels: synthetic-data, synthetic-dataset-generation

Snorkel

A system for quickly generating training data with weak supervision

Stars: ✭ 4,953 (+10008.16%)

Mutual labels: data-augmentation, training-data

recurrent-defocus-deblurring-synth-dual-pixel

Reference github repository for the paper "Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data". We propose a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Lev…

Stars: ✭ 30 (-38.78%)

Mutual labels: synthetic-data, synthetic-dataset-generation

multi-task-defocus-deblurring-dual-pixel-nimat

Reference github repository for the paper "Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning". We propose a single-image deblurring network that incorporates the two sub-aperture views into a multitask framework. Specifically, we show that jointly learning to predict the two DP views from a single …

Stars: ✭ 29 (-40.82%)

Mutual labels: synthetic-data, synthetic-dataset-generation

game-feature-learning

Code for paper "Cross-Domain Self-supervised Multi-task Feature Learning using Synthetic Imagery", Ren et al., CVPR'18

Stars: ✭ 68 (+38.78%)

Mutual labels: synthetic-data

LegoBrickClassification

Repository to identify Lego bricks automatically only using images

Stars: ✭ 57 (+16.33%)

Mutual labels: synthetic-dataset-generation

table-evaluator

Evaluate real and synthetic datasets with each other

Stars: ✭ 44 (-10.2%)

Mutual labels: synthetic-data

IBMGenerator

IBM Synthetic Data Generator for Itemsets and Sequences

Stars: ✭ 20 (-59.18%)

Mutual labels: synthetic-dataset-generation

rivery cli

Rivery CLI

Stars: ✭ 16 (-67.35%)

Mutual labels: data-pipeline

bird species classification

Supervised classification of bird species 🐦 in high-resolution images, especially Himalayan birds, which have diverse species and a fairly small amount of labelled data

Stars: ✭ 59 (+20.41%)

Mutual labels: data-augmentation

Clustering-Datasets

This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.

Stars: ✭ 189 (+285.71%)

Mutual labels: synthetic-data

fastai sparse

3D augmentation and transforms of 2D/3D sparse data, such as 3D triangle meshes or point clouds in Euclidean space. Extension of the Fast.ai library to train Sub-manifold Sparse Convolution Networks

Stars: ✭ 46 (-6.12%)

Mutual labels: data-augmentation

uoais

Codes of paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling", ICRA 2022

Stars: ✭ 77 (+57.14%)

Mutual labels: synthetic-data

VisDA2020

VisDA2020: 4th Visual Domain Adaptation Challenge in ECCV'20

Stars: ✭ 53 (+8.16%)

Mutual labels: synthetic-data

elasticdeform

Differentiable elastic deformations for N-dimensional images (Python, SciPy, NumPy, TensorFlow, PyTorch).

Stars: ✭ 134 (+173.47%)

Mutual labels: data-augmentation

Keras-MultiClass-Image-Classification

Multiclass image classification using Convolutional Neural Network

Stars: ✭ 48 (-2.04%)

Mutual labels: data-augmentation

semantic-parsing-dual

Source code and data for ACL 2019 Long Paper "Semantic Parsing with Dual Learning".

Stars: ✭ 17 (-65.31%)

Mutual labels: data-augmentation

candock

A time series signal analysis and classification framework

Stars: ✭ 56 (+14.29%)

Mutual labels: data-augmentation

Awesome-Few-Shot-Image-Generation

A curated list of papers, code and resources pertaining to few-shot image generation.

Stars: ✭ 209 (+326.53%)

Mutual labels: data-augmentation

Robotics-Object-Pose-Estimation

A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Stars: ✭ 153 (+212.24%)

Mutual labels: synthetic-data


Augraphy is a Python library that creates multiple copies of original documents through an augmentation pipeline that randomly distorts each copy, degrading the clean version into dirty but realistic copies that look as if they had been rendered through paper printing, faxing, scanning and copy machine processes.

Highly configurable pipelines act as a factory: they apply adjustments to the originals to create realistic old or noisy documents, producing an almost unlimited number of variations from each source. This simulation of realistic paper-oriented process distortions can create large amounts of training data for AI/ML processes to learn how to remove those distortions.

Treatments applied by Augraphy fabricate realistic documents that appear to have been printed on dirty laser or inkjet printers, scanned by dirty office scanners, faxed by low-resolution fax machines and otherwise mistreated by real-world paper handling office equipment.

What makes Augraphy Magical?

Virtually no readily available datasets exist with both a clean and noisy version of target documents. Augraphy addresses that problem by manufacturing large volumes of high-quality noisy documents to train alongside their clean source originals.

Training neural networks typically requires augmenting limited sources of data in a variety of ways so that networks can learn to generalize their solutions. Networks designed to work with scanned document images must be trained with images that have the type of distortions and noise typical of real-world scanned office documents.

However, if we only have real-world dirty documents, then we don’t have a good way to know for sure what the right answer is when training a neural network. By going in the reverse direction, starting with the clean document we hope a trained network will produce, we can simulate training data with dirty documents for which we already have a perfect original.

With flawless rendering of distorted "originals", we can train a model to undo all that distortion and restore the document to its original form. It’s pretty much magic!

How It Works

Augraphy's augmentation pipeline starts with an image of a clean document. The pipeline begins by extracting the text and graphics from the source into an "ink" layer. (Ink is synonymous with toner within Augraphy.) The augmentation pipeline then distorts and degrades the ink layer.

A paper factory provides either a white page or a randomly-selected paper texture base. Like the ink layer, the paper can also be processed through a pipeline to further provide random realistic paper textures.

After both the ink and paper phases are completed, processing continues by applying the ink, with its desired effects, to the paper. This merged document image is then augmented further with distortions such as adding folds or other physical deformations or distortions that rely on simultaneous interactions of paper and ink layers.

The end result is an image that mimics real documents.
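The three phases described above (ink, paper, then a post phase on the merged page) can be illustrated with a toy NumPy sketch. This is not Augraphy's implementation or API: every function name, threshold and effect below is a hypothetical stand-in chosen only to show how the layers fit together on a grayscale image.

```python
import numpy as np

def extract_ink(page, threshold=128):
    """Keep dark (ink) pixels and whiten everything else. Hypothetical helper."""
    return np.where(page < threshold, page, 255).astype(np.uint8)

def degrade_ink(ink, rng, dropout=0.1):
    """Ink phase: randomly erase a fraction of ink pixels, as if toner were low."""
    faded = ink.copy()
    mask = (ink < 255) & (rng.random(ink.shape) < dropout)
    faded[mask] = 255
    return faded

def make_paper(shape, rng, base=235, noise=10):
    """Paper phase: an off-white page with small random brightness variations."""
    texture = base + rng.integers(-noise, noise + 1, size=shape)
    return np.clip(texture, 0, 255).astype(np.uint8)

def print_ink_on_paper(ink, paper):
    """Merge the layers by keeping the darker pixel, so ink covers paper."""
    return np.minimum(ink, paper)

def add_fold_shadow(page, column, strength=40):
    """Post phase: darken one pixel column to suggest a vertical fold."""
    out = page.astype(np.int16)
    out[:, column] -= strength
    return np.clip(out, 0, 255).astype(np.uint8)

def toy_pipeline(clean, seed=0):
    """Run ink phase, paper phase, merge, then post phase on a grayscale image."""
    rng = np.random.default_rng(seed)
    ink = degrade_ink(extract_ink(clean), rng)
    paper = make_paper(clean.shape, rng)
    merged = print_ink_on_paper(ink, paper)
    return add_fold_shadow(merged, column=clean.shape[1] // 2)

# A white page with one black bar standing in for a line of text.
clean = np.full((32, 32), 255, dtype=np.uint8)
clean[8:12, 4:28] = 0
dirty = toy_pipeline(clean)
```

Processing ink and paper separately is what lets each layer receive its own effects before the merge, while effects such as the fold act on the combined page.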

Example Before / After Images

Example Usage

To use the default pipeline, which contains all available augmentations with sensible defaults:

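import cv2

from augraphy import *

# Build the default pipeline
pipeline = default_augraphy_pipeline()

# Load the clean source document
img = cv2.imread("image.png")

# Run the pipeline; the result is a dictionary of pipeline data
data = pipeline.augment(img)

# The final augmented image is stored under the "output" key
augmented = data["output"]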
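Because each run distorts the image randomly, the same clean original can yield several different noisy copies, which is how paired training data is built. The following is a minimal sketch of that idea, not part of Augraphy: `make_training_pairs` and `toy_augment` are hypothetical names, and `toy_augment` is a stand-in so the example runs without Augraphy installed. With the pipeline from the usage example above, the `augment` argument could instead be a small function that returns `pipeline.augment(img)["output"]`.

```python
import random

def make_training_pairs(clean_images, augment, copies_per_image=3):
    """Return (noisy, clean) pairs: several degraded copies per clean original."""
    pairs = []
    for clean in clean_images:
        for _ in range(copies_per_image):
            pairs.append((augment(clean), clean))
    return pairs

# Seeded generator so the toy example is repeatable.
_rng = random.Random(0)

def toy_augment(image):
    """Hypothetical stand-in: add small random brightness noise to each pixel."""
    return [
        [min(255, max(0, px + _rng.randint(-20, 20))) for px in row]
        for row in image
    ]

# One tiny 2x3 grayscale "document" represented as nested lists.
clean_images = [[[255, 0, 255], [255, 0, 255]]]
pairs = make_training_pairs(clean_images, toy_augment, copies_per_image=3)
```

Each pair keeps a reference to the untouched original, so a denoising model can be trained with the noisy copy as input and the clean image as the target.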

Documentation

For full documentation, including installation and tutorials, check the doc directory.

Alternative Augmentation Libraries

There are plenty of choices when it comes to augmentation libraries. However, only Augraphy is designed to address everyday office automation needs associated with paper-oriented process distortions that come from printing, faxing, scanning and copy machines. Most other libraries focus on video and images pertinent to camera-oriented data sources and problem domains. Augraphy is focused on supporting problems related to automation of document images such as OCR, form recognition, form data extraction, document classification, barcode decoding, denoising, document restoration, identity document data extraction, document cropping, etc. Eventually, Augraphy will be able to support photo OCR problems with augmentations designed to emulate camera phone distortions.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Citations

If you use Augraphy in your research, please cite the project.

BibTeX:

@software{The_Augraphy_Project,
author = {{The Augraphy Project}},
title = {{Augraphy: an augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes}},
url = {https://github.com/sparkfish/augraphy},
version = {7.0.0}
}

APA:

The Augraphy Project. Augraphy: an augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes (Version 7.0.0) [Computer software]. https://github.com/sparkfish/augraphy

License

Augraphy is free and open-source software distributed under the terms of the MIT license.

