getsynth / synth

License: Apache-2.0
The Declarative Data Generator


NOTE: The Synth project is no longer being actively maintained. New issues and pull requests will likely not be addressed. If you're interested in taking over as a maintainer of the project reach out to [email protected].

The project now has new maintainers and a fresh start. There is plenty of work left to stabilize master, so feel free to chip in.


Synth is a tool for generating realistic data using a declarative data model. Synth is database agnostic and can scale to millions of rows of data.

Why Synth

Synth answers a simple question: there are so many ways to consume data, so why are there no frameworks for generating it?

Synth provides a robust, declarative framework for specifying constraint-based data generation, solving the following problems developers regularly face:

  1. You're creating an app from scratch and have no way to populate your fresh schema with correct, realistic data.
  2. You're doing integration testing / QA on production data, but you know it is bad practice, and you really should not be doing that.
  3. You want to see how your system will scale if your database suddenly has 10x the amount of data.

Synth solves exactly these problems with a flexible declarative data model which you can version control in git, peer review, and automate.

Key Features

The key features of Synth are:

  • Data as Code: Data generation is described using a declarative configuration language allowing you to specify your entire data model as code.

  • Import from Existing Sources: Synth can import data from existing sources and automatically create data models. Synth currently has Alpha support for Postgres, MySQL and MongoDB!

  • Data Inference: While ingesting data, Synth automatically works out the relations, distributions and types of the dataset.

  • Database Agnostic: Synth supports semi-structured data and is database agnostic - playing nicely with SQL and NoSQL databases.

  • Semantic Data Types: Synth uses the fake-rs crate to enable the generation of semantically rich data, with support for types like names, addresses, credit card numbers, etc.

Status

  • Alpha: We are testing synth with a closed set of users
  • Public Alpha: Anyone can install synth. But go easy on us, there are a few kinks
  • Public Beta: Stable enough for most non-enterprise use-cases
  • Public: Production-ready

We are currently in Public Alpha. Watch "releases" of this repo to get notified of major updates.

Installation & Getting Started

On Linux and macOS you can get started with the one-liner:

# Optional, set install path
$ export SYNTH_INSTALL_PATH=~/bin
$ curl -sSL https://getsynth.com/install | sh

For more installation options, check out the docs.

Examples

Building a data model from scratch

To start generating data without a source to import from, you need to add Synth schema files to a namespace directory.

First, create a namespace directory for the data model and call it my_app:

$ mkdir my_app

Next let's create a users collection using Synth's configuration language, and put it into my_app/users.json:

{
    "type": "array",
    "length": {
        "type": "number",
        "constant": 1
    },
    "content": {
        "type": "object",
        "id": {
            "type": "number",
            "id": {}
        },
        "email": {
            "type": "string",
            "faker": {
                "generator": "safe_email"
            }
        },
        "joined_on": {
            "type": "date_time",
            "format": "%Y-%m-%d",
            "subtype": "naive_date",
            "begin": "2010-01-01",
            "end": "2020-01-01"
        }
    }
}
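
To make the schema easier to read, here is a rough Python stand-in that produces records of the same shape. It is only an illustration of what each field asks for, not how Synth works internally: the function name sketch_users, the example.com addresses, and the choice to treat "end" as exclusive are all assumptions made for this sketch.

```python
import json
import random
from datetime import date, timedelta

# Values copied from my_app/users.json.
BEGIN = date(2010, 1, 1)   # "begin"
END = date(2020, 1, 1)     # "end" (treated as exclusive here; an assumption)
DATE_FORMAT = "%Y-%m-%d"   # "format"


def sketch_users(count, seed=0):
    """Return `count` records shaped like the users schema (illustration only)."""
    rng = random.Random(seed)
    span_days = (END - BEGIN).days
    users = []
    for n in range(1, count + 1):
        joined = BEGIN + timedelta(days=rng.randrange(span_days))
        users.append({
            # "id": {"type": "number", "id": {}} reads as an increasing counter.
            "id": n,
            # "faker": {"generator": "safe_email"}: placeholder address only.
            "email": f"user{n}@example.com",
            # "date_time" with subtype "naive_date": a calendar date, no time zone.
            "joined_on": joined.strftime(DATE_FORMAT),
        })
    return {"users": users}


if __name__ == "__main__":
    print(json.dumps(sketch_users(2), indent=2))
```

The point of the comparison is that the JSON file states the same intent (a counter, a fake email, a date in a range) without any of the loop or formatting code.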

Finally, generate data using the synth generate command:

$ synth generate my_app/ --size 2 | jq
{
  "users": [
    {
      "email": "[email protected]",
      "id": 1,
      "joined_on": "2014-12-14"
    },
    {
      "email": "[email protected]",
      "id": 2,
      "joined_on": "2013-04-06"
    }
  ]
}
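
If you want to confirm that generated output respects the constraints declared in users.json, a small checker along these lines may help. This is a sketch, not part of Synth: check_users is a hypothetical helper, the email addresses in the sample are placeholders, and the check assumes both "begin" and "end" are acceptable dates.

```python
import json
from datetime import datetime

# Constraints copied from my_app/users.json.
DATE_FORMAT = "%Y-%m-%d"
BEGIN = datetime(2010, 1, 1)
END = datetime(2020, 1, 1)


def check_users(payload):
    """Return a list of human-readable problems found in a users collection."""
    problems = []
    seen_ids = set()
    for index, user in enumerate(payload.get("users", [])):
        user_id = user.get("id")
        if user_id in seen_ids:
            problems.append(f"record {index}: duplicate id {user_id!r}")
        seen_ids.add(user_id)

        if "@" not in str(user.get("email", "")):
            problems.append(f"record {index}: email has no '@'")

        raw_date = str(user.get("joined_on", ""))
        try:
            joined = datetime.strptime(raw_date, DATE_FORMAT)
        except ValueError:
            problems.append(f"record {index}: joined_on {raw_date!r} is not {DATE_FORMAT}")
            continue
        if not BEGIN <= joined <= END:
            problems.append(f"record {index}: joined_on {raw_date} is out of range")
    return problems


# Same shape as the output above; the addresses are placeholders.
SAMPLE = json.loads("""
{
  "users": [
    {"email": "first@example.com", "id": 1, "joined_on": "2014-12-14"},
    {"email": "second@example.com", "id": 2, "joined_on": "2013-04-06"}
  ]
}
""")

if __name__ == "__main__":
    print(check_users(SAMPLE) or "no problems found")
```

A check like this fits naturally into the integration-test use case: generate a batch, validate it, then load it.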

Building a data model from an external database

If you have an existing database, Synth can automatically generate a data model by inspecting the database.

You can use the synth import command to automatically generate Synth schema files from your Postgres, MySQL or MongoDB database:

$ synth import tpch --from postgres://user:pass@localhost:5432/tpch
Building customer collection...
Building primary keys...
Building foreign keys...
Ingesting data for table customer...  10 rows done.

Finally, generate data into another instance of Postgres:

$ synth generate tpch --to postgres://user:pass@localhost:5433/tpch
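
The import-then-generate flow can be scripted. The sketch below only assembles the two command lines shown above, using the --from and --to flags exactly as they appear there, and defaults to a dry run that prints them. The helper names (import_command, generate_command, run_pipeline) are hypothetical, and actually running the commands assumes the synth binary is on your PATH.

```python
import shlex
import subprocess


def import_command(namespace, source_uri):
    """Build the `synth import` invocation for a namespace and source database."""
    return ["synth", "import", namespace, "--from", source_uri]


def generate_command(namespace, target_uri):
    """Build the `synth generate` invocation that writes into a target database."""
    return ["synth", "generate", namespace, "--to", target_uri]


def run_pipeline(namespace, source_uri, target_uri, dry_run=True):
    """Print both commands in order; execute them only when dry_run is False."""
    commands = [
        import_command(namespace, source_uri),
        generate_command(namespace, target_uri),
    ]
    for command in commands:
        print("$", shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(
        "tpch",
        "postgres://user:pass@localhost:5432/tpch",
        "postgres://user:pass@localhost:5433/tpch",
    )
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with credentials embedded in the connection URI.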

Why Rust

We decided to build Synth from the ground up in Rust. We love Rust, and given the scale of data we wanted synth to generate, it made sense as a first choice. The combination of memory safety, performance, expressiveness and a great community made it a no-brainer and we've never looked back!

Get in touch

If you would like to learn more, or you would like support for your use-case, feel free to open an issue on GitHub.

If your query is more sensitive, you can email [email protected] and we'll happily chat about your use case.

About Us

The Synth project is backed by OpenQuery. We are a Y Combinator-backed startup based in London, England. We are passionate about data privacy, developer productivity, and building great tools for software engineers.

Contributing

First of all, we sincerely appreciate all contributions to Synth, large or small, so thank you.

See the contributing section for details.

License

Synth is source-available and licensed under the Apache 2.0 License.

Contributors

Thanks go to these wonderful people (emoji key):


Christos Hadjiaslanis

📝 💼 💻 🖋 🎨 📖 🔍 🤔 🚇 🚧 📦 👀 🛡️ ⚠️ 📢

Nodar Daneliya

📝 💼 🖋 🎨 📖 🔍 🤔

llogiq

💼 💻 🖋 🤔 🚇 🚧 🧑‍🏫 👀 🛡️ ⚠️

Dmitri Shkurski

💻

Damien Broka

📝 💼 💻 🖋 🎨 📖 🔍 🤔 🚇 🚧 👀 ⚠️

fretz12

🤔 💻 📖 ⚠️

Tyler Bailey

💻 📖

Júnior Bassani

🐛 💻

Daniel Hofstetter

🐛 💻

This project follows the all-contributors specification. Contributions of any kind welcome!
