License: MIT
plait.py - a fake data modeler

plait.py

plait.py is a program for generating fake data from composable yaml templates.

The idea behind plait.py is that it should be easy to model fake data that has an interesting shape. Many fake data generators model their data as a collection of independent and identically distributed (IID) variables; with plait.py we can stitch those variables together into a more coherent model, where one field can depend on another.

some example uses for plait.py are:

  • generating mock application data in test environments
  • validating the usefulness of statistical techniques
  • creating synthetic datasets for performance tuning databases

features

  • declarative syntax
  • use basic faker.rb fields with #{} interpolators
  • sample and join data from CSV files
  • lambda expressions, switch and mixture fields
  • nested and composable templates
  • static variables and hidden fields

an example template

# a person generator
define:
  min_age: 10
  minor_age: 13
  working_age: 18

fields:
  age:
    random: gauss(25, 5)
    # minimum age is $min_age
    finalize: max($min_age, value)

  gender:
    mixture:
      - value: M
      - value: F

  name: "#{name.name}"
  job:
    value: "#{job.title}"
    onlyif: this.age > $working_age

  address:
    template: address/usa.yaml
  phone: # add a phone if the person is older than the minor age
    template: device/phone.yaml
    onlyif: this.age > ${minor_age}

  # we model our height as a gaussian that varies based on
  # age and gender
  height:
    lambda: this._base_height * this._age_factor
  _base_height:
    switch:
      - onlyif: this.gender == "F"
        random: gauss(60, 5)
      - onlyif: this.gender == "M"
        random: gauss(70, 5)

  _age_factor:
    switch:
      - onlyif: this.age < 15
        lambda: 1 - (20 - (this.age + 5)) / 20
      - default:
        value: 1

how it's different

some specific examples of what plait.py can do:

  • generate proportional populations using census data and CSVs
  • create realistic zipcodes by state, city or region (also using CSVs)
  • create a taxi trip dataset with a cost model based on geodistance
  • add seasonal patterns (daily, weekly, etc) to data
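As an illustration of the first item, proportional sampling from a CSV comes down to weighted random choice. The sketch below uses only the Python standard library rather than plait.py itself; the CSV contents, the column names and the helper names are made up for the example and are not real census figures.

```python
import csv
import io
import random

# Hypothetical CSV with invented weights; real population counts would go here.
CSV_TEXT = """state,weight
AA,6
BB,3
CC,1
"""


def load_weights(text):
    """Parse the CSV text into parallel lists of values and weights."""
    rows = list(csv.DictReader(io.StringIO(text)))
    states = [row["state"] for row in rows]
    weights = [float(row["weight"]) for row in rows]
    return states, weights


def sample_states(n, rng):
    """Draw n states with probability proportional to their weight."""
    states, weights = load_weights(CSV_TEXT)
    return rng.choices(states, weights=weights, k=n)


rng = random.Random(1)
sample = sample_states(10000, rng)
counts = {state: sample.count(state) for state in set(sample)}
```

With enough samples, the counts approach the 6:3:1 ratio given in the weight column.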

usage

installation

# install with python
pip install plaitpy

# or with pypy
pypy-pip install plaitpy

cloning the repo for development

git clone https://github.com/plaitpy/plaitpy
cd plaitpy

# get the fakerb repo
git submodule init
git submodule update

generating records from command line

specify a template as a yaml file, then generate records from that yaml file.

# a simple example (if cloning plait.py repo)
python main.py templates/timestamp/uniform.yaml

# if plait.py is installed via pip
plait.py templates/timestamp/uniform.yaml

generating records from API

import plaitpy

# load a template from a yaml file
t = plaitpy.Template("templates/timestamp/uniform.yaml")

# generate a single record, then a batch of 10
print(t.gen_record())
print(t.gen_records(10))
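A common next step is to write the generated records out as JSON lines. The sketch below assumes that gen_records(n) returns a list of JSON-serializable dicts, which the source does not confirm. To stay runnable without plait.py installed, it uses a hypothetical StubTemplate class in place of plaitpy.Template; to_json_lines is likewise a made-up helper name.

```python
import json


class StubTemplate:
    """Hypothetical stand-in exposing a gen_records(n) method."""

    def gen_records(self, n):
        # Assumed record shape: one dict per record
        return [{"id": i} for i in range(n)]


def to_json_lines(template, n):
    """Serialize n generated records, one JSON object per line."""
    records = template.gen_records(n)
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


out = to_json_lines(StubTemplate(), 3)
print(out)
```

If the assumption about the record type holds, passing a real template object instead of the stub should work the same way.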

looking up faker fields

plait.py also simplifies looking up faker fields:

# list faker namespaces
plait.py --list
# look up a faker namespace
plait.py --lookup name

# look up faker keys
# (--ll is short for --lookup)
plait.py --ll name.suffix

documentation

yaml file commands

  • see docs/FORMAT.md

datasets

  • see docs/EXAMPLES.md
  • also see templates/ dir

troubleshooting

  • see docs/TROUBLESHOOTING.md

Dependent Markov Processes

To simulate data that comes from many Markov processes (a Markov ecosystem), see the plaitpy-ipc repository.

future direction

If you have ideas for features to add, open an issue. Feedback is appreciated!

License

MIT
