aspuru-guzik-group / funsies

Licence: MIT license
funsies is a lightweight workflow engine 🔧

Programming Languages

python
TeX

Projects that are alternatives of or similar to funsies

Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+21402.7%)
Mutual labels:  infrastructure, workflow-engine, data-engineering, data-ops
Udacity Data Engineering Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+1137.84%)
Mutual labels:  infrastructure, data-engineering
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+9913.51%)
Mutual labels:  data-engineering, data-ops
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+594.59%)
Mutual labels:  infrastructure, data-engineering
prefect-saturn
Python client for using Prefect Cloud with Saturn Cloud
Stars: ✭ 15 (-59.46%)
Mutual labels:  workflow-engine, data-engineering
practical-data-engineering
Real estate dagster pipeline
Stars: ✭ 110 (+197.3%)
Mutual labels:  data-engineering
preprocessy
Python package for Customizable Data Preprocessing Pipelines
Stars: ✭ 34 (-8.11%)
Mutual labels:  data-engineering
onix
A reactive configuration manager designed to support Infrastructure as a Code provisioning, and bi-directional configuration management providing a single source of truth across multi-cloud environments.
Stars: ✭ 89 (+140.54%)
Mutual labels:  infrastructure
stateless
Finite State Machine porting from Stateless C#
Stars: ✭ 25 (-32.43%)
Mutual labels:  workflow-engine
Spatio-Temporal-papers
This project is a collection of recent research in areas such as new infrastructure and urban computing, including white papers, academic papers, AI lab and dataset etc.
Stars: ✭ 180 (+386.49%)
Mutual labels:  infrastructure
kube-applier
kube-applier enables automated deployment and declarative configuration for your Kubernetes cluster.
Stars: ✭ 27 (-27.03%)
Mutual labels:  infrastructure
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+54.05%)
Mutual labels:  data-engineering
infra
Infrastructure configuration for pyca projects (mostly dockerfiles)
Stars: ✭ 13 (-64.86%)
Mutual labels:  infrastructure
ansible-aws-infra-services
Manage your AWS infrastructure and ECS tasks with two separate ansible playbooks
Stars: ✭ 23 (-37.84%)
Mutual labels:  infrastructure
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+43.24%)
Mutual labels:  data-engineering
awesome-cadence-temporal-workflow
A curated list of awesome things related to the Cadence and Temporal Workflow Engines
Stars: ✭ 63 (+70.27%)
Mutual labels:  workflow-engine
steep
⤴️ Steep Workflow Management System – Run scientific workflows in the Cloud
Stars: ✭ 30 (-18.92%)
Mutual labels:  workflow-engine
rce
Distributed, workflow-driven integration environment
Stars: ✭ 42 (+13.51%)
Mutual labels:  workflow-engine
headless-wordpress
Headless Wordpress - AWS - Easy Setup
Stars: ✭ 42 (+13.51%)
Mutual labels:  infrastructure
terraform-aws-concourse
Terraform Module for a distributed concourse cluster on AWS
Stars: ✭ 12 (-67.57%)
Mutual labels:  infrastructure

funsies

funsies is a python library and execution engine for building reproducible, fault-tolerant, distributed and composable computational workflows.

  • 🐍 Workflows are specified in pure python.
  • 🐦 Lightweight with few dependencies.
  • 🚀 Easy to deploy to compute clusters and distributed systems.
  • 🔧 Can be embedded in your own apps.
  • 📏 First-class support for static analysis. Use mypy to check your workflows!

Workflows are encoded in a redis server and executed using the distributed job queue library RQ. A hash tree data structure enables automatic and transparent caching and incremental computing.
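To make the hash-tree idea concrete, here is a minimal sketch in plain python. It is not the funsies implementation: the `address`, `run` and `const` helpers and the in-memory `cache` dictionary (standing in for redis) are all hypothetical names chosen for illustration. Each result is addressed by hashing the operation name together with the addresses of its inputs, so repeating an identical step is a cache hit.

```python
import hashlib

def address(op_name, input_addresses):
    """Derive an address from an operation name and the addresses of its inputs."""
    h = hashlib.sha1()
    h.update(op_name.encode())
    for a in input_addresses:
        h.update(b"\x00" + a.encode())
    return h.hexdigest()

cache = {}  # hypothetical stand-in for the redis store: address -> value
calls = []  # records which operations actually ran

def const(value):
    """Wrap a literal value as an (address, value) pair."""
    return address("const", [repr(value)]), value

def run(op_name, fn, inputs):
    """Run fn on (address, value) inputs unless its address is already cached."""
    addr = address(op_name, [a for a, _ in inputs])
    if addr not in cache:
        calls.append(op_name)
        cache[addr] = fn(*[v for _, v in inputs])
    return addr, cache[addr]

a = run("add", lambda x, y: x + y, [const(3), const(2)])
b = run("add", lambda x, y: x + y, [const(3), const(2)])  # same address: cache hit
c = run("double", lambda x: 2 * x, [a])                   # depends on a's address
```

Because an address depends only on the operation and its input addresses, changing an upstream step changes every downstream address, which is what makes the caching safe in this sketch.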

Source docs can be found here. Some example funsies scripts can be found in the recipes folder.

Installation

Using pip,

pip install funsies

This installs the funsies CLI tool as well as the funsies python module. Python 3.7, 3.8 and 3.9 are supported. To run workflows, you'll need a Redis server, version 4.x or higher. On Linux, Redis can be installed using conda,

conda install redis

pip,

pip install redis-server

or your system package manager. On macOS, Redis can be installed using Homebrew,

brew install redis

(Windows is not supported by Redis, but a third-party package can be obtained from this repository. This has not been tested, however.)

Hello, funsies!

To run workflows, three components need to be connected:

  • 📜 a python script describing the workflow
  • 💻 a redis server that holds workflows and data
  • 👷 worker processes that execute the workflow

funsies is distributed: all three components can be on different computers, or even be connected at different times. Redis is started using redis-server, workers are started using funsies worker, and the workflow is run using python.

For running on a single machine, the start-funsies script takes care of starting the database and workers,

start-funsies \
    --no-pw \
    --workers 2

Here is an example workflow script,

from funsies import Fun, reduce, shell
with Fun():
    # you can run shell commands
    cmd = shell('sleep 2; echo 👋 🪐')
    # and python ones
    python = reduce(sum, [3, 2])
    # outputs are saved at hash addresses
    print(f"my outputs are saved to {cmd.stdout.hash[:5]} and {python.hash[:5]}")

The workflow is just python, and is run using the python interpreter,

$ python hello-world.py
my outputs are saved to 4138b and 80aa3

The Fun() context manager takes care of connecting to the database. The script should execute immediately; no work is done just yet because workflows are lazily executed.
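As a rough illustration of what lazy execution means here, the sketch below separates describing a computation from running it. The `Lazy` class and `noisy_sum` function are hypothetical and unrelated to the funsies internals; they only show that building the description performs no work until execution is explicitly requested.

```python
class Lazy:
    """A deferred computation: stores a function and its arguments."""

    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self.result = None
        self.done = False

    def execute(self):
        """Run dependencies first, then this step, at most once."""
        if not self.done:
            args = [a.execute() if isinstance(a, Lazy) else a for a in self.args]
            self.result = self.fn(*args)
            self.done = True
        return self.result

log = []

def noisy_sum(values):
    log.append("ran")  # side effect so we can see when work happens
    return sum(values)

total = Lazy(noisy_sum, [3, 2])   # describes the work, runs nothing
ran_at_definition = list(log)     # still empty at this point
value = total.execute()           # the work happens here
```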

To execute the workflow, we trigger it from the CLI using the hashes printed above,

$ funsies execute 4138b 80aa3

Once the workers are finished, results can be printed directly to stdout using their hashes,

$ funsies cat 4138b
👋 🪐
$ funsies cat 80aa3
5

They can also be accessed from within python, from other steps in the workflow, and so on. Shutting down the database and workers can also be performed using the CLI,

$ funsies shutdown --all

How does it work?

The design of funsies is inspired by git and ccache. All files and variable values are abstracted into a provenance-tracking DAG structure. Basically, "files" are identified entirely based on what operations lead to their creation. This (somewhat opinionated) design produces interesting properties that are not common in workflow engines:

Incremental computation

funsies automatically and transparently saves all input and output "files". This produces automatic and transparent checkpointing and incremental computing. Re-running the same funsies script, even on a different machine, will not perform any computations (beyond database lookups). Modifying the script and re-running it will only recompute changed results.

In contrast with e.g. Make, this is not based on modification date but directly on the data history, which is more robust to changes in the workflow.
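The difference from a timestamp-based check can be seen in a small sketch. It assumes a content hash as a simplified stand-in for the provenance-based addressing described above (funsies itself keys on the operations that produced the data), and the `content_key` helper is hypothetical. Touching a file changes its modification time, which would trigger a rebuild in Make, while the hash-based key is unchanged.

```python
import hashlib
import os
import tempfile

def content_key(path):
    """Key a file by its contents rather than by its timestamp."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.inp")
    with open(path, "w") as f:
        f.write("some input")

    mtime_before = os.stat(path).st_mtime_ns
    key_before = content_key(path)

    # Equivalent of `touch`: move the timestamps forward by one second
    # without changing the contents.
    later = mtime_before + 10**9
    os.utime(path, ns=(later, later))

    mtime_after = os.stat(path).st_mtime_ns
    key_after = content_key(path)
```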

Decentralized workflows

Workflows and their elements are not identified based on any global indexing scheme. This makes it possible to generate workflows fully dynamically from any connected computer node, to merge or compose DAGs from different databases and to dynamically re-parametrize them, etc.
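One way to see why no global index is needed is the sketch below, which uses hypothetical `address` and `build` helpers and plain dictionaries in place of two separate databases. Since a step's address is derived only from the operation and its parents, two machines that describe the same step independently arrive at the same address, and merging their DAGs is a plain union.

```python
import hashlib

def address(op, parents=()):
    """Address a step by its operation and the addresses of its parents."""
    payload = "\x00".join([op, *parents]).encode()
    return hashlib.sha1(payload).hexdigest()

def build(store, op, parents=()):
    """Record a step in a store and return its address."""
    addr = address(op, parents)
    store[addr] = {"op": op, "parents": list(parents)}
    return addr

db_a, db_b = {}, {}  # two independent stores, e.g. on different machines

# Machine A: load, then simulate
load_a = build(db_a, "load data.inp")
sim_a = build(db_a, "simulate", [load_a])

# Machine B: the same load step, then a different downstream step
load_b = build(db_b, "load data.inp")
plot_b = build(db_b, "plot", [load_b])

# Merging needs no renumbering: the shared step has one address in both stores.
merged = {**db_a, **db_b}
```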

No local file operations

All "files" are encoded in a redis instance or to a data directory, with no local filesystem management required. funsies workers can even operate without any permanent data storage, as is often the case in file-driven workflows using only a container's tmpfs.

Recovering from failures

Exceptions raised in python code, worker failures, missing output files and other error conditions are automatically caught by funsies workers, providing fault tolerance to workflows. Errors are logged on stderr with a full traceback and can be recovered from the database.

Steps that depend on failed ones propagate those errors and their provenance. Errors can then be dealt with wherever it is most appropriate to do so using techniques from functional programming.

As an example, consider a workflow that first runs a CLI program simulate that ought to produce a results.csv file, which is subsequently analyzed using a python function analyze_data(),

import funsies as f

sim = f.shell("simulate data.inp", inp={"data.inp":"some input"}, out=["results.csv"])
final = f.reduce(analyze_data, sim.out["results.csv"])

In a normal python program, analyze_data() would need to guard against the possibility that results.csv is absent, or risk a fatal exception. In the above funsies script, if results.csv is not produced, then it is replaced by an instance of Error which tracks the failing step. The workflow engine automatically short-circuits the execution of analyze_data and instead forwards the Error to final. In this way, the value of final provides direct error tracing to the failed step. Furthermore, it means that analyze_data does not need its own error handling code if its output is optional or if the error is better dealt with in a later step.

This error-handling approach is heavily influenced by the Result<T,E> type from the Rust programming language.
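The short-circuiting behaviour can be sketched in plain python as follows. This is an illustration of the pattern only, not the funsies API: the `Error` dataclass and the `step` helper here are hypothetical stand-ins, and `simulate` is a dummy function that always fails.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Error:
    source: str   # name of the step that failed
    details: str  # description of what went wrong

def step(name: str, fn: Callable, *args):
    """Run fn unless an argument is already an Error, in which case forward it."""
    for arg in args:
        if isinstance(arg, Error):
            return arg  # short-circuit: fn is never called
    try:
        return fn(*args)
    except Exception as exc:
        return Error(source=name, details=repr(exc))

def simulate():
    # Dummy failing step: pretend the expected output file was not produced.
    raise FileNotFoundError("results.csv")

def analyze_data(csv_text):
    # No error handling needed here: it only ever sees valid input.
    return len(csv_text.splitlines())

sim = step("simulate", simulate)
final = step("analyze_data", analyze_data, sim)
```

Here `final` is an `Error` whose `source` field points back to the `simulate` step, so the failure can be handled wherever it is most convenient.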

Is it production-ready?

🧪 warning: funsies is research-grade code! 🧪

At this time, the funsies API is fairly stable. However, users should know that database dumps are not yet fully forward- or backward-compatible, and breaking changes are likely to be introduced in new releases.

Related projects

funsies is intended as a lightweight alternative to industrial workflow engines, such as Apache Airflow or Luigi. We rely heavily on awesome python libraries: RQ, loguru, Click and chevron. We are inspired by git, ccache, snakemake targets, rain and others. A comprehensive list of other workflow engines can be found here.

License

funsies is provided under the MIT license.

Contributing

All contributions are welcome! Consult the CONTRIBUTING file for help. Please file issues for any bugs and documentation problems.
