umich-dbgroup / foofah

Licence: MIT license
Foofah: programming-by-example data transformation program synthesizer

Foofah

Foofah [1][2] is a programming-by-example system for synthesizing data transformation programs. Given an input-output example from the end user, it generates a data transformation program built from the operations defined in the Potter's Wheel paper [3] by Raman and Hellerstein.

Requirements

The Python modules numpy, tabulate, cherrypy, editdistance, python-Levenshtein, and matplotlib are also required. They can be installed with setuptools, as described in the Installation section.

Foofah on Docker

Build Foofah container

$ docker build -t foofah .

Run Foofah container

$ docker run -p 8080:8080 foofah

The Foofah web service will be available at localhost:8080.

Installation

$ cd foofah
$ python setup.py install

User Guide

Foofah Console

To run Foofah on an individual test case from the console:

$ cd foofah
$ python foofah.py --input <test_file>

Note that each test case must be a JSON file containing a single JSON object with two members, InputTable and OutputTable. Each member is a 2D array of strings, and together they represent the user-provided input-output example.

  • Link to an example test case.
  • Link to all benchmark test cases used in our full paper.
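
As an illustration of the test case format described above, the following minimal sketch builds and writes such a file. The helper name make_test_case, the file name my_test_case.json, and the table contents are all hypothetical and not taken from the Foofah benchmarks. The benchmark files may also carry members beyond the two named here.

```python
import json


def make_test_case(input_table, output_table):
    """Build a test-case object with the two members the format requires."""
    for name, table in (("InputTable", input_table), ("OutputTable", output_table)):
        # Each table must be a 2D array: a list of rows whose cells are all strings.
        rows_ok = all(
            isinstance(row, list) and all(isinstance(cell, str) for cell in row)
            for row in table
        )
        if not rows_ok:
            raise ValueError(name + " must be a 2D array of strings")
    return {"InputTable": input_table, "OutputTable": output_table}


# Hypothetical example data: split a "last, first" name into two columns.
case = make_test_case(
    input_table=[["Smith, Alice", "42"], ["Jones, Bob", "37"]],
    output_table=[["Smith", "Alice", "42"], ["Jones", "Bob", "37"]],
)

# Hypothetical file name; pass it to foofah.py with --input.
with open("my_test_case.json", "w") as f:
    json.dump(case, f, indent=2)
```

The resulting file can then be passed to the console command shown earlier (python foofah.py --input my_test_case.json). Numeric cells are written as strings because the format calls for arrays of strings. Whether Foofah finds a program for a given example depends on the example itself.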

To learn other command-line argument options:

$ python foofah.py --help

Foofah Web Server

To interact with Foofah through a web interface (as shown in the video):

$ python foofah_server.py

By default, the service will be available at localhost:8080.

Acknowledgements

Foofah is being developed at the University of Michigan. This work is supported in part by National Science Foundation grants IIS-1250880 and IIS-1054913, NSF IGERT grant 0903629, a Sloan Research Fellowship, and a CSE Department Fellowship.

References

[1] "Foofah: Transforming Data By Example", SIGMOD '17, Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H. V. Jagadish

[2] "Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs", SIGMOD '17, Demo, Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H. V. Jagadish

[3] "Potter's Wheel: An Interactive Data Cleaning System", VLDB 2001, Vijayshankar Raman, Joseph M. Hellerstein