quintoandar / Butterfree

Licence: apache-2.0
A tool for building feature stores.

Projects that are alternatives to, or similar to, Butterfree

Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+402.38%)
Mutual labels:  data-science, etl, data-engineering, pyspark
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1792.86%)
Mutual labels:  data-science, etl, data-engineering
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-37.3%)
Mutual labels:  data-science, etl, data-engineering
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-37.3%)
Mutual labels:  data-science, etl, data-engineering
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+3803.97%)
Mutual labels:  data-science, etl, data-engineering
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-69.05%)
Mutual labels:  etl, pyspark, etl-framework
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+385.71%)
Mutual labels:  etl, data-engineering, etl-framework
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-46.03%)
Mutual labels:  data-science, etl
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+682.54%)
Mutual labels:  data-science, pyspark
Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (+818.25%)
Mutual labels:  data-science, etl
Hale
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (-33.33%)
Mutual labels:  etl, etl-framework
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+14046.03%)
Mutual labels:  data-science, data-engineering
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+961.9%)
Mutual labels:  data-science, pyspark
Stetl
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (-49.21%)
Mutual labels:  etl, etl-framework
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-49.21%)
Mutual labels:  data-science, pyspark
Pyetl
python ETL framework
Stars: ✭ 33 (-73.81%)
Mutual labels:  etl, etl-framework
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1099.21%)
Mutual labels:  data-science, data-engineering
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-14.29%)
Mutual labels:  data-science, pyspark
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (-0.79%)
Mutual labels:  etl, etl-framework
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-3.17%)
Mutual labels:  data-science, data-engineering

Butterfree

A tool for building feature stores. Transform your raw data into beautiful features.

Source: PyPi
Installation command: pip install butterfree

Made with ❤️ by the MLOps team from QuintoAndar

This library supports Python 3.7+ and provides tools for building ETL pipelines for Feature Stores using Apache Spark.

The library is centered on the following concepts:

  • ETL: a central framework for creating data pipelines, with ready-to-use Spark-based Extract, Transform and Load modules.
  • Declarative Feature Engineering: focus on what you want to compute, not on how to code it.
  • Feature Store Modeling: the library provides everything you need to process and load data into your Feature Store.
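
To make the three concepts above more concrete, here is a minimal sketch in plain Python. It does not use Butterfree's API or Spark: the `Feature` class, the `extract`, `transform` and `load` functions, the sample rows and the dictionary standing in for a feature store are all hypothetical names chosen for illustration. It only shows the general idea of declaring features as data and letting a generic pipeline compute and load them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Feature:
    """Hypothetical declaration: a feature name plus the rule that derives it."""
    name: str
    compute: Callable[[Row], object]


def extract(source: Iterable[Row]) -> List[Row]:
    """Extract: read the raw rows (here, simply materialize an iterable)."""
    return list(source)


def transform(rows: List[Row], key: str, features: List[Feature]) -> List[Row]:
    """Transform: build one output row per input row from the declared features."""
    return [
        {key: row[key], **{f.name: f.compute(row) for f in features}}
        for row in rows
    ]


def load(rows: List[Row], store: Dict[object, Row], key: str) -> None:
    """Load: write each row to the store (a dict stands in for a feature store)."""
    for row in rows:
        store[row[key]] = row


# Made-up raw data, for illustration only.
raw_listings = [
    {"id": 1, "rent": 2000.0, "area_m2": 50.0},
    {"id": 2, "rent": 3300.0, "area_m2": 110.0},
]

# The declarative part: state what to compute, not how to run the pipeline.
features = [
    Feature("rent_per_m2", lambda r: r["rent"] / r["area_m2"]),
    Feature("is_large", lambda r: r["area_m2"] >= 100.0),
]

feature_store: Dict[object, Row] = {}
load(transform(extract(raw_listings), "id", features), feature_store, "id")
print(feature_store[1])
```

Adding a new feature in this sketch means appending one `Feature` entry; the extract, transform and load steps stay unchanged. For the library's real classes and signatures, refer to Butterfree's Documentation.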

To understand the main concepts of Feature Store modeling and the library's main features, see Butterfree's Documentation, which is hosted on Read the Docs.

To learn how to use Butterfree in practice, see Butterfree's notebook examples.

Requirements and Installation

Butterfree requires Python 3.7+ and is Spark 3.0 ready ✔️

The Python Package Index (PyPI) hosts a pip-installable package of this library. Using it is as straightforward as adding it to your project's requirements.

pip install butterfree

Or after listing butterfree in your requirements.txt file:

pip install -r requirements.txt

Dev packages are available for testing through the .devN versions of Butterfree on PyPI.
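
After installing, it can be useful to confirm which version is present and whether it is one of the .devN builds. The following is a small sketch using only the standard library; `installed_version` and `is_dev_release` are hypothetical helper names, the version strings in the comments are made up, and `importlib.metadata` assumes Python 3.8 or newer. The check is deliberately simple and only looks for a trailing `.devN` segment.

```python
import re
from importlib import metadata  # standard library since Python 3.8
from typing import Optional

# Matches a trailing development segment such as ".dev3".
_DEV_SUFFIX = re.compile(r"\.dev\d+$")


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_dev_release(version: str) -> bool:
    """True if the version string ends in .devN, e.g. a made-up "1.2.0.dev3"."""
    return _DEV_SUFFIX.search(version) is not None


version = installed_version("butterfree")
if version is None:
    print("butterfree is not installed")
else:
    kind = ".devN release" if is_dev_release(version) else "not a .devN release"
    print(f"butterfree {version} ({kind})")
```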

License

Apache License 2.0

Contributing

All contributions are welcome! Feel free to open Pull Requests. Check the development and contributing guidelines described here.