
1,594 open-source projects that are alternatives to or similar to Butterfree

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+402.38%)

Mutual labels: data-science, etl, data-engineering, pyspark

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Stars: ✭ 79 (-37.3%)

Mutual labels: data-science, etl, data-engineering

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and Excel).

Stars: ✭ 2,385 (+1792.86%)

Mutual labels: data-science, etl, data-engineering

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Stars: ✭ 4,919 (+3803.97%)

Mutual labels: data-science, etl, data-engineering

hamilton

A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.

Stars: ✭ 612 (+385.71%)

Mutual labels: etl, data-engineering, etl-framework

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-37.3%)

Mutual labels: data-science, etl, data-engineering

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, including SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-69.05%)

Mutual labels: etl, pyspark, etl-framework

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-80.95%)

Mutual labels: etl, etl-framework

gallia-core

A schema-aware Scala library for data transformation

Stars: ✭ 44 (-65.08%)

Mutual labels: etl, data-engineering

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (-83.33%)

Mutual labels: etl, etl-framework

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (-3.17%)

Mutual labels: data-science, data-engineering

OpenKettleWebUI

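A kettle-based web platform for scheduling and controlling data processing. It supports both file repositories and database repositories, controls kettle data transformations through the web platform, and can be integrated into existing systems as middleware.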

Stars: ✭ 138 (+9.52%)

Mutual labels: etl, etl-framework

etlflow

EtlFlow is an ecosystem of functional Scala libraries, based on ZIO, for writing various tasks and jobs on GCP and AWS.

Stars: ✭ 38 (-69.84%)

Mutual labels: etl, etl-framework

versatile-data-kit

Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.

Stars: ✭ 144 (+14.29%)

Mutual labels: etl, data-engineering

cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)

Stars: ✭ 21 (-83.33%)

Mutual labels: etl, etl-framework

pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

Stars: ✭ 64 (-49.21%)

Mutual labels: etl, data-engineering

arthur-redshift-etl

ELT Code for your Data Warehouse

Stars: ✭ 22 (-82.54%)

Mutual labels: etl, data-engineering

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (-80.16%)

Mutual labels: pyspark, data-engineering

etl manager

A Python package to create a database on the platform using our moj data warehousing framework

Stars: ✭ 14 (-88.89%)

Mutual labels: etl, data-engineering

AirflowDataPipeline

Example of an ETL Pipeline using Airflow

Stars: ✭ 24 (-80.95%)

Mutual labels: etl, data-engineering

Benthos

Fancy stream processing made operationally mundane

Stars: ✭ 3,705 (+2840.48%)

Mutual labels: etl, data-engineering

Dagster

An orchestration platform for the development, production, and observation of data assets.

Stars: ✭ 4,099 (+3153.17%)

Mutual labels: data-science, etl

Great expectations

Always know what to expect from your data.

Stars: ✭ 5,808 (+4509.52%)

Mutual labels: data-science, data-engineering

Etlalchemy

Extract, Transform, Load: Any SQL Database in 4 lines of Code.

Stars: ✭ 460 (+265.08%)

Mutual labels: etl, etl-framework

Prefect

The easiest way to automate your data

Stars: ✭ 7,956 (+6214.29%)

Mutual labels: data-science, data-engineering

Pyetl

Python ETL framework

Stars: ✭ 33 (-73.81%)

Mutual labels: etl, etl-framework

csvplus

csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

Stars: ✭ 67 (-46.83%)

Mutual labels: etl, etl-framework

python mozetl

ETL jobs for Firefox Telemetry

Stars: ✭ 25 (-80.16%)

Mutual labels: etl, pyspark

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (-62.7%)

Mutual labels: etl, data-engineering

Getting Started

This repository is a getting started guide to Singer.

Stars: ✭ 734 (+482.54%)

Mutual labels: etl, etl-framework

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+682.54%)

Mutual labels: data-science, pyspark

Transformalize

Configurable Extract, Transform, and Load

Stars: ✭ 125 (-0.79%)

Mutual labels: etl, etl-framework

D6t Python

Accelerate data science

Stars: ✭ 118 (-6.35%)

Mutual labels: data-science, data-engineering

DataBridge.NET

Configurable data bridge for permanent ETL jobs

Stars: ✭ 16 (-87.3%)

Mutual labels: etl, etl-framework

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (-54.76%)

Mutual labels: etl, data-engineering

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-87.3%)

Mutual labels: etl, pyspark

DataEngineering

This repo contains commands that data engineers use in their day-to-day work.

Stars: ✭ 47 (-62.7%)

Mutual labels: pyspark, data-engineering

beneath

Beneath is a serverless real-time data platform ⚡️

Stars: ✭ 65 (-48.41%)

Mutual labels: etl, data-engineering

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-86.51%)

Mutual labels: etl, pyspark

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (-77.78%)

Mutual labels: etl, etl-framework

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-80.16%)

Mutual labels: etl, pyspark

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+1099.21%)

Mutual labels: data-science, data-engineering

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (-57.94%)

Mutual labels: etl, data-engineering

Datacleaner

The premier open source Data Quality solution

Stars: ✭ 391 (+210.32%)

Mutual labels: data-science, etl

Choetl

ETL Framework for .NET / C# (parser/writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)

Stars: ✭ 372 (+195.24%)

Mutual labels: etl, etl-framework

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+186.51%)

Mutual labels: etl, etl-framework

Data Science On Gcp

Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)

Stars: ✭ 864 (+585.71%)

Mutual labels: data-science, data-engineering

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+529.37%)

Mutual labels: data-engineering, etl-framework

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (-49.21%)

Mutual labels: data-science, pyspark

Learn Something Every Day

📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->

Stars: ✭ 362 (+187.3%)

Mutual labels: data-science, data-engineering

Awesome Business Intelligence

Actively curated list of awesome BI tools. PRs welcome!

Stars: ✭ 1,157 (+818.25%)

Mutual labels: data-science, etl

Hale

(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)

Stars: ✭ 84 (-33.33%)

Mutual labels: etl, etl-framework

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data analysis and machine learning as IPython / Jupyter notebooks