1594 open source projects that are alternatives to or similar to Butterfree

Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+402.38%)
Mutual labels:  data-science, etl, data-engineering, pyspark
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-37.3%)
Mutual labels:  data-science, etl, data-engineering
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1792.86%)
Mutual labels:  data-science, etl, data-engineering
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+3803.97%)
Mutual labels:  data-science, etl, data-engineering
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+385.71%)
Mutual labels:  etl, data-engineering, etl-framework
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-37.3%)
Mutual labels:  data-science, etl, data-engineering
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-69.05%)
Mutual labels:  etl, pyspark, etl-framework
DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-80.95%)
Mutual labels:  etl, etl-framework
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (-65.08%)
Mutual labels:  etl, data-engineering
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-83.33%)
Mutual labels:  etl, etl-framework
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-3.17%)
Mutual labels:  data-science, data-engineering
OpenKettleWebUI
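A Kettle-based web platform for scheduling and controlling data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.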
Stars: ✭ 138 (+9.52%)
Mutual labels:  etl, etl-framework
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (-69.84%)
Mutual labels:  etl, etl-framework
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+14.29%)
Mutual labels:  etl, data-engineering
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-83.33%)
Mutual labels:  etl, etl-framework
pangeo-forge-recipes
Python library for building Pangeo Forge recipes.
Stars: ✭ 64 (-49.21%)
Mutual labels:  etl, data-engineering
arthur-redshift-etl
ELT Code for your Data Warehouse
Stars: ✭ 22 (-82.54%)
Mutual labels:  etl, data-engineering
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-80.16%)
Mutual labels:  pyspark, data-engineering
etl manager
A Python package to create a database on the platform using our MoJ data warehousing framework
Stars: ✭ 14 (-88.89%)
Mutual labels:  etl, data-engineering
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (-80.95%)
Mutual labels:  etl, data-engineering
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+2840.48%)
Mutual labels:  etl, data-engineering
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+3153.17%)
Mutual labels:  data-science, etl
Great expectations
Always know what to expect from your data.
Stars: ✭ 5,808 (+4509.52%)
Mutual labels:  data-science, data-engineering
Etlalchemy
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+265.08%)
Mutual labels:  etl, etl-framework
Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+6214.29%)
Mutual labels:  data-science, data-engineering
Pyetl
Python ETL framework
Stars: ✭ 33 (-73.81%)
Mutual labels:  etl, etl-framework
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (-46.83%)
Mutual labels:  etl, etl-framework
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-80.16%)
Mutual labels:  etl, pyspark
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (-62.7%)
Mutual labels:  etl, data-engineering
Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+482.54%)
Mutual labels:  etl, etl-framework
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+682.54%)
Mutual labels:  data-science, pyspark
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (-0.79%)
Mutual labels:  etl, etl-framework
D6t Python
Accelerate data science
Stars: ✭ 118 (-6.35%)
Mutual labels:  data-science, data-engineering
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-87.3%)
Mutual labels:  etl, etl-framework
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (-54.76%)
Mutual labels:  etl, data-engineering
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-87.3%)
Mutual labels:  etl, pyspark
DataEngineering
This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (-62.7%)
Mutual labels:  pyspark, data-engineering
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-48.41%)
Mutual labels:  etl, data-engineering
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-86.51%)
Mutual labels:  etl, pyspark
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-77.78%)
Mutual labels:  etl, etl-framework
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-80.16%)
Mutual labels:  etl, pyspark
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1099.21%)
Mutual labels:  data-science, data-engineering
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-57.94%)
Mutual labels:  etl, data-engineering
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (+210.32%)
Mutual labels:  data-science, etl
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+195.24%)
Mutual labels:  etl, etl-framework
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+186.51%)
Mutual labels:  etl, etl-framework
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+585.71%)
Mutual labels:  data-science, data-engineering
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+529.37%)
Mutual labels:  data-engineering, etl-framework
W2v
Word2Vec models with Twitter data using Spark.
Stars: ✭ 64 (-49.21%)
Mutual labels:  data-science, pyspark
Learn Something Every Day
📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (+187.3%)
Mutual labels:  data-science, data-engineering
Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (+818.25%)
Mutual labels:  data-science, etl
Hale
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (-33.33%)
Mutual labels:  etl, etl-framework
Spark Py Notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+961.9%)
Mutual labels:  data-science, pyspark
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+14046.03%)
Mutual labels:  data-science, data-engineering
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-46.03%)
Mutual labels:  data-science, etl
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-38.89%)
Mutual labels:  etl, data-engineering
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-14.29%)
Mutual labels:  data-science, pyspark
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+171.43%)
Mutual labels:  etl, data-engineering
Stetl
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (-49.21%)
Mutual labels:  etl, etl-framework
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+33736.51%)
Mutual labels:  data-science, data-engineering