Top 205 ETL open source projects

Kiba Plus
Kiba enhancement for Ruby ETL.
Bentools Etl
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Configs
Public, free-to-use repository of digger configs for scraping / extracting data from various e-commerce websites and online stores
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Yunmai Data Extract
Extract your data from the Yunmai weighing scales cloud API so you can use it elsewhere
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and an AWS Lambda Python script (built on Boto3) to terminate AWS EMR clusters that have been idle for a specified period of time.
Panther
Detect threats with log data and improve cloud security posture
Dswarm Backoffice Web
The backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Getting Started
This repository is a getting started guide to Singer.
Monstache
A Go daemon that syncs MongoDB to Elasticsearch in real time
React Csv
React components to build CSV files on the fly based on an array or object literal of data
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Baby Names Analysis
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Ananas Desktop
A hackable data integration & analysis tool to enable non-technical users to edit data processing jobs and visualise data on demand.
Koop
🔮 Transform, query, and download geospatial data on the web.
Bigslice
A serverless cluster computing system for the Go programming language
Smartcode
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Etlalchemy
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Pglogical
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Datacleaner
The premier open source Data Quality solution
Abc
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Aistore
AIStore: scalable storage for AI applications
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Dataform
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Webkettle
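A professional-edition, browser/server (B/S) architecture tool built on the web version of Kettle for distributed scheduling, management, and ETL development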
Smooks
An extensible Java framework for building XML and non-XML streaming applications
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
etl manager
A Python package to create a database on the platform using our MoJ data warehousing framework
grate
A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
ETW2JSON
Tool and library to convert ETW logs to JSON files
mqtt-to-kafka-bridge
Move your messages from MQTT to Apache Kafka in real-time 🚀
sync-addons
Odoo Integration Addons
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
cpp-can-isotp
C++ implementation of CAN ISO 15765-2 also known as CAN ISO transport protocol. CPP CAN isotp.
pangeo-forge-recipes
Python library for building Pangeo Forge recipes.
Addax
Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
gamechanger-data
GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements
openrefine-docker
OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Dockerbuild files for automated builds.
cardano-py
Python 3 library and CLI for operating a Cardano Passive Node and using the APIs. (PRE-ALPHA)
etl
M-Lab ingestion pipeline
mlbgameday
Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.
spdr-etf-holdings
ETL for the SPDR ETF holdings XLS documents
openrefine-client
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.