Bentools Etl: PHP ETL (Extract / Transform / Load) library built on SOLID principles, with almost no dependencies.
Ether sql: A Python library to push Ethereum blockchain data into an SQL database.
Configs: Public, free-to-use repository of diggers configs for scraping / extracting data from various e-commerce websites and online stores.
Pyetl: Python ETL framework.
Ethereum Etl: Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, and internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Yunmai Data Extract: Extract your data from the Yunmai weighing scales cloud API so you can use it elsewhere.
Aws Auto Terminate Idle Emr: The AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Panther: Detect threats with log data and improve cloud security posture.
Dswarm Backoffice Web: The backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)
Tuna: 🐟 A streaming ETL for fish
Bandar Log: Monitoring tool to measure the flow throughput of data sources and processing components that are part of data ingestion and ETL pipelines.
Monstache: A Go daemon that syncs MongoDB to Elasticsearch in real time.
React Csv: React components to build CSV files on the fly based on an array or object literal of data.
Go Streams: A lightweight stream processing library for Go.
Baby Names Analysis: Data ETL & analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Ananas Desktop: A hackable data integration & analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.
Koop: 🔮 Transform, query, and download geospatial data on the web.
Bigslice: A serverless cluster computing system for the Go programming language.
Smartcode: SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Etlalchemy: Extract, Transform, Load: any SQL database in 4 lines of code.
Pglogical: Logical replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Datacleaner: The premier open source data quality solution.
Abc: The power of appbase.io via CLI, with nifty imports from your favorite data sources.
Choetl: ETL framework for .NET / C# (parser / writer for CSV, flat, XML, JSON, key-value, Parquet, YAML and Avro formatted files).
Aistore: AIStore, scalable storage for AI applications.
Wedatasphere: WeDataSphere is a financial-grade, one-stop, open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Metorikku: A simplified, lightweight ETL framework based on Apache Spark.
Dataform: Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift.
Webkettle: A professional-edition tool with a B/S (browser/server) architecture, built on a web-based version of Kettle, for distributed integrated scheduling, management, and ETL development.
Smooks: An extensible Java framework for building XML and non-XML streaming applications.
Dagster: An orchestration platform for the development, production, and observation of data assets.
Benthos: Fancy stream processing made operationally mundane.
Datavec: ETL library for machine learning: data pipelines, data munging and wrangling.
etl manager: A Python package to create a database on the platform using our MoJ data warehousing framework.
qwery: A SQL-like language for performing ETL transformations.
grate: A Go-native tabular data extraction package. Currently supports .xls, .xlsx, .csv and .tsv formats.
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
ETW2JSON: Tool and library to convert ETW logs to JSON files.
dbd: dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
cpp-can-isotp: C++ implementation of CAN ISO 15765-2, also known as the CAN ISO transport protocol (CPP CAN isotp).
beneath: Beneath is a serverless real-time data platform ⚡️
Addax: Addax is an open source universal ETL tool that supports most RDBMSs and NoSQL databases, helping you transfer data from any one place to another.
gamechanger-data: GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements.
openrefine-docker: OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Docker build files for automated builds.
cardano-py: Python 3 library and CLI for operating a Cardano passive node and using the APIs (PRE-ALPHA).
etl: M-Lab ingestion pipeline.
mlbgameday: Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media, with additional tools to parallelize large data sets and write them to a database.
openrefine-client: The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient single-file executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.