beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+490.91%)
etl
M-Lab ingestion pipeline
Stars: ✭ 15 (+36.36%)
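Webkettle
A professional browser/server (B/S) architecture tool, built on the web version of Kettle, for distributed scheduling, management, and ETL development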
Stars: ✭ 334 (+2936.36%)
sync-addons
Odoo Integration Addons
Stars: ✭ 69 (+527.27%)
oesophagus
Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)
Stars: ✭ 12 (+9.09%)
Aistore
AIStore: scalable storage for AI applications
Stars: ✭ 367 (+3236.36%)
Addax
Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (+5490.91%)
Koop
🔮 Transform, query, and download geospatial data on the web.
Stars: ✭ 505 (+4490.91%)
TEAM
The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.
Stars: ✭ 27 (+145.45%)
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+2372.73%)
ETW2JSON
Tool and library to convert ETW logs to JSON files
Stars: ✭ 66 (+500%)
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (+54.55%)
Abc
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Stars: ✭ 375 (+3309.09%)
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+172.73%)
Baby Names Analysis
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Stars: ✭ 557 (+4963.64%)
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+3181.82%)
openrefine-docker
OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Dockerbuild files for automated builds.
Stars: ✭ 19 (+72.73%)
Monstache
A Go daemon that syncs MongoDB to Elasticsearch in real time
Stars: ✭ 736 (+6590.91%)
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+37163.64%)
koza
Data transformation framework for LinkML data models
Stars: ✭ 21 (+90.91%)
Smartcode
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Stars: ✭ 464 (+4118.18%)
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (+45.45%)
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (+154.55%)
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (+127.27%)
kafka-connect-datagen
A Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (+145.45%)
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (+3454.55%)
mqtt-to-kafka-bridge
Move your messages from MQTT to Apache Kafka in real-time 🚀
Stars: ✭ 21 (+90.91%)
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+5490.91%)
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Stars: ✭ 372 (+3281.82%)
cpp-can-isotp
C++ implementation of CAN ISO 15765-2 also known as CAN ISO transport protocol. CPP CAN isotp.
Stars: ✭ 14 (+27.27%)
Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+6572.73%)
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+3281.82%)
Ananas Desktop
A hackable data integration & analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.
Stars: ✭ 551 (+4909.09%)
gamechanger-data
GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements
Stars: ✭ 17 (+54.55%)
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+3009.09%)
cardano-py
Python3 lib and CLI for operating a Cardano Passive Node and using the APIs. (PRE-ALPHA)
Stars: ✭ 17 (+54.55%)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (+72.73%)
mlbgameday
Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.
Stars: ✭ 37 (+236.36%)
Smooks
An extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (+2563.64%)
openrefine-client
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient single-file executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Stars: ✭ 67 (+509.09%)
Bigslice
A serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+4163.64%)
carry
Python ETL (Extract-Transform-Load) tool / Data migration tool
Stars: ✭ 115 (+945.45%)
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+33581.82%)
es2postgres
ElasticSearch to PostgreSQL loader
Stars: ✭ 18 (+63.64%)
React Csv
React components to build CSV files on the fly based on an array or object literal of data
Stars: ✭ 732 (+6554.55%)
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+618.18%)
etl manager
A Python package to create a database on the platform using our MoJ data warehousing framework
Stars: ✭ 14 (+27.27%)
DataXServer
Adds remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) to DataX (https://github.com/alibaba/DataX)
Stars: ✭ 130 (+1081.82%)
Etlalchemy
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+4081.82%)
bandar-log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (+81.82%)
Tuna
🐟 A streaming ETL for fish
Stars: ✭ 11 (+0%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+5654.55%)
Pglogical
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Stars: ✭ 455 (+4036.36%)
grate
A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.
Stars: ✭ 98 (+790.91%)