koza: Data transformation framework for LinkML data models
Stars: ✭ 21 (-4.55%)
dswarm: an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (+159.09%)
Everything-Tech: A collection of online resources to help you on your Tech journey.
Stars: ✭ 396 (+1700%)
mlbgameday: Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.
Stars: ✭ 37 (+68.18%)
wrangle: A data transformation package for deep learning with Autonomio, Keras and TensorFlow.
Stars: ✭ 15 (-31.82%)
zdh server: Data collection platform zdh, ETL processing service
Stars: ✭ 53 (+140.91%)
flock: Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
Stars: ✭ 232 (+954.55%)
oesophagus: Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)
Stars: ✭ 12 (-45.45%)
starlake: Starlake is a Spark-based on-premise and cloud ELT/ETL framework for batch and stream processing
Stars: ✭ 16 (-27.27%)
gamechanger-data: GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements
Stars: ✭ 17 (-22.73%)
FlowMaster: ETL flow framework based on YAML configs in Python
Stars: ✭ 19 (-13.64%)
ml-in-production: Practical use cases showing how to make your Machine Learning pipelines robust and reliable using Apache Airflow.
Stars: ✭ 29 (+31.82%)
etlflow: EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+72.73%)
iex-stocks: ETL for the IEX Stocks API
Stars: ✭ 19 (-13.64%)
proc-that: proc(ess)-that - easily extensible ETL tool for Node.js. Written in TypeScript.
Stars: ✭ 25 (+13.64%)
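mydataharbor: 🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other. It helps users reliably, quickly, and stably run near-real-time incremental synchronization or scheduled full synchronization of massive data sets. It is aimed primarily at real-time transaction systems and can also be used for big-data synchronization (the ETL domain).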
Stars: ✭ 28 (+27.27%)
datart: Datart is a next generation Data Visualization Open Platform
Stars: ✭ 1,042 (+4636.36%)
lineage: Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-27.27%)
zdh web: Big data collection and extraction platform
Stars: ✭ 292 (+1227.27%)
viewflow: Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+400%)
openrefine-client: The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Stars: ✭ 67 (+204.55%)
prefect-saturn: Python client for using Prefect Cloud with Saturn Cloud
Stars: ✭ 15 (-31.82%)
go-bqloader: bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-27.27%)
naas: ⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ production environment
Stars: ✭ 219 (+895.45%)
kuwala: Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+2054.55%)
google-sheets-etl: Live import all your Google Sheets to your data warehouse
Stars: ✭ 15 (-31.82%)
cubetl: CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-4.55%)
get smarties: Dummy variable generation with fit/transform capabilities
Stars: ✭ 23 (+4.55%)
contessa: An easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-22.73%)
neon-workshop: A Pachyderm deep learning tutorial for conference workshops
Stars: ✭ 19 (-13.64%)
papilo: DEPRECATED: Stream data processing micro-framework
Stars: ✭ 24 (+9.09%)
sparklanes: A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-22.73%)
openrefine-batch: Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a Python client that communicates with the OpenRefine API.
Stars: ✭ 76 (+245.45%)
mik: The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
Stars: ✭ 32 (+45.45%)
TEAM: The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.
Stars: ✭ 27 (+22.73%)
awesome-integration: A curated list of awesome system integration software and resources.
Stars: ✭ 117 (+431.82%)
neo4j-jdbc: JDBC driver for Neo4j
Stars: ✭ 110 (+400%)
link-move: A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (+45.45%)
kafka-connect-datagen: A Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (+22.73%)
lrmr: Less-Resilient MapReduce framework for Go
Stars: ✭ 32 (+45.45%)
soda-spark: Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (+163.64%)
cardano-py: Python3 lib and CLI for operating a Cardano Passive Node and using the APIs. (PRE-ALPHA)
Stars: ✭ 17 (-22.73%)
singer-runner: A CLI and library to run Singer Taps and Targets
Stars: ✭ 33 (+50%)
dtd2mysql: MySQL / MariaDB import for DTD feeds (fares, timetable and routeing)
Stars: ✭ 25 (+13.64%)
dogETL: A lib to convert data between JDBC, CSV, and JSON.
Stars: ✭ 15 (-31.82%)
qsv: CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+1890.91%)
sql-to-redis: 🔄 Simple tool for ETL. From SQL to Redis.
Stars: ✭ 18 (-18.18%)
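DataX-src: DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It implements efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.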
Stars: ✭ 21 (-4.55%)
chronicle-etl: 📜 A CLI toolkit for extracting and working with your digital history
Stars: ✭ 78 (+254.55%)
DQCS: Data quality control system
Stars: ✭ 34 (+54.55%)
DIRECT: DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-9.09%)