link-move
A model-driven, dynamically configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-74.4%)
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-83.2%)
Etlalchemy
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+268%)
Pyetl
Python ETL framework
Stars: ✭ 33 (-73.6%)
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+389.6%)
Etlbox
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Stars: ✭ 203 (+62.4%)
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (+0%)
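OpenKettleWebUI
A Kettle-based web platform for scheduling and controlling data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.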
Stars: ✭ 138 (+10.4%)
vixtract
www.vixtract.ru
Stars: ✭ 40 (-68%)
csvplus
csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (-46.4%)
Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+487.2%)
Stetl
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (-48.8%)
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-77.6%)
BETL-old
BETL. Metadata-driven ETL generation using T-SQL
Stars: ✭ 17 (-86.4%)
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Stars: ✭ 372 (+197.6%)
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+15.2%)
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+0.8%)
DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-80.8%)
Metl
mito ETL tool
Stars: ✭ 153 (+22.4%)
Bender
Bender - Serverless ETL Framework
Stars: ✭ 171 (+36.8%)
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (-68.8%)
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-84%)
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-87.2%)
etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (-69.6%)
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+188.8%)
Hale
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (-32.8%)
Bentools Etl
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependencies.
Stars: ✭ 45 (-64%)
Dig Etl Engine
Download DIG to run on your laptop or server.
Stars: ✭ 81 (-35.2%)
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (-67.2%)
Alchemist
A realtime ETL engine
Stars: ✭ 40 (-68%)
Aws Ecs Airflow
Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (-14.4%)
Kgtk
Knowledge Graph Toolkit
Stars: ✭ 81 (-35.2%)
Configs
Public, free-to-use repository with diggers configs for scraping / extracting data from various e-commerce websites and online stores
Stars: ✭ 37 (-70.4%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-36.8%)
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+664.8%)
Yunmai Data Extract
Extract your data from the Yunmai weighing scales cloud API so you can use it elsewhere
Stars: ✭ 21 (-83.2%)
Sentinel Crawler
Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for web, RDB and OS; it can also act as a monitor (with Prometheus) or ETL for infrastructure 💫 Multi-language executor, distributed crawler
Stars: ✭ 118 (-5.6%)
Kafka Connect
equivalent to kafka-connect 🔧 for nodejs ✨🐢🚀✨
Stars: ✭ 102 (-18.4%)
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-36.8%)
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-83.2%)
Panther
Detect threats with log data and improve cloud security posture
Stars: ✭ 885 (+608%)
Data Story
A visual process builder for Laravel
Stars: ✭ 71 (-43.2%)
Dswarm Backoffice Web
The backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 11 (-91.2%)
Tuna
🐟 A streaming ETL for fish
Stars: ✭ 11 (-91.2%)
Csv2db
The CSV to database command line loader
Stars: ✭ 102 (-18.4%)
Dataspherestudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+856%)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-84.8%)
Locopy
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-41.6%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+534.4%)
Riko
A Python stream processing engine modeled after Yahoo! Pipes
Stars: ✭ 1,571 (+1156.8%)
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-7.2%)
Od
Czech open data
Stars: ✭ 99 (-20.8%)
Luigi Warehouse
A luigi powered analytics / warehouse stack
Stars: ✭ 72 (-42.4%)
Monstache
A Go daemon that syncs MongoDB to Elasticsearch in realtime
Stars: ✭ 736 (+488.8%)
React Csv
React components to build CSV files on the fly based on an array/literal object of data
Stars: ✭ 732 (+485.6%)
Transporter
Sync data between persistence engines, like ETL only not stodgy
Stars: ✭ 1,175 (+840%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+406.4%)
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+392%)