Stetl: Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (-58.17%)
naas: ⚙️ Schedule notebooks, run them like APIs, and securely expose your assets. Jupyter as a viable ⚡️ production environment
Stars: ✭ 219 (+43.14%)
Hydrograph: A visual ETL development and debugging tool for big data
Stars: ✭ 144 (-5.88%)
csvplus: csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (-56.21%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-84.31%)
qwery: A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-81.7%)
Bulk Writer: Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (+37.25%)
vixtract: www.vixtract.ru
Stars: ✭ 40 (-73.86%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-83.66%)
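Openkettlewebui: A Kettle-based web scheduling and control platform for data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.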
Stars: ✭ 125 (-18.3%)
Mara Pipelines: A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+1103.27%)
Etlalchemy: Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+200.65%)
lineage: Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-89.54%)
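mydataharbor: 🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other data source. It helps users reliably, quickly, and stably perform near-real-time incremental synchronization or scheduled full synchronization of massive data sets. It is aimed primarily at real-time transaction systems and can also be used for big-data synchronization (the ETL domain).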
Stars: ✭ 28 (-81.7%)
Bender: Bender - Serverless ETL Framework
Stars: ✭ 171 (+11.76%)
Etlbox: A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Stars: ✭ 203 (+32.68%)
BETL-old: BETL. Metadata-driven ETL generation using T-SQL
Stars: ✭ 17 (-88.89%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+3115.03%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (-17.65%)
DataBridge.NET: Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-89.54%)
Transformalize: Configurable Extract, Transform, and Load
Stars: ✭ 125 (-18.3%)
sparklanes: A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-88.89%)
Getting Started: This repository is a getting started guide to Singer.
Stars: ✭ 734 (+379.74%)
Go Streams: A lightweight stream processing library for Go
Stars: ✭ 615 (+301.96%)
Pyetl: Python ETL framework
Stars: ✭ 33 (-78.43%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-74.51%)
DIRECT: DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-86.93%)
cubetl: CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-86.27%)
link-move: A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-79.08%)
etl: M-Lab ingestion pipeline
Stars: ✭ 15 (-90.2%)
etlflow: EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (-75.16%)
hamilton: A scalable general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+300%)
Choetl: ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Stars: ✭ 372 (+143.14%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+135.95%)
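OpenKettleWebUI: A Kettle-based web scheduling and control platform for data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.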
Stars: ✭ 138 (-9.8%)
Datavec: ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+77.78%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-48.37%)
Hale: (Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (-45.1%)
Container Pipelines: Let's get the ball rolling on some Container-driven CI & CD
Stars: ✭ 123 (-19.61%)
Kiba: Data processing & ETL framework for Ruby
Stars: ✭ 1,618 (+957.52%)
Unix Stream: Turn Java 8 Streams into Unix-like pipelines
Stars: ✭ 119 (-22.22%)
Bodywork Core: Deploy machine learning projects developed in Python to Kubernetes. Accelerated MLOps 🚀
Stars: ✭ 145 (-5.23%)
Kettle Web: Calls Kettle from Java code, built on Spring Boot
Stars: ✭ 128 (-16.34%)
Dropseqpipe: A single-cell RNA-seq pre-processing Snakemake workflow
Stars: ✭ 119 (-22.22%)
Steppy: Lightweight Python library for fast and reproducible experimentation 🔬
Stars: ✭ 119 (-22.22%)
Reddit Detective: Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Stars: ✭ 129 (-15.69%)
Sentinel Crawler: Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB and OS; it can also act as a monitor (with Prometheus) or as ETL for infrastructure 💫 Multi-language executor, distributed crawler
Stars: ✭ 118 (-22.88%)
Rangeless: C++ LINQ-like library of higher-order functions for data manipulation
Stars: ✭ 148 (-3.27%)
Chain.jl: A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Stars: ✭ 118 (-22.88%)
Pex Context: Modern WebGL state wrapper for PEX: allocate GPU resources (textures, buffers), set up state pipelines and passes, and combine them into commands.
Stars: ✭ 117 (-23.53%)
Lastbackend: System for containerized apps management. From build to scaling.
Stars: ✭ 1,536 (+903.92%)
Etl.net: Mass data processing with a complete ETL for .NET developers
Stars: ✭ 129 (-15.69%)
Datax: DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-24.18%)
Europa: Puppet Container Registry
Stars: ✭ 114 (-25.49%)
Scrapy demo: All kinds of Scrapy demos
Stars: ✭ 128 (-16.34%)